AI at Meta’s Post


798,792 followers

New research from FAIR: Better & Faster Large Language Models via Multi-token Prediction. Paper ➡️ https://go.fb.me/o0yd06 We show that replacing next-token prediction with multi-token prediction can result in substantially better code generation performance with the exact same training budget and data, while also making inference up to 3x faster. While similar approaches have previously been used in fine-tuning, this new paper extends them to pre-training large models, showing notable behaviors and results at these scales.
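For readers who want to see what the objective change looks like in code, here is a minimal PyTorch-style sketch of the general idea: a shared trunk followed by n independent output heads, each predicting a token further into the future, trained on the sum of their cross-entropy losses. The names and components here (MultiTokenPredictor, the GRU stand-in trunk, n_heads=4) are illustrative assumptions, not FAIR's actual implementation.

```python
# Minimal sketch of multi-token prediction (illustrative only, not FAIR's code).
# Assumption: a shared "trunk" produces hidden states and n independent linear heads
# each predict the token i steps ahead; the loss sums the per-head cross-entropies.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in trunk; a real model would use a causal transformer here.
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        # One output head per future position (head i predicts token t + i + 1).
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_heads))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.trunk(self.embed(tokens))                         # (batch, seq, d_model)
        return torch.stack([head(h) for head in self.heads], dim=0)  # (n_heads, batch, seq, vocab)

def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    n_heads = logits.shape[0]
    losses = []
    for i in range(n_heads):
        # Head i is trained to predict the token (i + 1) positions ahead.
        pred = logits[i, :, : tokens.shape[1] - (i + 1), :]
        target = tokens[:, i + 1 :]
        losses.append(F.cross_entropy(pred.reshape(-1, pred.shape[-1]), target.reshape(-1)))
    return sum(losses)

# Usage on toy data:
model = MultiTokenPredictor(vocab_size=100, d_model=32, n_heads=4)
tokens = torch.randint(0, 100, (2, 16))
loss = multi_token_loss(model(tokens), tokens)
loss.backward()
```

At inference time the extra heads can be dropped to recover standard next-token generation, or used for self-speculative decoding, which is where the reported up-to-3x inference speedup comes from.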

Ruthuvikas Ravikumar

Software Engineer | Systems | AI/ML | ML Infra | MSCS @ UC Davis

1mo

This paper states that multi-token prediction significantly improves performance on coding benchmarks; this could be a breakthrough in developing AI tools for software engineering workflows.

Esmail A.Gumaan

B.A. in Computer Science | Software Developer | Researcher

1mo

Interesting 🤔. When we train big language models like GPT and Llama, they normally guess one word at a time. But here's the twist: this approach teaches them to guess multiple words at once. By doing this, they learn faster and become better at understanding and generating language.
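To make that concrete, here is a tiny toy example of how the training targets differ; the sentence and the 4-token lookahead window are arbitrary choices for illustration.

```python
# Toy illustration of next-token vs. multi-token training targets
# (the sentence and the 4-token window are arbitrary example choices).
words = "the cat sat on the mat".split()

for t in range(len(words) - 1):
    context = words[: t + 1]
    next_token = words[t + 1]              # standard next-token target
    multi_tokens = words[t + 1 : t + 5]    # multi-token targets: up to 4 future words
    print(f"context={context} | next-token={next_token!r} | multi-token={multi_tokens}")
```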

Apurv Sibal

Passionate about building and leveraging artificial intelligence to solve problems

1mo

Reducing the peak GPU memory usage from O(nV + d) to O(V + d) is key to making multi-token prediction work. It is interesting that it improves model performance for larger models (13B) but reduces it for smaller models (0.3B), while increasing inference speed up to 3x.
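For context on that memory point, one way to reach an O(V + d) peak instead of O(nV + d) is to run the output heads sequentially: free each head's logits before computing the next, and accumulate gradients at the d-dimensional trunk output. Below is a minimal PyTorch sketch of that pattern with a toy trunk; the names and components are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the memory-saving pattern: process the n output heads one at a time so that
# only one (batch, seq, V) logit tensor is alive at once, accumulating the gradient at
# the d-dimensional trunk output. Toy trunk and shapes chosen for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, n_heads, batch, seq = 100, 32, 4, 2, 16
trunk = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, d_model))
heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_heads))
tokens = torch.randint(0, vocab, (batch, seq))

# Forward through the shared trunk once; detach so each head's backward stops at z.
z = trunk(tokens)                          # (batch, seq, d_model)
z_detached = z.detach().requires_grad_(True)

for i, head in enumerate(heads):
    logits = head(z_detached)              # only one (batch, seq, V) logit tensor alive at a time
    pred = logits[:, : seq - (i + 1), :]   # head i predicts the token (i + 1) positions ahead
    target = tokens[:, i + 1 :]
    loss = F.cross_entropy(pred.reshape(-1, vocab), target.reshape(-1)) / n_heads
    loss.backward()                        # frees this head's graph and logits;
                                           # gradients accumulate in z_detached.grad

# One backward pass through the trunk using the accumulated d-dimensional gradient.
z.backward(z_detached.grad)
```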

You need a three-level architecture for learning in general; the fourth is the customization layer. Data is the key ingredient.

Eddy Mintela

Escalation Engineer (HSO) at Amazon Web Services (AWS)

1mo

Thanks for sharing. I'm still learning AI development, but this helps a lot.

James Bentley

AI and Strategy Director @ Awin (Axel Springer)

1mo

And you can listen to an audio summary of this research paper here:
Apple Podcasts: https://podcasts.apple.com/gb/podcast/new-paradigm-ai-research-summaries/id1737607215?i=1000654222601
Spotify: https://open.spotify.com/episode/1gVW9zJL5o843H2cSC8XDg?si=uObcepjaSxiKAPsYcmhW1w
This summary is 90% AI-generated. I make them for myself to stay on top of fast-moving research in the fields of AI and ML, and I share them with others to give back to the AI community.

Enrique Calvo Ramos

🌱 Growth 🦾 Innovation 💡 Technology

1mo

Love to see these "simple" improvements that make a lot of sense. We are only at the beginning of this revolution, folks.

This is fascinating research! The efficiency gains in both training and inference are impressive and could have significant implications across various industries. At Superposition Mining & Minerals, we're always looking for innovative technologies to enhance our operations. AI-driven advancements in large language models like this could revolutionize our data analysis and predictive maintenance strategies, and we're looking forward to seeing more breakthroughs. Given our commitment to leveraging cutting-edge technologies, we're interested in exploring AI solutions tailored for the mining industry. If anyone knows of a sophisticated mineral mining AI platform that could assist with geological analysis, resource estimation, or operational optimization, please share. We're keen to collaborate and integrate such advancements into our processes at Superposition Mining & Minerals in Zambia.

Lior S.

Covering the latest in AI R&D • ML-Engineer • Ex-Mila researcher • MIT Lecturer • Building AlphaSignal.ai, a technical newsletter read by 180,000 AI/ML experts.

1mo

Fellas, we did a deep dive on AlphaSignal 6 days ago :) https://link.alphasignal.ai/lJnZvW

Meta's New Multi-Token Prediction Makes AI Models Up to 3X Faster: In a recent study, researchers at Meta, Ecole des Ponts ParisTech and Université Paris-Saclay suggest improving the accuracy and speed of AI large language models (LLMs) by making them predict multiple tokens simultaneously. 🚀🚀🚀 | Via VentureBeat | May 6, 2024 | VINCI Digital | IIoT + AI / GenAI Strategic Advisory 🚀 https://www.linkedin.com/posts/fabiobottacci_iot-ai-genai-activity-7193600911103975424-dMIF
