AI at Meta’s Post


798,792 followers

New research from FAIR: Better & Faster Large Language Models via Multi-token Prediction. Paper ➡️ https://go.fb.me/o0yd06 We show that replacing next-token prediction with multi-token prediction can result in substantially better code generation performance with the exact same training budget and data, while also making inference up to 3x faster. While similar approaches have previously been used in fine-tuning, this new paper extends them to pre-training large models, showing notable behaviors and results at these scales.
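For readers who want to see what the objective change looks like in code, here is a minimal PyTorch-style sketch of the general idea: a shared trunk followed by n independent output heads, each predicting a token further into the future, trained on the sum of their cross-entropy losses. The names and components here (MultiTokenPredictor, the GRU stand-in trunk, n_heads=4) are illustrative assumptions, not FAIR's actual implementation.

```python
# Minimal sketch of multi-token prediction (illustrative only, not FAIR's code).
# Assumption: a shared "trunk" produces hidden states and n independent linear heads
# each predict the token i steps ahead; the loss sums the per-head cross-entropies.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in trunk; a real model would use a causal transformer here.
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        # One output head per future position (head i predicts token t + i + 1).
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_heads))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.trunk(self.embed(tokens))                         # (batch, seq, d_model)
        return torch.stack([head(h) for head in self.heads], dim=0)  # (n_heads, batch, seq, vocab)

def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    n_heads = logits.shape[0]
    losses = []
    for i in range(n_heads):
        # Head i is trained to predict the token (i + 1) positions ahead.
        pred = logits[i, :, : tokens.shape[1] - (i + 1), :]
        target = tokens[:, i + 1 :]
        losses.append(F.cross_entropy(pred.reshape(-1, pred.shape[-1]), target.reshape(-1)))
    return sum(losses)

# Usage on toy data:
model = MultiTokenPredictor(vocab_size=100, d_model=32, n_heads=4)
tokens = torch.randint(0, 100, (2, 16))
loss = multi_token_loss(model(tokens), tokens)
loss.backward()
```

At inference time the extra heads can be dropped to recover standard next-token generation, or used for self-speculative decoding, which is where the reported up-to-3x inference speedup comes from.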

Ruthuvikas Ravikumar

Software Engineer | Systems | AI/ML | ML Infra | MSCS @ UC Davis

1mo

This paper states that multi-token prediction significantly improves performance on coding benchmarks; this could be a breakthrough in developing AI tools for software engineering workflows.

Esmail A.Gumaan

B.A. in Computer Science | Software Developer | Researcher

1mo

Interesting 🤔. When we train big language models like GPT and Llama, they normally guess one word at a time. But here's the twist: this approach teaches them to guess multiple words at once. By doing this, they learn faster and become better at understanding and generating language.
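To make that concrete, here is a tiny toy example of how the training targets differ; the sentence and the 4-token lookahead window are arbitrary choices for illustration.

```python
# Toy illustration of next-token vs. multi-token training targets
# (the sentence and the 4-token window are arbitrary example choices).
words = "the cat sat on the mat".split()

for t in range(len(words) - 1):
    context = words[: t + 1]
    next_token = words[t + 1]              # standard next-token target
    multi_tokens = words[t + 1 : t + 5]    # multi-token targets: up to 4 future words
    print(f"context={context} | next-token={next_token!r} | multi-token={multi_tokens}")
```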

Apurv Sibal

Passionate about building and leveraging artificial intelligence to solve problems

1mo

Reducing the peak GPU memory usage from O(nV + d) to O(V + d) is key to making multi-token prediction work. It is interesting that it improves model performance for larger models (13B) but reduces it for smaller models (0.3B), while increasing inference speed up to 3x.
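For context on that memory point, one way to reach an O(V + d) peak instead of O(nV + d) is to run the output heads sequentially: free each head's logits before computing the next, and accumulate gradients at the d-dimensional trunk output. Below is a minimal PyTorch sketch of that pattern with a toy trunk; the names and components are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the memory-saving pattern: process the n output heads one at a time so that
# only one (batch, seq, V) logit tensor is alive at once, accumulating the gradient at
# the d-dimensional trunk output. Toy trunk and shapes chosen for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, n_heads, batch, seq = 100, 32, 4, 2, 16
trunk = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, d_model))
heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_heads))
tokens = torch.randint(0, vocab, (batch, seq))

# Forward through the shared trunk once; detach so each head's backward stops at z.
z = trunk(tokens)                          # (batch, seq, d_model)
z_detached = z.detach().requires_grad_(True)

for i, head in enumerate(heads):
    logits = head(z_detached)              # only one (batch, seq, V) logit tensor alive at a time
    pred = logits[:, : seq - (i + 1), :]   # head i predicts the token (i + 1) positions ahead
    target = tokens[:, i + 1 :]
    loss = F.cross_entropy(pred.reshape(-1, vocab), target.reshape(-1)) / n_heads
    loss.backward()                        # frees this head's graph and logits;
                                           # gradients accumulate in z_detached.grad

# One backward pass through the trunk using the accumulated d-dimensional gradient.
z.backward(z_detached.grad)
```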

You need a three-level architecture for learning in general; the fourth is the customization layer. Data is the key ingredient.

Eddy Mintela

Escalation Engineer (HSO) at Amazon Web Services (AWS)

1mo

Thanks for sharing. I'm still learning AI development, but this helps a lot.

James Bentley

AI and Strategy Director @ Awin (Axel Springer)

1mo

And you can listen to an audio summary of this research paper here:
Apple Podcasts: https://podcasts.apple.com/gb/podcast/new-paradigm-ai-research-summaries/id1737607215?i=1000654222601
Spotify: https://open.spotify.com/episode/1gVW9zJL5o843H2cSC8XDg?si=uObcepjaSxiKAPsYcmhW1w
This summary is 90% AI-generated. I make them for myself to stay on top of fast-moving research in the fields of AI and ML, and I share them with others to give back to the AI community.

Enrique Calvo Ramos

🌱 Growth 🦾 Innovation 💡 Technology

1mo

Love to see these "simple" improvements that make a lot of sense. We are only at the beginning of this revolution, folks.

This is fascinating research! The efficiency gains in both training and inference are impressive and could have significant implications across various industries. At Superposition Mining & Minerals, we're always looking for innovative technologies to enhance our operations. AI-driven advancements in large language models like this could revolutionize our data analysis and predictive maintenance strategies, and we're looking forward to seeing more breakthroughs. Given our commitment to leveraging cutting-edge technologies, we're interested in exploring AI solutions tailored for the mining industry. If anyone knows of a sophisticated mineral mining AI platform that could assist with geological analysis, resource estimation, or operational optimization, please share. We're keen to collaborate and integrate such advancements into our processes at Superposition Mining & Minerals in Zambia.

Lior S.

Covering the latest in AI R&D • ML-Engineer • Ex-Mila researcher • MIT Lecturer • Building AlphaSignal.ai, a technical newsletter read by 180,000 AI/ML experts.

1mo

Fellas, we did a deep dive on AlphaSignal 6 days ago :) https://link.alphasignal.ai/lJnZvW

Meta's New Multi-Token Prediction Makes AI Models Up to 3X Faster: In a recent study, researchers at Meta, Ecole des Ponts ParisTech and Université Paris-Saclay suggest improving the accuracy and speed of AI large language models (LLMs) by making them predict multiple tokens simultaneously. 🚀🚀🚀 | Via VentureBeat | May 6, 2024 | VINCI Digital | IIoT + AI / GenAI Strategic Advisory 🚀 https://www.linkedin.com/posts/fabiobottacci_iot-ai-genai-activity-7193600911103975424-dMIF
