It's weird that LLMs immediately forget old conversations, or even the parts of a conversation that fall outside the context window. Yes, there are some huge context windows out there. But still, computers can easily store, word for word, the exact history of a practically unlimited amount of conversation. It feels like LLMs ought to remember things they heard years ago, just as a person (with a good memory) might.

As a fun thought experiment: I type about 90 words per minute. If I typed non-stop for 100 years (no lazy things like sleeping or eating allowed), I'd produce less than 30 gigabytes of data. In other words, the entire lifetime text output of a single human fits easily in the memory of a modern laptop.

Are researchers just going to sit around and let LLMs have poor memories??? Obviously not, otherwise I wouldn't be writing this. The progress to report here is a new paper from Google about an idea called Infini-attention. I have to give the authors credit for not flippantly using the word "infini": this work really does provide a modification to attention that, in theory, has no time limit on what it can remember.

Well ... let's get somewhat technical. The "infini" memory of the model is a matrix that can effectively capture many key-value data points from all past conversation history. One very cool fact about high-dimensional vectors is that two of them chosen at random are very likely to be nearly orthogonal, which means that, in some sense, you can pack more information into a matrix than its size alone might suggest. Having said that, the rank of the matrix (at most the smaller of its number of rows and its number of columns) is still a kind of upper bound on the truly independent directions of information a single matrix can hold. The capacity is _not_ infinite. So it's inevitable that such a matrix must gradually forget things over time. In a way, if this is a model for the human brain, it may help us understand why memories tend to fade as time passes.

This week's Learn & Burn summary explains the clever way the authors incrementally update the memory matrix and incorporate non-linearity into the key-value lookups: https://lnkd.in/g-Gjr5aP
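To make that concrete, here is a minimal numpy sketch of a linear-attention-style compressive memory in the spirit of the paper. The elu(x) + 1 non-linearity and the normalization vector z match my reading of the paper, but treat the details as an illustration rather than the authors' exact method:

```python
import numpy as np

# Quick check of the near-orthogonality claim: two random high-dimensional
# vectors typically have cosine similarity within a few hundredths of zero.
a, b = np.random.randn(4096), np.random.randn(4096)
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sigma(x):
    # elu(x) + 1: a positive non-linearity often used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

d = 64                    # key/value dimension
M = np.zeros((d, d))      # the fixed-size "infini" memory
z = np.zeros(d)           # normalization accumulator

# Stream segments of 128 tokens; each update folds the segment's key-value
# pairs into M without growing it.
for _ in range(2):
    K, V = np.random.randn(128, d), np.random.randn(128, d)
    M += sigma(K).T @ V
    z += sigma(K).sum(axis=0)

# Retrieval: queries read against everything stored so far.
Q = np.random.randn(4, d)
out = (sigma(Q) @ M) / (sigma(Q) @ z)[:, None]
print(out.shape)          # (4, 64)
```

The memory stays d × d no matter how many tokens stream through it, which is exactly why it must eventually overwrite old associations: the forgetting discussed above.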
Tyler Neylon’s Post
More Relevant Posts
-
WORKING WITH NON-TRADITIONAL TEXTS: Pattern Recognition, Reconstructed: The Unlikely Return of Fornés’s ‘Evelyn Brown’ https://rebrand.ly/rjr9vcn
-
MEng Cyber Security | Data Scientist | Machine Learning | Software Security | Artificial Intelligence | Data Visualization
Decompose a question into a set of sub-problems / sub-questions, which can either be solved sequentially (use the answer from the first, plus retrieval, to answer the second) or in parallel (consolidate each answer into a final answer). Various works, such as Least-to-Most prompting, build on this idea. A sketch of the sequential variant is below.
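A minimal sketch of the sequential variant, assuming a hypothetical llm(prompt) helper that wraps whatever chat-completion API you use; the function name and prompt wording are illustrative, not from the post:

```python
def llm(prompt: str) -> str:
    # Hypothetical helper: call your chat-completion API of choice here.
    raise NotImplementedError

def answer_by_decomposition(question: str) -> str:
    # 1. Decompose the question into simpler sub-questions.
    subs = llm(
        f"List the sub-questions needed to answer: {question}\n"
        "One per line."
    ).splitlines()

    # 2. Solve them sequentially, feeding each answer into the next prompt
    #    (retrieval results could also be spliced into this context).
    context = ""
    for sub in subs:
        ans = llm(f"{context}\nQuestion: {sub}\nAnswer:")
        context += f"\nQ: {sub}\nA: {ans}"

    # 3. Use the accumulated Q/A pairs to answer the original question.
    return llm(f"{context}\nNow answer the original question: {question}")
```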
-
Quick and dirty summary of an interesting talk I watched yesterday by applied math professor David Sumpter on what he calls the "four ways of thinking": things we can learn from the mindsets of scientists and how they approach problems.

- 🧢 Statistical thinking: deriving correlations from significant data points while being aware of the limitations of this approach.
- 👒 Interactive thinking: looking for causation between events by directly manipulating the factors in play.
- 🤠 Chaotic thinking: taking into account the importance of initial conditions and how very slight input differences can lead to radically divergent outputs, a.k.a. the butterfly effect (a tiny numeric demo of this follows after the video link below).
- 🎩 Complex thinking: he didn't say much about this one because of time constraints, but I suppose it's the ability to combine all of the above, knowing that real-world phenomena are often non-linear (in the sense that they can be non-deterministic, subject to circular causality, or produce counter-intuitive results, among other oddities).

The way I understand it, these aren't mutually exclusive, but rather incremental, and they can be combined depending on the problem at hand. Like programming paradigms, they represent different approaches to solving problems, and it's useful for a knowledge worker like myself to know them so I can choose the right tool for the job.

Anyway, I can only recommend you watch the talk; it's very interesting (entertaining even), with a ton of football (soccer) examples, historical figures (mostly scientists and engineers), and even advice for couples 😅 If you're interested, the talk is based on Sumpter's book, "Four Ways of Thinking: Statistical, Interactive, Chaotic and Complex" (ISBN: 9781250806260). One more for my long reading list! #Knowledge #Science #Thinking
Four Ways of Thinking: Statistical, Interactive, Chaotic and Complex - David Sumpter
https://www.youtube.com/
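The butterfly-effect point in the chaotic-thinking bullet is easy to see numerically. Here is a tiny logistic-map demo; it's a stock example of sensitive dependence on initial conditions, not one taken from the talk:

```python
# Logistic map: x_{n+1} = r * x * (1 - x), chaotic for r = 3.9.
# Two starting points differing by one part in a billion diverge
# completely within roughly 40 iterations.
r = 3.9
x, y = 0.5, 0.5 + 1e-9
for n in range(50):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if n % 10 == 9:
        print(n + 1, abs(x - y))   # the gap grows exponentially
```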
-
Data Scientist 💯 | Data Analyst 🎯 | Python | Relational DB & SQL | Machine Learning | Tableau | Statistics | Result-Driven | Flexible | Consultant | Good Work-Life Balance | Educator
A useful summary and a nice article for pleasant reading. Happy reading!
End-to-End Machine Learning Project Guide
link.medium.com
-
"The goal of this post is that you can walk away from this post with a general sense of which performance metric you assessing for your specific use case." A High Level Guide to LLM Evaluation Metrics by David Hundley
A High Level Guide to LLM Evaluation Metrics
towardsdatascience.com
-
I think this is the best source for learning SVD through visual interpretation. What a powerful concept indeed! (A quick numeric sanity check of the decomposition follows after the video link below.) https://lnkd.in/dzHWYM6T
SVD Visualized, Singular Value Decomposition explained | SEE Matrix , Chapter 3 #SoME2
https://www.youtube.com/
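If you want to poke at SVD alongside the video, numpy makes the basic facts easy to verify. This snippet is a generic illustration, not taken from the video:

```python
import numpy as np

# Any real matrix factors as A = U @ diag(s) @ Vt, with U and Vt having
# orthonormal columns/rows and s the singular values in decreasing order.
A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction holds to machine precision.
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True

# Keeping only the largest singular value gives the best rank-1
# approximation of A in the least-squares sense.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1))
```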
-
Here's what you need to know about Chain-of-Verification by Meta:

Whenever you use an LLM for text generation, you run the risk of hallucination, where the LLM outputs claims that are not grounded in reality. The risk is even higher when you're dealing with long text generation. To tackle this challenge, the researchers devised a method that resembles Chain-of-Thought (a minimal sketch follows below):

1️⃣ Generate an initial response conditioned on the user's instruction.
2️⃣ Generate verification questions conditioned on the instruction and the response.
3️⃣ Generate answers conditioned on the questions. It's important not to condition on the response, or the LLM might repeat its hallucinations.
4️⃣ Rewrite the draft to fix any inconsistencies / mistakes found in the verification phase.
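A minimal sketch of that pipeline, again assuming a hypothetical llm(prompt) wrapper around your model API; the prompt strings are illustrative, not the paper's exact templates:

```python
def llm(prompt: str) -> str:
    # Hypothetical helper: call your chat-completion API of choice here.
    raise NotImplementedError

def chain_of_verification(instruction: str) -> str:
    # 1. Draft an initial response.
    draft = llm(instruction)

    # 2. Plan verification questions from the instruction + draft.
    questions = llm(
        f"Instruction: {instruction}\nDraft: {draft}\n"
        "List fact-checking questions for the draft, one per line."
    ).splitlines()

    # 3. Answer each question WITHOUT showing the draft, so the model
    #    can't simply repeat its own hallucinations.
    checks = [(q, llm(f"Answer factually: {q}")) for q in questions]

    # 4. Rewrite the draft using the verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        f"Instruction: {instruction}\nDraft: {draft}\n"
        f"Verification results:\n{evidence}\n"
        "Rewrite the draft, fixing anything the verification contradicts."
    )
```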
-
....one word is not enough to capture the real intentions of any given strategy. But as a way of summarising the motivation and predicting what it could achieve, de-risking constitutes a considerable improvement on both the notions of decoupling and strategic autonomy. My latest. https://lnkd.in/eEa_Z29R
-
In this episode, we discuss Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? by Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig. The paper explores the effects of integrating new factual information into large language models (LLMs) during the fine-tuning phase, particularly focusing on how this affects their ability to retain and utilize pre-existing knowledge. It was found that LLMs struggle to learn new facts during fine-tuning, indicating a slower learning curve for new information compared to familiar content from their training data. Additionally, the study reveals that as LLMs incorporate new facts, they are more prone to generating factually incorrect or "hallucinated" responses, suggesting a trade-off between knowledge integration and accuracy.
arxiv preprint - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
podbean.com
-
Co-founder at DAIR.AI | Prev: Meta AI, Galactica LLM, PapersWithCode, Elastic | Creator of the Prompting Guide (3.5M+ learners)
The Geometry of Truth

Pretty neat interactive blog to explore how LLMs represent truth. It's based on this recent paper that presents evidence that language models linearly represent the truth or falsehood of factual statements. (A sketch of the linear-probe idea follows below.)

blog: https://lnkd.in/eGA4ERQb
paper: https://lnkd.in/eR_HgEq2
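"Linearly represent" here means a simple linear probe on hidden activations can separate true from false statements. Below is a generic sketch of such a probe; the activations are a synthetic stand-in (in the paper's setting they would come from a layer of the LLM), and none of this code is from the blog or paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake a dataset in which "truth" corresponds to a direction in a
# 512-dimensional activation space: 1 = true statement, 0 = false.
d = 512
truth_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=1000)
acts = rng.normal(size=(1000, d)) + np.outer(2 * labels - 1, truth_direction)

# A linear probe: if truth is linearly represented, a logistic regression
# on the raw activations should separate the classes almost perfectly.
probe = LogisticRegression(max_iter=1000).fit(acts[:800], labels[:800])
print(probe.score(acts[800:], labels[800:]))   # held-out accuracy near 1.0
```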