It's weird that LLMs immediately forget old conversations, or even the parts of a conversation that fall outside the context window. Yes, there are some huge context windows out there. But still, computers can easily store, word for word, the exact history of a practically unlimited amount of conversation. It feels like LLMs ought to remember things they heard years ago, just as a person (with a good memory) might.

As a fun thought experiment: I type about 90 words per minute. If I typed non-stop for 100 years (no lazy things like sleeping or eating allowed), I'd produce less than 30 gigabytes of data. In other words, the entire lifetime text output of a single human fits easily in the memory of a modern laptop.

Are researchers just going to sit around and let LLMs have poor memories??? Obviously not, otherwise I wouldn't be writing this. The progress to report here is a new paper from Google about an idea called Infini-attention. I have to give the authors credit for not flippantly using the word "infini": this work really does provide a modification to attention that, in theory, has no time limit on what it can remember.

Well ... let's get somewhat technical. The "infini" memory of the model is a matrix that can effectively capture many key-value data points from all past conversation history. One very cool fact about high-dimensional vectors is that two of them chosen at random are very likely to be nearly orthogonal, which means that, in some sense, you can pack more information into a matrix than its size alone might suggest. Having said that, the rank of the matrix (at most the smaller of its number of rows and its number of columns) is still a kind of upper bound on the truly independent directions of information a single matrix can hold. The capacity is _not_ infinite. So it's inevitable that such a matrix must gradually forget things over time. In a way, if this is a model for the human brain, it may help us understand why memories tend to fade as time passes.

This week's Learn & Burn summary explains the clever way the authors incrementally update the memory matrix and incorporate non-linearity into the key-value lookups: https://lnkd.in/g-Gjr5aP
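To make that concrete, here is a minimal numpy sketch of a linear-attention-style compressive memory in the spirit of the paper. The elu(x) + 1 non-linearity and the normalization vector z match my reading of the paper, but treat the details as an illustration rather than the authors' exact method:

```python
import numpy as np

# Quick check of the near-orthogonality claim: two random high-dimensional
# vectors typically have cosine similarity within a few hundredths of zero.
a, b = np.random.randn(4096), np.random.randn(4096)
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sigma(x):
    # elu(x) + 1: a positive non-linearity often used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

d = 64                    # key/value dimension
M = np.zeros((d, d))      # the fixed-size "infini" memory
z = np.zeros(d)           # normalization accumulator

# Stream segments of 128 tokens; each update folds the segment's key-value
# pairs into M without growing it.
for _ in range(2):
    K, V = np.random.randn(128, d), np.random.randn(128, d)
    M += sigma(K).T @ V
    z += sigma(K).sum(axis=0)

# Retrieval: queries read against everything stored so far.
Q = np.random.randn(4, d)
out = (sigma(Q) @ M) / (sigma(Q) @ z)[:, None]
print(out.shape)          # (4, 64)
```

The memory stays d × d no matter how many tokens stream through it, which is exactly why it must eventually overwrite old associations: the forgetting discussed above.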
Tyler Neylon’s Post
More Relevant Posts
-
WORKING WITH NON-TRADITIONAL TEXTS: Pattern Recognition, Reconstructed: The Unlikely Return of Fornés’s ‘Evelyn Brown’ https://rebrand.ly/rjr9vcn
-
MEng Cyber Security | Data Scientist | Machine Learning | Software Security | Artificial Intelligence | Data Visualization
Decompose a question into a set of sub-problems / sub-questions, which can either be solved sequentially (use the answer from the first, plus retrieval, to answer the second) or in parallel (consolidate each answer into a final answer). Various works, such as Least-to-Most prompting, build on this idea. A sketch of the sequential variant is below.
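A minimal sketch of the sequential variant, assuming a hypothetical llm(prompt) helper that wraps whatever chat-completion API you use; the function name and prompt wording are illustrative, not from the post:

```python
def llm(prompt: str) -> str:
    # Hypothetical helper: call your chat-completion API of choice here.
    raise NotImplementedError

def answer_by_decomposition(question: str) -> str:
    # 1. Decompose the question into simpler sub-questions.
    subs = llm(
        f"List the sub-questions needed to answer: {question}\n"
        "One per line."
    ).splitlines()

    # 2. Solve them sequentially, feeding each answer into the next prompt
    #    (retrieval results could also be spliced into this context).
    context = ""
    for sub in subs:
        ans = llm(f"{context}\nQuestion: {sub}\nAnswer:")
        context += f"\nQ: {sub}\nA: {ans}"

    # 3. Use the accumulated Q/A pairs to answer the original question.
    return llm(f"{context}\nNow answer the original question: {question}")
```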
-
Quick and dirty summary of an interesting talk I watched yesterday by applied math professor David Sumpter on what he calls the "four ways of thinking": things we can learn from the mindsets of scientists and how they approach problems.

- 🧢 Statistical thinking: deriving correlations from significant data points while being aware of the limitations of this approach.
- 👒 Interactive thinking: looking for causation between events by directly manipulating the factors in play.
- 🤠 Chaotic thinking: taking into account the importance of initial conditions and how very slight input differences can lead to radically divergent outputs, a.k.a. the butterfly effect (a tiny numeric demo of this follows after the video link below).
- 🎩 Complex thinking: he didn't say much about this one because of time constraints, but I suppose it's the ability to combine all of the above, knowing that real-world phenomena are often non-linear (in the sense that they can be non-deterministic, subject to circular causality, or produce counter-intuitive results, among other oddities).

The way I understand it, these aren't mutually exclusive, but rather incremental, and they can be combined depending on the problem at hand. Like programming paradigms, they represent different approaches to solving problems, and it's useful for a knowledge worker like myself to know them so I can choose the right tool for the job.

Anyway, I can only recommend you watch the talk; it's very interesting (entertaining even), with a ton of football (soccer) examples, historical figures (mostly scientists and engineers), and even advice for couples 😅 If you're interested, the talk is based on Sumpter's book, "Four Ways of Thinking: Statistical, Interactive, Chaotic and Complex" (ISBN: 9781250806260). One more for my long reading list! #Knowledge #Science #Thinking
Four Ways of Thinking: Statistical, Interactive, Chaotic and Complex - David Sumpter
https://www.youtube.com/
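The butterfly-effect point in the chaotic-thinking bullet is easy to see numerically. Here is a tiny logistic-map demo; it's a stock example of sensitive dependence on initial conditions, not one taken from the talk:

```python
# Logistic map: x_{n+1} = r * x * (1 - x), chaotic for r = 3.9.
# Two starting points differing by one part in a billion diverge
# completely within roughly 40 iterations.
r = 3.9
x, y = 0.5, 0.5 + 1e-9
for n in range(50):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if n % 10 == 9:
        print(n + 1, abs(x - y))   # the gap grows exponentially
```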
-
Data Scientist 💯 | Data Analyst 🎯 | Python | Relational DB & SQL | Machine Learning | Tableau | Statistics | Result-Driven | Flexible | Consultant | Good Work-Life Balance | Educator
A useful summary and a nice article for pleasant reading. Happy reading!
End-to-End Machine Learning Project Guide
link.medium.com
-
"The goal of this post is that you can walk away from this post with a general sense of which performance metric you assessing for your specific use case." A High Level Guide to LLM Evaluation Metrics by David Hundley
A High Level Guide to LLM Evaluation Metrics
towardsdatascience.com
-
I think this is the best source for learning SVD through visual interpretation. What a powerful concept indeed! (A quick numeric sanity check of the decomposition follows after the video link below.) https://lnkd.in/dzHWYM6T
SVD Visualized, Singular Value Decomposition explained | SEE Matrix , Chapter 3 #SoME2
https://www.youtube.com/
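If you want to poke at SVD alongside the video, numpy makes the basic facts easy to verify. This snippet is a generic illustration, not taken from the video:

```python
import numpy as np

# Any real matrix factors as A = U @ diag(s) @ Vt, with U and Vt having
# orthonormal columns/rows and s the singular values in decreasing order.
A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction holds to machine precision.
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True

# Keeping only the largest singular value gives the best rank-1
# approximation of A in the least-squares sense.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1))
```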
-
Here's what you need to know about Chain-of-Verification by Meta:

Whenever you use an LLM for text generation, you run the risk of hallucination, where the LLM outputs claims that are not grounded in reality. The risk is even higher when you're dealing with long text generation. To tackle this challenge, the researchers devised a method that resembles Chain-of-Thought (a minimal sketch follows below):

1️⃣ Generate an initial response conditioned on the user's instruction.
2️⃣ Generate verification questions conditioned on the instruction and the response.
3️⃣ Generate answers conditioned on the questions. It's important not to condition on the response, or the LLM might repeat its hallucinations.
4️⃣ Rewrite the draft to fix any inconsistencies / mistakes found in the verification phase.
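A minimal sketch of that pipeline, again assuming a hypothetical llm(prompt) wrapper around your model API; the prompt strings are illustrative, not the paper's exact templates:

```python
def llm(prompt: str) -> str:
    # Hypothetical helper: call your chat-completion API of choice here.
    raise NotImplementedError

def chain_of_verification(instruction: str) -> str:
    # 1. Draft an initial response.
    draft = llm(instruction)

    # 2. Plan verification questions from the instruction + draft.
    questions = llm(
        f"Instruction: {instruction}\nDraft: {draft}\n"
        "List fact-checking questions for the draft, one per line."
    ).splitlines()

    # 3. Answer each question WITHOUT showing the draft, so the model
    #    can't simply repeat its own hallucinations.
    checks = [(q, llm(f"Answer factually: {q}")) for q in questions]

    # 4. Rewrite the draft using the verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        f"Instruction: {instruction}\nDraft: {draft}\n"
        f"Verification results:\n{evidence}\n"
        "Rewrite the draft, fixing anything the verification contradicts."
    )
```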
-
....one word is not enough to capture the real intentions of any given strategy. But as a way of summarising the motivation and predicting what it could achieve, de-risking constitutes a considerable improvement on both the notions of decoupling and strategic autonomy. My latest. https://lnkd.in/eEa_Z29R
-
In this episode, we discuss Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? by Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig. The paper explores the effects of integrating new factual information into large language models (LLMs) during the fine-tuning phase, particularly focusing on how this affects their ability to retain and utilize pre-existing knowledge. It was found that LLMs struggle to learn new facts during fine-tuning, indicating a slower learning curve for new information compared to familiar content from their training data. Additionally, the study reveals that as LLMs incorporate new facts, they are more prone to generating factually incorrect or "hallucinated" responses, suggesting a trade-off between knowledge integration and accuracy.
arxiv preprint - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
podbean.com
-
Co-founder at DAIR.AI | Prev: Meta AI, Galactica LLM, PapersWithCode, Elastic | Creator of the Prompting Guide (3.5M+ learners)
The Geometry of Truth

Pretty neat interactive blog to explore how LLMs represent truth. It's based on this recent paper that presents evidence that language models linearly represent the truth or falsehood of factual statements. (A sketch of the linear-probe idea follows below.)

blog: https://lnkd.in/eGA4ERQb
paper: https://lnkd.in/eR_HgEq2
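"Linearly represent" here means a simple linear probe on hidden activations can separate true from false statements. Below is a generic sketch of such a probe; the activations are a synthetic stand-in (in the paper's setting they would come from a layer of the LLM), and none of this code is from the blog or paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake a dataset in which "truth" corresponds to a direction in a
# 512-dimensional activation space: 1 = true statement, 0 = false.
d = 512
truth_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=1000)
acts = rng.normal(size=(1000, d)) + np.outer(2 * labels - 1, truth_direction)

# A linear probe: if truth is linearly represented, a logistic regression
# on the raw activations should separate the classes almost perfectly.
probe = LogisticRegression(max_iter=1000).fit(acts[:800], labels[:800])
print(probe.score(acts[800:], labels[800:]))   # held-out accuracy near 1.0
```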