Tyler Neylon’s Post

ML Founder | NLP & Recommendations focus

It's weird that LLMs immediately forget old conversations, or even parts of a conversation that are simply too far back in time, once the context window runs out. Yes, there are some huge context windows out there. But still, computers can easily store, word for word, the exact history of a practically unlimited conversation. It feels like LLMs ought to remember things they heard years ago, just as a person (with a good memory) might.

As a fun thought experiment: I type about 90 words per minute. If I typed non-stop for 100 years (no lazy things like sleeping or eating allowed), I'd produce less than 30 gigs of data. (The quick arithmetic is in the first snippet below.) In other words, the entire lifetime text output of a single human can easily fit in the memory of a modern laptop.

Are researchers just going to sit around and let LLMs have poor memories??? Obviously no, otherwise I wouldn't be writing this. The progress to report here is a new paper from Google about an idea called Infini-attention. I have to give the authors credit for not flippantly using the word "infini": this work really does provide a modification to attention that, in theory, has no time limit on what it can remember.

Well ... let's get somewhat technical. The "infini" memory of the model is a matrix that can effectively capture many key-value data points from all past conversation history. One very cool fact about high-dimensional vectors is that most of them (chosen at random) are nearly orthogonal to each other, which means that, in some sense, you can pack more information into a matrix than you might expect from its size alone. (The second snippet below demonstrates this.)

Having said that, the rank of the matrix (which is at most the smaller of its number of rows and its number of columns) is indeed a kind of upper bound on the truly independent directions of information a single matrix can hold. The capacity is _not_ infinite. So it's inevitable that such a matrix must gradually forget things over time. In a way, if this is a model for the human brain, it may help us understand why most people do tend to forget, or experience a fading away of memories, as time passes.

This week's Learn & Burn summary explains the clever way the authors are able to incrementally update the memory matrix and incorporate non-linearity into the key-value lookups (sketched in the last snippet below): https://lnkd.in/g-Gjr5aP
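To sanity-check the "less than 30 gigs" claim, here's the quick arithmetic in Python. The six-bytes-per-word figure (roughly five characters plus a space) is my own assumption:

```python
# Back-of-the-envelope: 90 words/minute, typed non-stop for 100 years.
words = 90 * 60 * 24 * 365 * 100   # about 4.7 billion words
gigabytes = words * 6 / 1e9        # assumes ~6 bytes per word (5 chars + space)
print(f"{words:,} words, about {gigabytes:.1f} GB")  # about 28.4 GB
```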
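A quick aside on that "mostly orthogonal" fact. Here's a minimal NumPy demo (my own illustration, not code from the paper) showing that random unit vectors in high dimensions have near-zero pairwise dot products:

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (10, 100, 10_000):
    # Draw 1,000 random vectors and normalize each one to unit length.
    vecs = rng.standard_normal((1000, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # Pairwise cosine similarities, excluding each vector with itself.
    cosines = vecs @ vecs.T
    off_diag = cosines[~np.eye(1000, dtype=bool)]
    print(f"dim={dim:>6}: mean |cosine| = {np.abs(off_diag).mean():.4f}")
```

The mean absolute cosine shrinks roughly like 1/sqrt(dim), which is why a high-dimensional matrix can pack in far more nearly-orthogonal directions than intuition from 2D or 3D suggests.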
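And here is a rough, single-head sketch of the compressive memory at the heart of Infini-attention, based on my reading of the paper: keys and values are written into a fixed-size matrix M via M += sigma(K)^T V with sigma = ELU + 1, a running vector z normalizes the readout, and queries read the memory back with two matrix products. The real model also has a delta-rule update variant and a learned gate that mixes this memory readout with ordinary local attention; both are omitted here:

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1 is always positive, so once something has been
    # written, the readout's normalization denominator stays strictly positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

class CompressiveMemory:
    """All past context is summarized in one d_key x d_value matrix M
    plus a normalization vector z, updated incrementally per segment."""

    def __init__(self, d_key, d_value):
        self.M = np.zeros((d_key, d_value))  # associative key -> value store
        self.z = np.zeros(d_key)             # running sum of sigma(keys)

    def update(self, K, V):
        # Write one segment's (K, V) pairs: M += sigma(K)^T V.
        # Cost is linear in segment length; the memory size never grows.
        sK = elu_plus_one(K)
        self.M += sK.T @ V
        self.z += sK.sum(axis=0)

    def retrieve(self, Q):
        # Read values for queries: sigma(Q) M / (sigma(Q) z).
        sQ = elu_plus_one(Q)
        return (sQ @ self.M) / (sQ @ self.z)[:, None]

# Toy usage: write one 128-token segment, then query the memory.
rng = np.random.default_rng(1)
d_key, d_value, seg_len = 64, 64, 128
mem = CompressiveMemory(d_key, d_value)
K = rng.standard_normal((seg_len, d_key))
V = rng.standard_normal((seg_len, d_value))
mem.update(K, V)
out = mem.retrieve(rng.standard_normal((4, d_key)))
print(out.shape)  # (4, 64): one retrieved value vector per query
```

Notice where the forgetting comes from: M has rank at most min(d_key, d_value), so once more independent key-value associations have been written than that, older ones start to blur together.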

LLMs that never forget (learnandburn.ai)
