Last week, Google announced Gemini 1.5 Pro, a model capable of processing up to 1M tokens. For context, GPT-4 Turbo’s context window is 128K.
When groundbreaking technologies emerge, it can be tempting to draw conclusions too quickly.
One such conclusion is that models with large context windows will kill RAG.
As Donato Riccio put it a few months back: "My prediction is that in a year or two RAG as we know it today will no longer be needed, as LLMs scale their context to unlimited sizes."
I disagree. (Sorry, Donato)
While Gemini 1.5 Pro eases the limited-context-window issue, it doesn’t solve the “needle in a haystack” problem, which highlights how hard it is to understand complex flows within codebases.
Think of it this way: sending an entire codebase to an LLM is like asking a human to find an answer in a 600-page book rather than in a specific paragraph.
Simply put, when given too much context, LLMs don’t perform as well.
While the release of Gemini 1.5 Pro marks a technological milestone, it falls short in addressing the nuanced needs for RAG in coding tasks – especially when tackling questions that demand an understanding of intricate, interconnected code flows.
Google claims that the new model can provide relevant answers within a huge context window, but this only works for questions like “where is X?”, which merely require the model to locate X within the context window.
But at the end of the day, this isn’t the challenge devs have when it comes to answering questions about their codebase.
The major challenge? Answering questions that require a deeper understanding of flows that span multiple locations. And even if context windows keep increasing in size, that doesn’t solve the “needle in a haystack” problem.
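To make the contrast concrete, here is a toy sketch of the retrieve-then-ask idea behind RAG. Everything here is hypothetical: keyword-overlap scoring stands in for real embedding-based retrieval, and the final prompt string stands in for an actual LLM call. The point is only that retrieval hands the model a focused slice of the codebase instead of the whole haystack.

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: count query words that appear in a code chunk.
    A real system would use embedding similarity instead."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in chunk.lower())

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant chunks rather than the entire codebase."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a focused prompt: a specific paragraph, not the 600-page book."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical usage over a tiny "codebase":
codebase = [
    "def login(user): ...",
    "def render_chart(data): ...",
    "def logout(user): ...",
]
prompt = build_prompt("how does login work", codebase)
```

Even a large context window could hold all three chunks here; the point of retrieval is that at real codebase scale, narrowing to the relevant chunks is what keeps the needle findable.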
A gap remains here, one that underscores the persistent challenge of refining AI models to navigate the subtleties of complex queries effectively.
So here's my take:
While Gemini 1.5 Pro is a groundbreaking technology, it’s too soon to kill off RAG, especially for code-related tasks, because replacing retrieval with a massive context window doesn’t solve the needle-in-a-haystack problem.