Last week, Google announced Gemini 1.5 Pro, a model capable of processing up to 1M tokens. For context, GPT-4 Turbo’s context window is 128K.
When groundbreaking technologies emerge, it can be tempting to draw conclusions too quickly.
One such conclusion is that models with large context windows will kill RAG.
As Donato Riccio put it a few months back: "My prediction is that in a year or two RAG as we know it today will no longer be needed, as LLMs scale their context to unlimited sizes."
I disagree. (Sorry, Donato)
While Gemini 1.5 Pro eases the limited-context-window issue, it doesn’t solve the “needle in a haystack” problem, which highlights how hard it is to understand complex flows within codebases.
Think of it this way: sending an entire codebase to an LLM is like asking a human to find an answer in a 600-page book rather than in a specific paragraph.
Simply put, when given too much context, LLMs don’t perform as well.
While the release of Gemini 1.5 Pro marks a technological milestone, it falls short in addressing the nuanced needs for RAG in coding tasks – especially when tackling questions that demand an understanding of intricate, interconnected code flows.
Google claims that the new model can provide relevant answers within a huge context window, but this only works for questions like “where is X?”, which merely require the model to locate X within the context window.
But at the end of the day, this isn’t the challenge devs have when it comes to answering questions about their codebase.
The major challenge? Answering questions that require a deeper understanding of flows that span multiple locations. And even if context windows keep increasing in size, that doesn’t solve the “needle in a haystack” problem.
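To make the contrast concrete, here is a toy sketch of the retrieve-then-ask idea behind RAG. Everything here is hypothetical: keyword-overlap scoring stands in for real embedding-based retrieval, and the final prompt string stands in for an actual LLM call. The point is only that retrieval hands the model a focused slice of the codebase instead of the whole haystack.

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: count query words that appear in a code chunk.
    A real system would use embedding similarity instead."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in chunk.lower())

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant chunks rather than the entire codebase."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a focused prompt: a specific paragraph, not the 600-page book."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical usage over a tiny "codebase":
codebase = [
    "def login(user): ...",
    "def render_chart(data): ...",
    "def logout(user): ...",
]
prompt = build_prompt("how does login work", codebase)
```

Even a large context window could hold all three chunks here; the point of retrieval is that at real codebase scale, narrowing to the relevant chunks is what keeps the needle findable.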
A gap remains here, one that underscores the persistent challenge of refining AI models to navigate the subtleties of complex queries effectively.
So here's my take:
While Gemini 1.5 Pro is a groundbreaking technology, it’s too soon to kill off RAG, especially for code-related tasks, because replacing retrieval with a massive context window doesn’t solve the needle-in-a-haystack problem.