Live Webinar: June 25th

Join our Builder's Roundtable to learn all about fine-tuning LLMs

Text Gen Solution

Fast,
cost-optimized
LLM endpoints

Quickly evaluate and scale the latest models by leveraging OctoAI's singular API. Our deep expertise in model compilation, model curation, and ML systems means you get low-latency, affordable endpoints that can handle any production workload.

Ask About Enterprise

Read about our customers

Latitude Games lowers costs by 5x and unlocks new game experiences with Mixtral on OctoAI

Deepak Mohan & Nick Walton

Feb 16, 2024

Run your choice of models and fine-tunes

Build on your choice of OSS LLMs or your own model on our blazing fast API endpoints. Scale seamlessly and reliably without dropping performance.

Robust reliability

Serving millions of customer inferences daily

Adaptive scalability

Growth-ready for your app

Low cost with high performance

Keeping customers and the finance departments happy

Migrate with Ease

OpenAI SDK users move to OctoAI's compatible API with minimal effort

Stay up to date with new models and features

Product & Customer Updates

Fine-tuned Mistral 7B delivers over 60x lower costs and comparable quality to GPT 4

May 9, 2024

7 minutes

Visit the blog

Latest Models

Hermes 2 Pro Llama 3

The first fine-tune from Nous Research, and has a updated version of OpenHermes 2.5 Dataset. This model is great for conversational and reasoning tasks for your AI apps. Function calling support for this model is coming soon!

Chat

Coding

Experimental

Llama 3 Instruct

The most recent release from Meta. This model is instruction tuned for chat and is optimized for helpfulness and safety. This model is performing well above common benchmarks for open-source chat models.

Chat

Coding

Mixtral-8x22B Instruct

Strong mathematics and coding capabilities, with a 64K tokens context window to allow for precise information recall from large documents and can be used for chat, question and answer, and other instruction based tasks. Fluent in English, French, Italian, German, and Spanish.

Chat

Coding

Mixtral 8x22B fine-tuned

Over the coming weeks we will be utilizing the newest and strongest fine-tunes from the community. Come back often to see what new fine-tune will be here for testing. After testing several fine-tuned versions of this model we will select the top performing to persistently host on OctoAI.

Chat

Experimental

See all models

Product & Customer Updates

A Framework for Selecting the Right LLM

Jun 11, 2024

4 minutes

GitView launches AI code review analysis for engineering teams using OctoAI

Jun 4, 2024

2 minutes

30 Days of Llama 3: Newest Member of the Herd is Living up to the Hype

May 17, 2024

3 minutes

Fine-tuned Mistral 7B delivers over 60x lower costs and comparable quality to GPT 4

May 9, 2024

7 minutes

Visit the blog

Demos & Webinars

Selecting the right GenAI model for production

Watch our on-demand webinar as our engineers review all steps of model evaluation, testing, when to use checkpoints vs LoRAs, and how to get the best results.

PDF Summarizer

Learn how to build a PDF summary app using NodeJS and OctoAI’s Text Gen Solution. Convert PDFs into summarized TXT files with an LLM.

Summarization

Text generation

Recipe generator app with Llama 2 and OctoAI SDK

Learn how to build an app that takes a user list and outputs a recipe based on the input using Python and Llama 2 13B Chat.

Text generation

Summarization

Simple Chat

Learn how to build a ”hello world” question and answer chat app that uses Llama 2 70B, OctoAI, and LangChain. You just need a Python interpreter to get started.

Text generation

Question answering

View all demos & webinars

TESTIMONIALS

Trusted by GenAI Innovators

“Working with the OctoAI team, we were able to quickly evaluate the new model, validate its performance through our proof of concept phase, and move the model to production. Mixtral on OctoAI serves a majority of the inferences and end player experiences on AI Dungeon today.”

Nick Walton

CEO & Co-Founder Latitude

“The LLM landscape is changing almost every day, and we need the flexibility to quickly select and test the latest options. OctoAI made it easy for us to evaluate a number of fine tuned model variants for our needs, identify the best one, and move it to production for our application.”

Matt Shumer

CEO & Co-Founder Otherside AI

Fast & Flexible

JSON mode for reliable structured output

JSON mode is built into leading models on the OctoAI Systems Stack, allowing it to work without disruptions or quality issues. OctoAI has pushed further and optimized JSON mode for industry-leading latency performance.

See how

Text embedding for RAG

Utilize GTE Large embedding endpoint to facilitate retrieval augmented generation (RAG) or semantic search for your apps. With a score of 63.13% on the MTEB leaderboard and compatible API, migrating from OpenAI requires minimal code updates. Learn how.

Build using our high quality and cost effective Mixtral 8x7B & 8x22B models

Our accelerated Mixtral delivers quality competitive with GPT 3.5, but with open source flexibility. Enjoy reduced costs with our 4x lower price per token than GPT 3.5. Migrating is made easy with one unified OpenAI compatible API. We support fine-tunes from the community including the latest from Nous Research.

See how

MODEL COCKTAILS

Build using multiple models for your use case

Using OctoAI you can link several generative models together to create a highly performant pipeline. You can build new experiences specifically for your industry needs using language, images, audio, or your own custom models. Learn how our customer, Capitol AI, was able to work with us to achieve cost savings on their multiple models in production.

Try the Demo App

Fast,cost-optimizedLLM endpoints

Read about our customers

Latitude Games lowers costs by 5x and unlocks new game experiences with Mixtral on OctoAI

Run your choice of models and fine-tunes

Robust reliability

Adaptive scalability

Low cost with high performance

Migrate with Ease

Stay up to date with new models and features

Product & Customer Updates

A Framework for Selecting the Right LLM

GitView launches AI code review analysis for engineering teams using OctoAI

30 Days of Llama 3: Newest Member of the Herd is Living up to the Hype

Fine-tuned Mistral 7B delivers over 60x lower costs and comparable quality to GPT 4

Latest Models

Hermes 2 Pro Llama 3

Llama 3 Instruct

Mixtral-8x22B Instruct

Mixtral 8x22B fine-tuned

Product & Customer Updates

A Framework for Selecting the Right LLM

GitView launches AI code review analysis for engineering teams using OctoAI

30 Days of Llama 3: Newest Member of the Herd is Living up to the Hype

Fine-tuned Mistral 7B delivers over 60x lower costs and comparable quality to GPT 4

Demos & Webinars

Selecting the right GenAI model for production

PDF Summarizer

Recipe generator app with Llama 2 and OctoAI SDK

Simple Chat

Trusted by GenAI Innovators

JSON mode for reliable structured output

Text embedding for RAG

Build using our high quality and cost effective Mixtral 8x7B & 8x22B models

Build using multiple models for your use case

Fast,
cost-optimized
LLM endpoints