Introduction to building gen AI apps on Databricks

Databricks provides a comprehensive platform to build, deploy, and manage GenAI applications. This article guides you through the essential components and processes involved in developing GenAI applications on Databricks.

Mosaic AI Model Training

Mosaic AI Model Training (formerly Foundation Model Training) on Databricks lets you customize large language models (LLMs) using your own data. This process involves fine-tuning the training of a pre-existing foundation model, significantly reducing the data, time, and compute resources required compared to training a model from scratch. Key features include:

  • Supervised fine-tuning: Adapt your model to new tasks by training on structured prompt-response data.

  • Continued pre-training: Enhance your model with additional text data to add new knowledge or focus on a specific domain.

  • Chat completion: Train your model on chat logs to improve conversational abilities.

External model integration

Databricks supports the integration of external models, allowing you to leverage third-party models hosted outside of Databricks. This streamlines the use and management of various LLM providers, such as OpenAI and Anthropic, within your organization.

Mosaic AI Agent Framework

Agent Framework comprises a set of tools on Databricks designed to help developers build, deploy, and evaluate production-quality agents like Retrieval Augmented Generation (RAG) applications.

Building high-quality agents requires a robust evaluation toolset to test and validate agent systems. Mosaic AI Agent Evaluation provides a platform to capture and implement human feedback, ground truth, response and request logs, LLM judge feedback, chain traces, and more.