Vertex AI APIs for building search and RAG experiences

Vertex AI offers a suite of APIs that help you build your own Retrieval Augmented Generation (RAG) applications or your own search engine. This page introduces those APIs.

Retrieval and generation

RAG is a methodology that enables Large Language Models (LLMs) to generate responses that are grounded to your data source of choice. There are two stages in RAG:

  1. Retrieval: Finding the most relevant facts quickly is a classic search problem. With RAG, you retrieve the facts that are needed to generate an answer.
  2. Generation: The retrieved facts are used by the LLM to generate a grounded response.
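
The two stages can be illustrated with a minimal, self-contained sketch. It does not call any Vertex AI API: `retrieve` uses naive word overlap as a stand-in for a real retriever, and `build_grounded_prompt` only assembles the text that would be sent to an LLM. All function names and the sample facts are hypothetical.

```python
import re

# Hypothetical sample facts standing in for your data source.
FACTS = [
    "Vector Search retrieves semantically similar items at scale.",
    "The ranking API reranks documents by relevance to a query.",
    "The layout parser splits documents into context-aware chunks.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, facts: list[str], top_k: int = 2) -> list[str]:
    """Stage 1: return the top_k facts that share the most words with the query."""
    query_words = tokenize(query)
    ordered = sorted(
        facts, key=lambda fact: len(query_words & tokenize(fact)), reverse=True
    )
    return ordered[:top_k]


def build_grounded_prompt(query: str, facts: list[str]) -> str:
    """Stage 2 (input side): put the retrieved facts in front of the question."""
    context = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Answer using only the numbered facts below and cite them.\n"
        f"{context}\n"
        f"Question: {query}"
    )


query = "How does the ranking API order documents?"
retrieved = retrieve(query, FACTS)
prompt = build_grounded_prompt(query, retrieved)
print(prompt)
```

In a real application, the retriever and the model that consumes the prompt would be replaced by one of the retrieval and generation options described on this page.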

Vertex AI offers options for both stages to match a variety of developer needs.

Retrieval options include Vertex AI Search, building your own retriever, bringing your own retriever, LlamaIndex on Vertex AI, and Google Search. Generation options include the grounded generation API, Gemini, and Model Garden.

Retrieval

Choose the best retrieval method for your needs:

  • Vertex AI Search: Vertex AI Search is a Google Search-quality information retrieval engine that can be a component of any generative AI application that uses your enterprise data. It works as an out-of-the-box semantic and keyword search engine for RAG, can process a variety of document types, and has connectors to a variety of source systems, including BigQuery and many third-party systems.

    For more information, see Vertex AI Search.

  • Build your own retrieval: If you want to build your own semantic search, you can rely on Vertex AI APIs for the components of your custom RAG system. This suite of APIs provides high-quality implementations for document parsing, embedding generation, vector search, and semantic ranking. Using these lower-level APIs gives you full flexibility over the design of your retriever while still offering accelerated time to market and high quality.

    For more information, see Build your own Retrieval Augmented Generation.

  • Bring your own retrieval: You can use your existing search as a retriever for grounded generation. You can also use the Vertex AI APIs for RAG to improve the quality of your existing search.

    For more information, see Build your own Retrieval Augmented Generation.

  • LlamaIndex on Vertex AI: LlamaIndex on Vertex AI lets developers who are familiar with this popular open source software (OSS) framework build retrievers for production use in an enterprise-ready context.

    For more information, see LlamaIndex on Vertex AI for RAG overview in the Generative AI on Vertex AI documentation.

  • Google Search: When you use Grounding with Google Search for your Gemini model, Gemini uses Google Search and generates output that is grounded in the relevant search results. This retrieval method doesn't require any management, and it makes the world's knowledge available to Gemini.

    For more information, see Grounding with Google Search in the Generative AI on Vertex AI documentation.

Generation

Choose the best generation method for your needs:

  • Grounded generation API (GA with allowlist): Use the grounded generation API to generate well-grounded answers to a user's query. This API uses a specialized, fine-tuned Gemini model and is an effective way to reduce hallucinations and provide responses grounded to your sources, third-party sources, or Google Search, including references to grounding support content.

    For more information, see Generate grounded answers.

  • Gemini: Gemini is Google's most capable model and offers out-of-the-box grounding with Google Search. You can use it to build your own fully customized grounded generation solution.

    For more information, see Grounding with Google Search in the Generative AI on Vertex AI documentation.

  • Model Garden: If you want full control and the model of your choice, you can use any of the models in Vertex AI Model Garden for generation.

Build your own Retrieval Augmented Generation

Developing a custom RAG system for grounding offers flexibility and control at every step of the process. Vertex AI offers a suite of APIs to help you create your own search solutions. Because these lower-level APIs provide high-quality implementations of each step, you keep full flexibility over the design of your RAG application while shortening time to market.

Vertex AI offers APIs for processing and annotating, embedding, indexing and retrieving, ranking, grounded generation, and validation.

  • The Document AI Layout Parser. The Document AI Layout Parser transforms documents in various formats into structured representations. It makes content such as paragraphs, tables, and lists, and structural elements such as headings, page headers, and footers, accessible. It also creates context-aware chunks that facilitate information retrieval in a range of generative AI and discovery apps.

    For more information, see Document AI Layout Parser in the Document AI documentation.

  • The embeddings APIs. The Vertex AI embeddings APIs let you create embeddings for text or multimodal inputs. Embeddings are vectors of floating-point numbers that are designed to capture the meaning of their input. You can use the embeddings to power semantic search using Vector Search.

    For more information, see Text embeddings and Multimodal embeddings in the Generative AI on Vertex AI documentation.

  • Vector Search. The retrieval engine is a key part of your RAG or search application. Vertex AI Vector Search is a retrieval engine that can search across billions of items for those that are semantically similar or related, at scale, with high queries per second (QPS), high recall, low latency, and cost efficiency. It can search over dense embeddings, and it supports sparse embedding keyword search and hybrid search in Public preview.

    For more information, see Overview of Vertex AI Vector Search in the Vertex AI documentation.

  • The ranking API. The ranking API takes in a list of documents and reranks those documents based on how relevant the documents are to a given query. Compared to embeddings that look purely at the semantic similarity of a document and a query, the ranking API can give you a more precise score for how well a document answers a given query.

    For more information, see Rank and rerank documents.

  • The grounded generation API. Use the grounded generation API to generate well-grounded answers to a user's prompt. The grounding sources can be your Vertex AI Search data stores, custom data that you provide, or Google Search.

    For more information, see Generate grounded answers.

  • The check grounding API. The check grounding API determines how grounded a given piece of text is in a given set of reference texts. The API can generate supporting citations from the reference texts to indicate where the given text is supported. Among other things, the API can be used to assess the groundedness of responses from RAG systems. As an experimental feature, the API also generates contradicting citations that show where the given text and the reference texts disagree.

    For more information, see Check grounding.
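
To make the idea of semantic search over embeddings concrete, the following sketch ranks a few chunks by cosine similarity to a query embedding. It is a brute-force illustration only, not the Vector Search API: the three-dimensional vectors, the chunk IDs, and the function names are hypothetical, and real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three chunks (assumed values, for illustration only).
CHUNK_EMBEDDINGS = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.1, 0.9, 0.1],
    "chunk-c": [0.7, 0.3, 0.1],
}


def nearest_chunks(
    query_embedding: list[float],
    embeddings: dict[str, list[float]],
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the top_k (id, score) pairs."""
    scored = [
        (chunk_id, cosine_similarity(query_embedding, vector))
        for chunk_id, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


query_embedding = [1.0, 0.2, 0.0]
results = nearest_chunks(query_embedding, CHUNK_EMBEDDINGS)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

This exhaustive scan compares the query with every stored vector, which is only practical for small collections. A dedicated retrieval engine is what makes the same kind of lookup feasible across billions of items, and a reranking step can then refine the order of the retrieved candidates.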

Workflow: Generate grounded responses from unstructured data

Here's a workflow that outlines how to integrate the Vertex AI APIs to generate grounded responses from unstructured data.

  1. Import your unstructured documents, such as PDF files, HTML files, or images with text, into a Cloud Storage location.
  2. Process the imported documents using the layout parser. The layout parser breaks down the unstructured documents into chunks and transforms the unstructured content into its structured representation. The layout parser also extracts annotations from the chunks.
  3. Create text embeddings for the chunks using the Vertex AI text embeddings API.
  4. Index and retrieve the chunk embeddings using Vector Search.
  5. Rank the chunks using the ranking API and determine the top-ranked chunks.
  6. Generate grounded answers based on the top-ranked chunks using the grounded generation API.
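
The way steps 2 through 6 fit together can be sketched as a small pipeline in which every stage is a function that you supply. This is a hedged illustration of the orchestration only: `RagPipeline` and all of the `toy_*` stand-ins are hypothetical, none of them calls a Vertex AI API, and step 1 (importing into Cloud Storage) is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


@dataclass
class RagPipeline:
    """Chains the workflow stages; each stage is a caller-supplied stand-in."""

    parse: Callable[[str], list[str]]  # step 2: document -> chunks
    embed: Callable[[str], Vector]  # step 3: text -> embedding
    search: Callable[[Vector, list[tuple[str, Vector]], int], list[str]]  # step 4
    rank: Callable[[str, list[str]], list[str]]  # step 5: best chunk first
    generate: Callable[[str, list[str]], str]  # step 6: answer from chunks

    def answer(
        self, query: str, documents: list[str], candidates: int = 10, top_n: int = 3
    ) -> str:
        chunks = [chunk for doc in documents for chunk in self.parse(doc)]
        index = [(chunk, self.embed(chunk)) for chunk in chunks]
        retrieved = self.search(self.embed(query), index, candidates)
        best = self.rank(query, retrieved)[:top_n]
        return self.generate(query, best)


# Toy stand-ins so that the sketch runs without any external service.
VOCAB = ["parser", "embedding", "ranking"]


def toy_parse(document: str) -> list[str]:
    """Split on blank lines; a real parser would produce context-aware chunks."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def toy_embed(text: str) -> Vector:
    """Count occurrences of a tiny fixed vocabulary instead of calling a model."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def toy_search(query_vec: Vector, index: list[tuple[str, Vector]], k: int) -> list[str]:
    """Return the k chunks with the highest dot product against the query vector."""
    def dot(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(a, b))

    ordered = sorted(index, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ordered[:k]]


def toy_rank(query: str, chunks: list[str]) -> list[str]:
    """Order chunks by the number of whitespace-separated words shared with the query."""
    query_words = set(query.lower().split())
    return sorted(
        chunks, key=lambda c: len(query_words & set(c.lower().split())), reverse=True
    )


def toy_generate(query: str, chunks: list[str]) -> str:
    """Echo the question and its sources; a real system would call a model here."""
    return f"Q: {query}\nSources: " + " | ".join(chunks)


pipeline = RagPipeline(toy_parse, toy_embed, toy_search, toy_rank, toy_generate)
docs = [
    "The layout parser creates chunks.\n\nThe ranking API reranks chunks.",
    "An embedding is a vector.",
]
answer = pipeline.answer("what does the ranking api do", docs, candidates=2, top_n=1)
print(answer)
```

Keeping each stage behind a plain function signature means that a stand-in can later be swapped for a call to the corresponding API without changing the orchestration code.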

If you generated the answers using an answer generation model other than a Google model, you can check the grounding of these answers using the check grounding API.
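
What a grounding check reports can be pictured with a deliberately crude sketch: for each sentence of an answer, find the reference text that shares the most words with it, and treat the sentence as supported if the overlap is high enough. This is not how the check grounding API works internally; the word-overlap heuristic, the `threshold` value, and the function names are assumptions made purely for illustration.

```python
import re
from typing import Optional


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_grounding_toy(
    answer: str, references: list[str], threshold: float = 0.6
) -> tuple[float, list[Optional[int]]]:
    """Return (support score, citations).

    citations[i] is the index of the reference that best supports sentence i,
    or None if no reference covers at least `threshold` of the sentence's words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0, []

    citations: list[Optional[int]] = []
    for sentence in sentences:
        sentence_words = words(sentence)
        if not sentence_words:
            citations.append(None)
            continue
        best_index, best_overlap = None, 0.0
        for index, reference in enumerate(references):
            overlap = len(sentence_words & words(reference)) / len(sentence_words)
            if overlap > best_overlap:
                best_index, best_overlap = index, overlap
        citations.append(best_index if best_overlap >= threshold else None)

    supported = sum(citation is not None for citation in citations)
    return supported / len(sentences), citations


references = [
    "Vector Search supports dense embeddings.",
    "The ranking API reranks documents for a query.",
]
answer_text = "The ranking API reranks documents. Vector Search is free of charge."
score, citations = check_grounding_toy(answer_text, references)
print(score, citations)
```

In this example the first sentence is fully covered by the second reference, while the second sentence makes a claim that neither reference supports, so only half of the answer counts as grounded.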