Gemini is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle multimodal input, including text, images, audio, and video, depending on the variant. This guide provides information about each model variant to help you decide which is the best fit for your use case.
Model variants
The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available:
| Model variant | Input(s) | Output | Optimized for |
|---|---|---|---|
| Gemini 1.5 Pro (`gemini-1.5-pro`) | Audio, images, videos, and text | Text | Complex reasoning tasks such as code and text generation, text editing, problem solving, and data extraction and generation |
| Gemini 1.5 Flash (`gemini-1.5-flash`) | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |
| Gemini 1.0 Pro (`gemini-1.0-pro`) | Text | Text | Natural language tasks, multi-turn text and code chat, and code generation |
| Gemini 1.0 Pro Vision (`gemini-pro-vision`) | Images, videos, and text | Text | Visual-related tasks, like generating image descriptions or identifying objects in images |
| Text Embedding (`text-embedding-004`) | Text | Text embeddings | Measuring the relatedness of text strings |
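To illustrate how these variant names are used in practice, here's a minimal sketch using the `google-generativeai` Python SDK (assumed installed, with an API key in the `GOOGLE_API_KEY` environment variable). The task-to-variant mapping is just one reasonable reading of the table above, not part of the API:

```python
import os

# One possible mapping of use cases to variants, mirroring the table above.
VARIANT_FOR_TASK = {
    "complex_reasoning": "gemini-1.5-pro",
    "fast_general": "gemini-1.5-flash",
    "text_only": "gemini-1.0-pro",
    "embeddings": "text-embedding-004",
}

def generate(task: str, prompt: str) -> str:
    """Call generateContent on the variant suited to `task`."""
    import google.generativeai as genai  # pip install google-generativeai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(VARIANT_FOR_TASK[task])
    return model.generate_content(prompt).text

# Example (requires network access and an API key):
# print(generate("fast_general", "Explain tokens in one sentence."))
```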
The following table describes the attributes of the Gemini models that are common to all model variants:

| Attribute | Description |
|---|---|
| Training data | Gemini's knowledge cutoff is November 2023. Knowledge about events after that time is limited. |
| Supported languages | See available languages |
| Configurable model parameters | See the model parameters section of the generative models guide for information about each of these parameters. |
Gemini 1.5 Pro
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide range of reasoning tasks such as:
- Code generation
- Text generation
- Text editing
- Problem solving
- Recommendations generation
- Information extraction
- Data extraction or generation
- Creation of AI agents
1.5 Pro can process large amounts of data at once, including 1 hour of video, 9.5 hours of audio, or codebases with over 30,000 lines of code or over 700,000 words.
1.5 Pro is capable of handling zero-, one-, and few-shot learning tasks.
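Zero-, one-, and few-shot learning here means including zero, one, or several worked examples directly in the prompt. A few-shot prompt can be assembled as plain text; the helper and example labels below are hypothetical, not part of the API:

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great service!", "positive"), ("Never again.", "negative")],
    "The wait was far too long.",
)
```

The resulting string can be passed as-is to a `generateContent` call.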
Model details

| Property | Description |
|---|---|
| Model code | models/gemini-1.5-pro-latest |
| Inputs | Audio, images, video, and text |
| Output | Text |
| Supported generation methods | `generateContent` |
| Input token limit[**] | 1,048,576 |
| Output token limit[**] | 8,192 |
| Maximum number of images per prompt | 3,600 |
| Maximum video length | 1 hour |
| Maximum audio length | Approximately 9.5 hours |
| Maximum number of audio files per prompt | 1 |
| Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
| Rate limits[*] | See rate limits |
| System instructions | Supported |
| JSON mode | Supported |
| Latest version | gemini-1.5-pro-latest |
| Latest stable version | gemini-1.5-pro |
| Stable versions | gemini-1.5-pro-001 |
| Latest update | May 2024 |
Gemini 1.5 Flash
Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.
Model details

| Property | Description |
|---|---|
| Model code | models/gemini-1.5-flash-latest |
| Inputs | Audio, images, video, and text |
| Output | Text |
| Supported generation methods | `generateContent` |
| Input token limit[**] | 1,048,576 |
| Output token limit[**] | 8,192 |
| Maximum number of images per prompt | 3,600 |
| Maximum video length | 1 hour |
| Maximum audio length | Approximately 9.5 hours |
| Maximum number of audio files per prompt | 1 |
| Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
| Rate limits[*] | See rate limits |
| System instructions | Supported |
| JSON mode | Supported |
| Model tuning | Coming soon |
| Latest version | gemini-1.5-flash-latest |
| Latest stable version | gemini-1.5-flash |
| Stable versions | gemini-1.5-flash-001 |
| Latest update | May 2024 |
Gemini 1.0 Pro
Gemini 1.0 Pro is an NLP model that handles tasks like multi-turn text and code chat, and code generation.
1.0 Pro is capable of handling zero-, one-, and few-shot learning tasks.
Model details

| Property | Description |
|---|---|
| Model code | models/gemini-1.0-pro |
| Input | Text |
| Output | Text |
| Supported generation methods | `generateContent`, `generate_content` |
| Rate limits[*] | See rate limits |
| System instructions | Unsupported |
| JSON mode | Unsupported |
| Model tuning | Supported: gemini-1.0-pro-001 |
| Latest version | gemini-1.0-pro-latest |
| Latest stable version | gemini-1.0-pro |
| Stable versions | gemini-1.0-pro-001 |
| Latest update | February 2024 |
Gemini 1.0 Pro Vision
Gemini 1.0 Pro Vision is a performance-optimized multimodal model that can perform visual-related tasks. For example, 1.0 Pro Vision can generate image descriptions, identify objects present in images, provide information about places or objects present in images, and more.
1.0 Pro Vision is capable of handling zero-, one-, and few-shot tasks.
Model details

| Property | Description |
|---|---|
| Model code | models/gemini-pro-vision |
| Inputs | Text, video, and images |
| Output | Text |
| Supported generation methods | `generateContent`, `generate_content` |
| Input token limit[**] | 12,288 |
| Output token limit[**] | 4,096 |
| Maximum image size | No limit |
| Maximum number of images per prompt | 16 |
| Maximum video length | 2 minutes |
| Maximum number of videos per prompt | 1 |
| Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
| Rate limit[*] | 60 requests per minute |
| Latest version | gemini-1.0-pro-vision-latest |
| Latest stable version | gemini-1.0-pro-vision |
| Latest update | December 2023 |
Text Embedding and Embedding
Text Embedding
You can use the Text Embedding model to generate text embeddings for input text. For more information on the Text Embedding model, visit the Generative AI on Vertex AI documentation about text embeddings.
The Text Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Text Embedding also offers elastic embedding sizes below 768 dimensions. You can use elastic embeddings to generate smaller output dimensions and potentially save computing and storage costs, with minor performance loss.
Model details

| Property | Description |
|---|---|
| Model code | models/text-embedding-004 (text-embedding-preview-0409 in Vertex AI) |
| Input | Text |
| Output | Text embeddings |
| Input token limit | 2,048 |
| Output dimension size | 768 |
| Supported generation methods | `embedContent`, `embed_content` |
| Model safety | No adjustable safety settings. |
| Rate limit[*] | 1,500 requests per minute |
| Latest update | April 2024 |
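As a sketch of generating elastic embeddings and measuring relatedness with them (assuming the `google-generativeai` Python SDK and an API key in `GOOGLE_API_KEY`; the `cosine` helper is our own illustration, not part of the SDK):

```python
import math
import os

def cosine(u, v):
    """Cosine similarity between two equal-length vectors: 1.0 = identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embed(texts, dims=256):
    """Embed a batch of strings, requesting a smaller (elastic) dimension than 768."""
    import google.generativeai as genai  # pip install google-generativeai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=texts,
        output_dimensionality=dims,  # elastic size below the default 768
    )
    return result["embedding"]

# Example (requires network access and an API key):
# a, b = embed(["cat", "kitten"])
# print(cosine(a, b))
```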
Embedding
You can use the Embedding model to generate text embeddings for input text.
The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens.
Embedding model details

| Property | Description |
|---|---|
| Model code | models/embedding-001 |
| Input | Text |
| Output | Text embeddings |
| Input token limit | 2,048 |
| Output dimension size | 768 |
| Supported generation methods | `embedContent`, `embed_content` |
| Model safety | No adjustable safety settings. |
| Rate limit[*] | 1,500 requests per minute |
| Latest update | December 2023 |
AQA
You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or set of passages. The AQA model returns answers that are grounded in the provided sources, along with an estimate of the probability that the question can be answered from those sources.
Model details

| Property | Description |
|---|---|
| Model code | models/aqa |
| Input | Text |
| Output | Text |
| Supported generation methods | `generateAnswer` |
| Supported languages | English |
| Input token limit[**] | 7,168 |
| Output token limit[**] | 1,024 |
| Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
| Rate limit[*] | 60 requests per minute |
| Latest update | December 2023 |
See the examples to explore the capabilities of these model variants.

[*] RPM: Requests per minute
TPM: Tokens per minute
RPD: Requests per day
TPD: Tokens per day
Due to capacity limitations, specified maximum rate limits are not guaranteed.

[**] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words.
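The rule of thumb above (roughly 4 characters per token) can be turned into a quick pre-flight estimate before sending a prompt. This is our own approximation, not an API call; for exact counts the API's token-counting endpoint should be used:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the documented ~4 characters per token rule."""
    return max(1, len(text) // 4)

# A 400-character prompt is roughly 100 tokens, i.e. about 60-80 English words.
print(estimate_tokens("a" * 400))  # → 100
```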
Model version name patterns
Gemini models are available in either preview or stable versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.
- Latest: Points to the cutting-edge version of the model for a specified generation and variation. The underlying model is updated regularly and might be a preview version. Only exploratory testing apps and prototypes should use this alias. To specify the latest version, use the following pattern: `<model>-<generation>-<variation>-latest`. For example, `gemini-1.0-pro-latest`.
- Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: `<model>-<generation>-<variation>`. For example, `gemini-1.0-pro`.
- Stable: Points to a specific stable model. Stable models don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: `<model>-<generation>-<variation>-<version>`. For example, `gemini-1.0-pro-001`.
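The three naming patterns above can be captured in a small helper; this builder is our own illustration of the documented patterns, not part of any SDK:

```python
def model_name(model, generation, variation, version=None, latest=False):
    """Build a model name string following the documented patterns:
    <model>-<generation>-<variation>[-latest | -<version>]."""
    base = f"{model}-{generation}-{variation}"
    if latest:
        return base + "-latest"   # cutting-edge alias; prototypes only
    if version is not None:
        return base + "-" + version  # pinned stable version; production use
    return base                   # latest stable alias

print(model_name("gemini", "1.0", "pro", latest=True))   # → gemini-1.0-pro-latest
print(model_name("gemini", "1.0", "pro"))                # → gemini-1.0-pro
print(model_name("gemini", "1.0", "pro", version="001"))  # → gemini-1.0-pro-001
```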
Available languages
Gemini models are trained to work with the following languages:
- Arabic (`ar`)
- Bengali (`bn`)
- Bulgarian (`bg`)
- Chinese simplified and traditional (`zh`)
- Croatian (`hr`)
- Czech (`cs`)
- Danish (`da`)
- Dutch (`nl`)
- English (`en`)
- Estonian (`et`)
- Finnish (`fi`)
- French (`fr`)
- German (`de`)
- Greek (`el`)
- Hebrew (`iw`)
- Hindi (`hi`)
- Hungarian (`hu`)
- Indonesian (`id`)
- Italian (`it`)
- Japanese (`ja`)
- Korean (`ko`)
- Latvian (`lv`)
- Lithuanian (`lt`)
- Norwegian (`no`)
- Polish (`pl`)
- Portuguese (`pt`)
- Romanian (`ro`)
- Russian (`ru`)
- Serbian (`sr`)
- Slovak (`sk`)
- Slovenian (`sl`)
- Spanish (`es`)
- Swahili (`sw`)
- Swedish (`sv`)
- Thai (`th`)
- Turkish (`tr`)
- Ukrainian (`uk`)
- Vietnamese (`vi`)