Quotas and pricing

This page describes the quotas and pricing structure for the Gemini API from Vertex AI.

Quotas by region and model

The requests per minute (RPM) quota applies to a base model and all versions, identifiers, and tuned versions of that model. Here are some examples (a short code sketch follows this list):

  • A request to gemini-1.0-pro and a request to gemini-1.0-pro-001 are counted as two requests toward the RPM quota of the base model, gemini-1.0-pro.

  • A request to gemini-1.0-pro-001 and a request to a tuned model that's based on gemini-1.0-pro-001 are counted as two requests toward the RPM quota of the base model, gemini-1.0-pro.
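
To make this counting concrete, here is a minimal sketch using the Vertex AI for Firebase Web SDK. The module name (firebase/vertexai-preview), the Firebase config values, and the prompts are assumptions and placeholders rather than part of this page; the point is only that both calls draw on the RPM quota of the same base model.

    // Minimal sketch, assuming the Vertex AI for Firebase Web SDK preview module.
    // All config values are placeholders.
    import { initializeApp } from "firebase/app";
    import { getVertexAI, getGenerativeModel } from "firebase/vertexai-preview";

    const firebaseApp = initializeApp({
      apiKey: "YOUR_API_KEY",       // placeholder
      projectId: "your-project-id", // placeholder
      appId: "YOUR_APP_ID",         // placeholder
    });

    const vertexAI = getVertexAI(firebaseApp);

    // Two identifiers that resolve to the same base model, gemini-1.0-pro.
    const baseModel = getGenerativeModel(vertexAI, { model: "gemini-1.0-pro" });
    const pinnedModel = getGenerativeModel(vertexAI, { model: "gemini-1.0-pro-001" });

    async function run(): Promise<void> {
      // Each generateContent call is one request; together they count as two
      // requests toward the RPM quota of the base model, gemini-1.0-pro.
      await baseModel.generateContent("Say hello.");
      await pinnedModel.generateContent("Say hello.");
    }

    run();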

These quotas apply at the project level and are shared across all applications and IP addresses that use that Firebase project. They also apply to any call to the Gemini API, whether it's made using the Vertex AI for Firebase SDKs, the server SDKs (including via the Gemini Firebase Extensions), REST calls, or Vertex AI Studio.
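
The same per-project quota is drawn on when you call the endpoint directly over REST. The sketch below is illustrative only: it assumes a runtime where fetch is available, an OAuth access token obtained separately (for example, with gcloud auth print-access-token), and placeholder values for the project ID, region, and environment variable name.

    // Minimal sketch of a direct REST call to the Vertex AI Gemini endpoint.
    // PROJECT_ID, REGION, and VERTEX_ACCESS_TOKEN are placeholders.
    const PROJECT_ID = "your-project-id";
    const REGION = "us-central1";
    const ACCESS_TOKEN = process.env.VERTEX_ACCESS_TOKEN ?? "";

    async function callGemini(prompt: string): Promise<unknown> {
      const url =
        `https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}` +
        `/locations/${REGION}/publishers/google/models/gemini-1.0-pro:generateContent`;

      const response = await fetch(url, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${ACCESS_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          contents: [{ role: "user", parts: [{ text: prompt }] }],
        }),
      });

      // A request made this way draws on the same per-project RPM quota as a
      // request made through the SDKs or Vertex AI Studio; exceeding the quota
      // typically returns HTTP 429 (RESOURCE_EXHAUSTED).
      if (response.status === 429) {
        throw new Error("RPM quota exceeded for this minute; retry with backoff.");
      }
      return response.json();
    }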

Find the quotas for each model and region in the Google Cloud documentation.

Request a quota increase

To increase any of your quotas for Vertex AI, use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.

Pricing

Using the Gemini API from Vertex AI requires that your Firebase project use the Blaze pay-as-you-go pricing plan.

Find the pricing for each model in the Google Cloud documentation.