A.I. Agents - Token Limits per Minute (TPM) on OpenAI and Google Gemini

In this article, we detail the TPM limits of the latest OpenAI models (GPT-5, GPT-5.1, GPT-4.1, GPT-4o, and others) and of Google's Gemini 2.0 line.

With the exponential growth of language model–based applications, understanding usage limits such as Tokens per Minute (TPM) has become essential to ensure performance, scalability, and stability in production environments.

✅ What are Tokens per Minute (TPM)?

Tokens are the smallest units of text processed by a language model. A token can be a word, part of a word, or even a punctuation symbol. The Tokens per Minute (TPM) limit represents the maximum number of tokens a model can process per minute — including input (prompt) and output (response).
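
To see how this counting works in practice, here is a minimal sketch using OpenAI's open-source tiktoken library. Treat the "o200k_base" encoding as an assumption: it is the one used by the GPT-4o family, and newer models may differ, so check tiktoken's model map for your exact model.

```python
# Minimal token-counting sketch using OpenAI's tiktoken library
# (pip install tiktoken). "o200k_base" is the encoding used by the
# GPT-4o family; for other models, check tiktoken's model map.
import tiktoken

def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

prompt = "Summarize the customer's last five messages."
print(count_tokens(prompt))  # counts input tokens only; the model's
                             # response also draws from the TPM budget
```

Since the prompt and the generated response share the same TPM budget, a long system prompt directly reduces how many requests fit in one minute.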

TPM limits at OpenAI (GPT-5, GPT-5.1, GPT-4.1, GPT-4o, and their Mini/Nano variants)

OpenAI offers a variety of models, each with different capabilities and usage limits. Typical TPM limits (subject to change):

| Model | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| --- | --- | --- |
| GPT-5 | (Check the website) | (Check the website) |
| GPT-5.1 | (Check the website) | (Check the website) |
| GPT-5.1 Mini | (Check the website) | (Check the website) |
| GPT-5.1 Nano | (Check the website) | (Check the website) |
| GPT-4.1 | (Check the website) | (Check the website) |
| GPT-4.1 Mini | (Check the website) | (Check the website) |
| GPT-4.1 Nano | (Check the website) | (Check the website) |
| GPT-4o | (Check the website) | (Check the website) |
| GPT-4o Mini | (Check the website) | (Check the website) |
| o4 Mini | (Check the website) | (Check the website) |
| o3 Mini | (Check the website) | (Check the website) |

💡 Note: values may vary depending on your plan. You can view your current limits here: https://platform.openai.com/docs/models

TPM limits at Gemini (Google AI) – Flash and Flash-Lite

Google has been evolving its Gemini model line, with the 2.0 Flash series standing out as models optimized for speed and cost-effectiveness.

| Model | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| --- | --- | --- |
| Gemini 2.0 Flash | (Check the website) | (Check the website) |
| Gemini 2.0 Flash-Lite | (Check the website) | (Check the website) |

💡 Note: values may vary depending on your plan. You can view your current limits here: https://ai.google.dev/gemini-api/docs/rate-limits?hl=pt-br

⚠️ What if I exceed the limits?

Both OpenAI and Google return rate-limit errors (e.g., HTTP 429) when you exceed these quotas. To work around this (see the retry sketch after the list):

  • You can raise your tier, following the specific rules for each model.

  • You can limit the number of characters used in the chatbot, according to your needs.

  • You can request a plan upgrade directly on the SprintHub platform.
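
As a practical illustration of handling HTTP 429 responses, here is a minimal retry-with-exponential-backoff sketch. The `RateLimitError` class and the `call_model` argument are hypothetical stand-ins for whatever your SDK provides (the official openai Python SDK, for example, raises `openai.RateLimitError`).

```python
# Minimal sketch: retry a model call with exponential backoff on HTTP 429.
# RateLimitError and call_model are hypothetical stand-ins for your SDK's
# rate-limit exception and API call (e.g. openai.RateLimitError).
import random
import time

class RateLimitError(Exception):
    """Placeholder for the 429 error raised by your client library."""

def with_backoff(call_model, max_retries: int = 5):
    delay = 1.0  # seconds before the first retry
    for attempt in range(max_retries):
        try:
            return call_model()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(delay + random.uniform(0, 0.5))  # jitter spreads retries
            delay *= 2  # 1s, 2s, 4s, ...
```

Backing off with jitter keeps many concurrent chatbot sessions from retrying in lockstep and hitting the rate limit again at the same instant.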

Best Practices for Token Optimization

  • Simplify prompts – Avoid repetition and unnecessary structure in your AI agent's rules (see the history-trimming sketch below).
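
One way to apply this in a chatbot is to cap the conversation history at a fixed token budget before each request. The sketch below reuses tiktoken-based counting; the 3,000-token budget and the o200k_base encoding are illustrative assumptions, so size them to your model's actual TPM limit and context window.

```python
# Sketch: trim chat history to a token budget before calling the model.
# The encoding and the 3000-token budget are illustrative assumptions.
import tiktoken

_ENC = tiktoken.get_encoding("o200k_base")

def count_tokens(text: str) -> int:
    return len(_ENC.encode(text))

def trim_history(messages: list[str], budget: int = 3000) -> list[str]:
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from the most recent message back
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # older messages no longer fit the budget
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```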

Conclusion

With the increasing sophistication of language models, efficient management of limits like TPM becomes a technical differentiator. Understanding these limits not only helps avoid failures but also enables scaling applications more intelligently, sustainably, and economically.
