# A.I. Agents - Tokens per Minute (TPM) Limits in OpenAI and Google Gemini

With the rapid growth of language model–based applications, understanding usage limits such as Tokens per Minute (TPM) has become essential to ensure performance, scalability, and stability in production environments.

{% embed url="<https://youtu.be/12tgkndCduI>" %}

## ✅ What are Tokens per Minute (TPM)?

Tokens are the smallest units of text processed by a language model. A token can be a word, part of a word, or even a punctuation symbol. The Tokens per Minute (TPM) limit is the maximum number of tokens that can be processed per minute, counting both input (prompt) and output (response). The provider applies this limit to your account or project, not to the model as a whole.
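Because input and output tokens both count toward TPM, a quick estimate shows how many requests of a given size fit into one minute. The sketch below is only illustrative: the 4-characters-per-token ratio is a rough rule of thumb for English text (real tokenizers vary by model and language), the function names are hypothetical, and the `30_000` limit is a made-up number, not the limit of any real model.

```python
import math

# Assumption: ~4 characters per token is a rough heuristic for English text only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def requests_per_minute_within_tpm(
    tpm_limit: int, prompt_tokens: int, max_output_tokens: int
) -> int:
    """Return how many requests of this size fit into one minute of TPM budget."""
    # Input (prompt) and output (response) tokens both count toward the limit.
    tokens_per_request = prompt_tokens + max_output_tokens
    if tokens_per_request <= 0:
        raise ValueError("a request must use at least one token")
    return tpm_limit // tokens_per_request


prompt = "Summarize the customer's last message in one sentence."
prompt_tokens = estimate_tokens(prompt)
# 30_000 is a hypothetical limit used only for this example.
print(requests_per_minute_within_tpm(30_000, prompt_tokens, max_output_tokens=500))
```

For exact counts, use the tokenizer or the usage figures reported by the provider instead of a character-based estimate.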

## TPM limits at OpenAI (GPT-5, GPT-5.1, GPT-4.1, GPT-4o and Mini/Nano)

OpenAI offers a variety of models, each with different capabilities and usage limits. Limits are subject to change, so check the OpenAI website for the current values of each model:

| Model        | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| ------------ | ----------------------- | ------------------------- |
| GPT-5        | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1      | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1 Mini | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1 Nano | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1      | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1 Mini | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1 Nano | (Check the website) TPM | (Check the website) RPM   |
| GPT-4o       | (Check the website) TPM | (Check the website) RPM   |
| GPT-4o Mini  | (Check the website) TPM | (Check the website) RPM   |
| o4 Mini      | (Check the website) TPM | (Check the website) RPM   |
| o3 Mini      | (Check the website) TPM | (Check the website) RPM   |

{% hint style="info" %}
💡 Note: values may vary depending on the plan.\
You can view your current limits here:\
<https://platform.openai.com/docs/models>
{% endhint %}
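Besides the website, OpenAI API responses include rate-limit headers such as `x-ratelimit-limit-tokens` and `x-ratelimit-remaining-tokens` (as documented at the time of writing; verify them against the current OpenAI documentation). The sketch below assumes you already have the response headers as a dictionary. The helper name and the sample values are hypothetical.

```python
from typing import Mapping, Optional


def remaining_token_budget(headers: Mapping[str, str]) -> Optional[float]:
    """Return the fraction of the TPM budget still available, or None if unknown."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit-tokens"])
        remaining = int(lowered["x-ratelimit-remaining-tokens"])
    except (KeyError, ValueError):
        return None  # headers missing or not numeric
    if limit <= 0:
        return None
    return remaining / limit


# Made-up header values, for illustration only.
sample_headers = {
    "x-ratelimit-limit-tokens": "30000",
    "x-ratelimit-remaining-tokens": "7500",
}
print(remaining_token_budget(sample_headers))  # 0.25
```

A caller could slow down or queue requests when the returned fraction gets low, instead of waiting for an HTTP 429.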

## TPM limits at Gemini (Google AI) – Flash and Flash-Lite

Google continues to evolve its Gemini model line. The 2.0 Flash series is optimized for speed and cost-effectiveness.

| Model                  | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| ---------------------- | ----------------------- | ------------------------- |
| Gemini 2.0 Flash       | (Check the website) TPM | (Check the website) RPM   |
| Gemini 2.0 Flash-Lite  | (Check the website) TPM | (Check the website) RPM   |

{% hint style="info" %}
💡 Note: values may vary depending on the plan.\
You can view your current limits here:\
<https://ai.google.dev/gemini-api/docs/rate-limits?hl=pt-br>
{% endhint %}

## ⚠️ What if I exceed the limits?

Both OpenAI and Google return rate limit errors (e.g., HTTP 429) when a limit is exceeded. To work around this:

* You can raise your tier, following the specific rules for each model.
* You can limit the number of characters used in the chatbot according to your needs.
* You can request a plan upgrade directly on the SprintHub platform.
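If you call the provider APIs from your own integration, a common complement to the options above is to retry HTTP 429 responses with exponential backoff. The following is a minimal sketch, not SprintHub's or any provider's implementation: `send` is a hypothetical callable that performs one request and returns a `(status_code, body)` tuple, and the retry count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying on HTTP 429 with exponential backoff plus jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # retries exhausted, give the 429 back to the caller
        # Wait 1x, 2x, 4x, ... the base delay, plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    return status, body


# Demo with a fake sender: two 429 responses, then success. No real API is called,
# and the no-op `sleep` keeps the demo from actually waiting.
responses = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda seconds: None))
```

The jitter spreads retries out so that several clients hitting the limit at the same moment do not all retry in lockstep.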

## Best Practices for Token Optimization

* Simplify prompts – Avoid repetitions and unnecessary structures in your AI agent's rules.
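As a small illustration of removing repetition, the sketch below drops blank lines and repeated lines from a block of agent rules. It is a hypothetical helper, not a SprintHub feature, and it only catches lines that are identical after whitespace and case normalization. Rules that repeat the same idea in different words still need manual review.

```python
def simplify_rules(rules: str) -> str:
    """Collapse extra whitespace and drop repeated lines, keeping first occurrences."""
    seen = set()
    kept = []
    for line in rules.splitlines():
        normalized = " ".join(line.split())  # collapse runs of spaces and tabs
        key = normalized.casefold()  # compare lines case-insensitively
        if not normalized or key in seen:
            continue  # skip blank lines and duplicates
        seen.add(key)
        kept.append(normalized)
    return "\n".join(kept)


rules = """Always greet the customer.
Answer in   English.

always greet the customer.
"""
print(simplify_rules(rules))
```

Shorter rules reduce the input tokens sent with every request, which leaves more of the TPM budget for actual conversations.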

## Conclusion

With the increasing sophistication of language models, efficient management of limits like TPM becomes a technical differentiator. Understanding these limits not only helps avoid failures but also enables scaling applications more intelligently, sustainably, and economically.


