# AI Agents - Tokens per Minute (TPM) Limits in OpenAI and Google Gemini

With the exponential growth of language model–based applications, understanding usage limits such as Tokens per Minute (TPM) has become essential to ensure performance, scalability, and stability in production environments.

{% embed url="<https://youtu.be/12tgkndCduI>" %}

## ✅ What Tokens per Minute (TPM) are

Tokens are the smallest units of text processed by a language model. A token can be a word, part of a word, or even a punctuation symbol. The Tokens per Minute (TPM) limit is the maximum number of tokens a model can process per minute, counting both input (prompt) and output (response).
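Because input and output tokens both count toward the limit, a rough per-minute estimate can be sketched as below. This is a minimal illustration: the function names and the workload figures are hypothetical, and real token counts depend on each model's tokenizer.

```python
def tokens_used_per_minute(requests_per_minute, prompt_tokens, response_tokens):
    """Estimate tokens consumed in one minute; prompt and response both count."""
    return requests_per_minute * (prompt_tokens + response_tokens)


def fits_within_tpm(requests_per_minute, prompt_tokens, response_tokens, tpm_limit):
    """Return True if the estimated usage stays at or below the TPM limit."""
    used = tokens_used_per_minute(requests_per_minute, prompt_tokens, response_tokens)
    return used <= tpm_limit


# Hypothetical workload: 40 requests per minute, each with a
# 600-token prompt and a 150-token response.
usage = tokens_used_per_minute(40, 600, 150)
print(usage)  # 30000
print(fits_within_tpm(40, 600, 150, tpm_limit=30_000))  # True
```

The `tpm_limit` value here is a placeholder; use the limit shown for your own account and model.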

## TPM limits at OpenAI (GPT-5, GPT-5.1, GPT-4.1, GPT-4o and Mini/Nano)

OpenAI offers a variety of models, each with different capabilities and usage limits. These limits change over time, so check the website for the current values:

| Model        | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| ------------ | ----------------------- | ------------------------- |
| GPT-5        | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1      | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1 Mini | (Check the website) TPM | (Check the website) RPM   |
| GPT-5.1 Nano | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1      | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1 Mini | (Check the website) TPM | (Check the website) RPM   |
| GPT-4.1 Nano | (Check the website) TPM | (Check the website) RPM   |
| GPT-4o       | (Check the website) TPM | (Check the website) RPM   |
| GPT-4o Mini  | (Check the website) TPM | (Check the website) RPM   |
| o4 Mini      | (Check the website) TPM | (Check the website) RPM   |
| o3 Mini      | (Check the website) TPM | (Check the website) RPM   |

{% hint style="info" %}
💡 Note: values may vary depending on the plan.\
You can view your current limits here:\
<https://platform.openai.com/docs/models>
{% endhint %}

## TPM limits at Gemini (Google AI) – Flash and Flash-Lite

Google continues to evolve its Gemini model line. The 2.0 Flash series models are optimized for speed and cost-effectiveness.

| Model                  | Tokens per Minute (TPM) | Requests per Minute (RPM) |
| ---------------------- | ----------------------- | ------------------------- |
| Gemini 2.0 Flash       | (Check the website) TPM | (Check the website) RPM   |
| Gemini 2.0 Flash-Lite  | (Check the website) TPM | (Check the website) RPM   |

{% hint style="info" %}
💡 Note: values may vary depending on the plan.\
You can view your current limits here:\
<https://ai.google.dev/gemini-api/docs/rate-limits?hl=pt-br>
{% endhint %}

## ⚠️ What if I exceed the limits

Both OpenAI and Google will return rate limit errors (e.g., HTTP 429). To work around this:

* You can raise your tier, following the specific rules for each model.
* You can limit the number of characters the chatbot uses, according to your needs.
* You can request a plan upgrade directly on the SprintHub platform.
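For custom integrations that call the providers directly, a common complement to the options above is to retry a rate-limited request after a growing delay. The sketch below is illustrative only: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and `send_request` is any function you supply that performs the call.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(send_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff on rate limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The jitter spreads retries out so that several clients hitting the limit at once do not all retry at the same instant. The `sleep` parameter exists only so the wait can be replaced in tests.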

## Best Practices for Token Optimization

* Simplify prompts – Avoid repetitions and unnecessary structures in your AI agent's rules.
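As one way to spot repetition, the sketch below removes duplicate lines from a block of agent rules before they are saved. The helper name `remove_repeated_rules` is hypothetical, and it assumes each rule is written on its own line; it only catches exact repeats (ignoring case and extra spaces), not rules that say the same thing in different words.

```python
def remove_repeated_rules(rules_text):
    """Drop blank lines and repeated rule lines, keeping the first occurrence in order."""
    seen = set()
    kept = []
    for line in rules_text.splitlines():
        # Normalize case and whitespace so near-identical lines compare equal
        key = " ".join(line.lower().split())
        if key and key not in seen:
            seen.add(key)
            kept.append(line.strip())
    return "\n".join(kept)


rules = "Be polite.\nAnswer in English.\n\nbe  polite.\n"
print(remove_repeated_rules(rules))
# Be polite.
# Answer in English.
```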

## Conclusion

With the increasing sophistication of language models, efficient management of limits like TPM becomes a technical differentiator. Understanding these limits not only helps avoid failures but also enables scaling applications more intelligently, sustainably, and economically.
