Google is addressing user frustration with its Gemini AI platform by rolling out a series of updates designed to make usage limits more predictable and fair. Users of the Pro plan, and especially heavy users, had reported that their quotas were being exhausted far faster than expected, sometimes after only a handful of prompts. Now, the company is implementing several targeted fixes that should alleviate these issues.
Background: The Quota Controversy
Earlier this month, Android Authority reported that Google had quietly tightened the usage limits on its Gemini Pro plan. The changes led to widespread complaints as users found their allowances disappearing rapidly, even with routine tasks. Google initially responded by increasing quotas for users of Antigravity, its agentic coding tool, but the underlying problems persisted. Users noted that certain activities, such as generating short video clips or submitting complex multi-step prompts, seemed to consume disproportionately large chunks of their monthly quotas.
Now, Josh Woodward, Vice President at Google, has directly addressed the situation in a post on X. He acknowledged that users were hitting limits sooner than they should, and outlined a series of fixes being rolled out. These changes aim to make usage more predictable, reduce confusion, and keep quota consumption consistent across different types of tasks.
Fixing the Omni Video Generation Bug
One of the most significant fixes involves a bug tied to Omni video generation. Users reported that just one or two video prompts could consume a large portion of their total quota. For example, someone experimenting with short clips or testing different styles might suddenly see their allowance drop dramatically after only a couple of attempts. Google has now fixed this issue. In addition, the company is immediately increasing allowances for heavy users. Ultra subscribers, for instance, will receive double the number of Omni video generations starting now.
This change is particularly important for creators and marketers who rely on Gemini's video capabilities for rapid prototyping or content creation. Previously, the unpredictable cost of video generation made it difficult to plan usage. Now, with the bug resolved and allowances increased, these users can operate with greater confidence.
Capping Costs for Complex Pro Prompts
Another major source of complaints was the cost of complex 3.1 Pro prompts: long, detailed instructions that often involve large file uploads or multi-step reasoning. Users found that these prompts also drained quota disproportionately fast. Google is now addressing this by introducing per-prompt caps. Instead of one very heavy request draining a substantial chunk of your usage, the system will limit how much a single prompt can consume. The idea is to prevent extreme outliers where one task wipes out too much of your monthly allowance.
This change will be especially beneficial for researchers, developers, and data analysts who frequently submit complex queries. By capping the cost of individual prompts, Google ensures that no single task can dominate a user's quota, allowing for more balanced and predictable usage across a work session.
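A per-prompt cap of this kind can be pictured as a simple ceiling on what any one request deducts. The sketch below is illustrative only: Google has not published the cap value or its cost units, so `PER_PROMPT_CAP`, `charged_cost`, and `remaining_quota` are hypothetical names and numbers.

```python
# Hypothetical value: the real cap and its units have not been published.
PER_PROMPT_CAP = 50  # assumed maximum quota units one prompt may consume


def charged_cost(raw_cost: int, cap: int = PER_PROMPT_CAP) -> int:
    """Return the quota units actually deducted for a single prompt."""
    if raw_cost < 0:
        raise ValueError("raw_cost must be non-negative")
    # The cap is a ceiling: cheap prompts are unaffected, heavy ones are clipped.
    return min(raw_cost, cap)


def remaining_quota(quota: int, raw_costs: list[int], cap: int = PER_PROMPT_CAP) -> int:
    """Deduct a sequence of prompts from a quota, never dropping below zero."""
    for cost in raw_costs:
        quota = max(0, quota - charged_cost(cost, cap))
    return quota
```

Under these assumed numbers, a prompt that would otherwise cost 400 units is charged 50, so a single outlier can no longer consume most of an allowance.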
Failed Requests No Longer Counted
A particularly frustrating issue was that failed requests — those that returned errors due to system glitches or network issues — were still being deducted from users' quotas. Woodward noted that about 1 in 10 requests can fail due to system errors. This meant that even when Gemini was not performing as intended, users still lost valuable allowance. Google is now correcting this: if a request fails, it will not be charged against your usage. This fix helps ensure that users are only paying for successful interactions, which feels much fairer and reduces anxiety about trying out different prompts.
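The accounting rule described here amounts to skipping failed requests when usage is totalled. The following is a minimal sketch under that reading; `RequestResult` and `total_charged` are hypothetical names, and the costs are made-up units rather than anything Google has documented.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    cost: int        # quota units the request would consume (assumed unit)
    succeeded: bool  # False when the request ended in a system or network error


def total_charged(results: list[RequestResult]) -> int:
    """Sum the cost of successful requests only; failures are not billed."""
    return sum(r.cost for r in results if r.succeeded)
```

With ten requests of 5 units each and one failure, this charges 45 units instead of 50, matching the roughly 1-in-10 failure rate Woodward mentioned.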
Flash-Lite Prompts Become Free
In a move that could significantly change how users interact with Gemini, Google has announced that Flash-Lite prompts will no longer count against quota at all. This effectively turns Flash-Lite into a free layer for lighter tasks. It also subtly encourages users to rely on lighter models when they do not need full reasoning power, which lets them reserve their premium quota for more demanding work and makes the limits on higher tiers last longer. Casual users who turn to Gemini for quick queries, summaries, or simple creative exercises are likely to welcome the change.
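To see why a zero-cost tier makes the paid quota last longer, consider a toy cost table. Only the zero for Flash-Lite reflects the announcement; the model keys, the other costs, and the `quota_after` helper are placeholders invented for this sketch.

```python
# Hypothetical per-prompt costs in arbitrary quota units. Only the zero for
# Flash-Lite comes from the announcement; the other values are placeholders.
MODEL_COST = {"flash-lite": 0, "flash": 1, "pro": 5}


def quota_after(quota: int, prompts: list[str]) -> int:
    """Return the quota left after sending one prompt per listed model."""
    for model in prompts:
        quota = max(0, quota - MODEL_COST[model])
    return quota
```

In this toy model, any number of Flash-Lite prompts leaves the balance untouched, so moving quick queries to the lighter model preserves the allowance for Pro-level tasks.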
Improved Visibility for Deep Research Usage
Another area Google is improving is transparency around Deep Research usage. These are compute-heavy tasks where Gemini processes large inputs or runs multi-step analysis. Many users currently have little visibility into why their quotas drop faster on some days than others. The company is now working on more detailed breakdowns and notifications. Users will be able to see exactly which types of tasks are expensive and which are not. This will allow them to better plan their usage and avoid surprises.
Deep Research features are particularly popular among academics, journalists, and business analysts who need to synthesize large datasets or perform complex reasoning. The new breakdowns should help them prioritize their quotas for the most critical research tasks.
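Google has not described what the breakdowns will look like. As a rough illustration of the idea, the sketch below groups a usage log by task type and ranks the totals; the event format, task names, and `usage_breakdown` function are all assumptions made for this example.

```python
from collections import defaultdict


def usage_breakdown(events: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Total quota units per task type, most expensive first.

    Each event is a (task_type, cost) pair; both fields are hypothetical.
    """
    totals: dict[str, int] = defaultdict(int)
    for task_type, cost in events:
        totals[task_type] += cost
    # Sort by total cost, descending, so the biggest consumers come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A view like this would let a user see at a glance that, for example, two Deep Research runs outweighed dozens of short chats on a given day.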
Persistent Model Selection
Finally, there is a useful improvement in how model selection works. Once you choose a specific model inside Gemini, the app will remember it across sessions. So if you prefer a particular writing or research setup, you won't need to select it every time you open the app. The only exception is when you hit a usage cap, in which case the system may automatically switch to a lighter model to keep things running. This small quality-of-life improvement will save users time and reduce friction, especially for those who frequently toggle between different models for different tasks.
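The behaviour described above, a remembered choice plus a fallback when a cap is hit, could be modelled roughly as follows. This is a sketch, not Gemini's implementation: the model names, the `ModelPreference` class, and the choice to leave the saved preference untouched during a fallback are all assumptions.

```python
import json

FALLBACK_MODEL = "flash-lite"  # hypothetical lighter model used after a cap is hit


class ModelPreference:
    """Remembers the user's chosen model between sessions."""

    def __init__(self, selected: str = "pro") -> None:
        self.selected = selected

    def choose(self, model: str) -> None:
        self.selected = model

    def active_model(self, cap_reached: bool) -> str:
        # Assumption: the fallback is temporary and does not overwrite
        # the stored preference.
        return FALLBACK_MODEL if cap_reached else self.selected

    def save(self) -> str:
        """Serialize the preference, standing in for storage between sessions."""
        return json.dumps({"selected": self.selected})

    @classmethod
    def load(cls, blob: str) -> "ModelPreference":
        return cls(selected=json.loads(blob)["selected"])
```

Saving the preference in one session and loading it in the next returns the same model, while `active_model(cap_reached=True)` returns the lighter fallback without changing what was saved.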
These changes represent a clear effort by Google to listen to user feedback and make the Gemini platform more user-friendly. While usage limits are still in place, the company is working to ensure they feel logical and fair. By fixing bugs, capping how much quota a single complex prompt can consume, no longer charging for failed requests, and making Flash-Lite usage free, Google is addressing the core complaints that have been driving user frustration. The improved transparency for Deep Research and persistent model selection further enhance the experience. Whether these changes will fully satisfy users remains to be seen, but the direction is toward a more transparent, predictable, and equitable AI service.
Source: Android Authority News