Troubleshooting · Provider rate limit
Fix a Claude rate limit during summarization.
By
ReduzUpdated May 11, 2026 Fix guide
Anthropic's Claude rate limits are enforced across four dimensions: requests per minute (RPM), input tokens per minute (ITPM), output tokens per minute (OTPM), and a 5-hour rolling window on some workloads. The limits depend on your Anthropic tier (1-4) and the specific Claude model — Sonnet and Opus have lower throughput ceilings than Haiku. The fastest fix is to verify the exact limit you hit, wait for the window to reset (60 seconds for per-minute, longer for tier-level), then either reduce source size, request a tier upgrade, or switch providers in Reduz while Claude's quota catches up.
Check these first
- You've hit the requests-per-minute (RPM) limit — common after rapid retries on a low tier.
- A long PDF or transcript exceeded the input-tokens-per-minute (ITPM) cap — Sonnet has notably lower ITPM than Haiku.
- Your generated output is hitting OTPM during a long summary or full-notes generation.
- Your Anthropic tier is too low for your workload — Tier 1 is the most restrictive.
- A prompt cache miss made the request consume more billable tokens than expected (caches reset after 5 minutes of inactivity).
Fix it in this order
- 1
Read the exact limit name in the error response
Anthropic returns specific error messages: "rate_limit_error" with body text identifying RPM, ITPM, OTPM, or organization-level. The fix is different for each — read the error before retrying blindly.
- 2
Wait at least 60 seconds for per-minute limits
RPM, ITPM, and OTPM all reset on a rolling 60-second window. Wait that long before retrying. Aggressive retries during the window keep the limit pinned.
- 3
Check your usage at console.anthropic.com
Open console.anthropic.com/settings/limits. Verify your tier, current usage, and the per-model RPM/ITPM/OTPM caps. If you're consistently at the ceiling, request a tier upgrade or scale workload.
- 4
Switch to a Haiku-tier Claude model
Claude Haiku 4.5 has substantially higher throughput than Sonnet 4.6 or Opus 4.7. For high-volume daily summaries, Haiku is often the right default. In Reduz settings under Anthropic, switch the active model and retry.
- 5
Reduce source size for big PDFs and transcripts
Long sources push ITPM to the ceiling in a single request. Use selected-text mode for one section, summarize a PDF page range instead of the whole document, or break a 90-minute video summary into shorter passes.
- 6
Request a tier upgrade for sustained workload
Anthropic auto-upgrades tiers based on consistent usage and spend. For production-grade workloads, submit a manual tier-increase request through the console. New tiers unlock substantially higher RPM/ITPM/OTPM ceilings.
- 7
Switch to another provider while you wait
If urgent work is blocked, switch Reduz to OpenAI, Google Gemini, DeepSeek, or xAI Grok. The same source runs through a different provider with different limits. Switch back to Claude once your window resets.
Diagnosis
Anthropic-controlled
Rate limits are set by Anthropic for your tier, your account, and the specific model. Reduz cannot raise them — it can only reduce request size or route to a different provider.
Model-specific
Claude Sonnet 4.6 and Opus 4.7 have lower throughput than Haiku 4.5. A workload that hits limits on Sonnet may run fine on Haiku.
Token-driven for long sources
A 30-page PDF can consume 30,000+ input tokens. On Tier 1 ITPM (40k for Sonnet), one large PDF can saturate the per-minute window immediately.
Tier-dependent
Tier 1 (small spend) has restrictive limits. Tier 3-4 unlocks RPM/ITPM/OTPM ceilings that fit production summarization workloads. Auto-upgrade happens with consistent usage; manual requests are also accepted.
Prompt cache helps if reused
Anthropic offers prompt caching that reduces input-token cost on cache hits. For repeat summaries of the same source with different prompts, enabling caching reduces effective ITPM usage substantially.
Keep Claude for quality, switch when blocked
Claude's summarization quality is excellent — especially Sonnet 4.6 for long technical content and structured outputs. The trade is that on lower tiers, rate limits can interrupt high-volume work. Reduz can't raise your Anthropic limit — that's on Anthropic — but it lets you keep your workflow moving while the limit resets. Switch to OpenAI, Google Gemini, DeepSeek, or xAI Grok for the next few summaries; switch back to Claude once you're past the window. For sustained high-volume use, request a tier upgrade through the Anthropic console — Tier 3-4 limits handle production summarization workloads comfortably. Hosted Free in Reduz also gives you a Claude-independent fallback that doesn't touch your Anthropic account at all.
Frequently asked questions
Can Reduz increase my Claude rate limit?
No. Anthropic sets and enforces rate limits at the account and tier level — no client can raise them. Reduz helps by switching to another provider while your Claude window resets, and by surfacing source-size controls that reduce per-request token consumption.
Why does Claude Sonnet hit rate limits faster than Haiku?
Anthropic provisions higher throughput to Haiku than to Sonnet or Opus on the same tier. If rate limits bite at daily volume, switch from Sonnet 4.6 to Haiku 4.5. Quality is excellent for routine summaries and the rate-limit ceiling is meaningfully higher. Reserve Sonnet 4.6 and Opus 4.7 for harder content (long technical PDFs, multi-step reasoning).
How do I request a Claude tier upgrade?
Anthropic auto-upgrades tiers based on consistent spend and usage. For faster upgrades, submit a manual request through console.anthropic.com — provide your workload pattern (summarization, agent, batch processing) and expected volume. Approvals typically take a few business days.
Does prompt caching help with rate limits?
Yes, for repeat summaries of the same source with different prompts. Anthropic's prompt caching reduces effective ITPM usage on cache hits — useful when you re-summarize the same article in different output styles. Caches expire after 5 minutes of inactivity.
What's the difference between Claude's RPM, ITPM, and OTPM?
RPM = requests per minute (count). ITPM = input tokens per minute (size of prompts you send). OTPM = output tokens per minute (size of responses you generate). Long PDFs push ITPM; full-notes generation pushes OTPM; rapid retries push RPM. The error response identifies which one you hit.
Is Reduz free?
Yes. Reduz includes 100 free credits a month. Using your own AI key removes the credit limit.
Do I need an account?
Not when you use your own AI key. An account is only needed for free credits, paid plans, or cloud backup.
Where is my data stored?
Summary history is stored in your browser. Cloud backup is opt-in and encrypted on your device before upload.
Which AI providers does Reduz support?
Reduz supports OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and xAI Grok. You can also use free credits without setting up an AI account.