📖 13 min read
OpenRouter Pricing 2026: Complete Guide to Every Model, Tier, and Hidden Cost
By Nik Sai | BetOnAI.net | Updated June 2026
I have been using OpenRouter since early 2024. Back then it was a scrappy aggregator with maybe 30 models. Today in June 2026, it has grown into arguably the most important middleware layer in the AI stack – routing requests to over 300 models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, NVIDIA, and dozens of open-source providers.
But here is the question nobody seems to answer clearly: what does it actually cost? And more importantly – are you overpaying?
I spent two weeks documenting every model, every price point, and every hidden cost on OpenRouter. This is the guide I wish existed when I started.
📧 Want more like this? Get our free The 2026 AI Playbook: 50 Ways AI is Making People Rich — Free for a limited time - going behind a paywall soon
How OpenRouter Pricing Works (The Basics)
OpenRouter uses a pay-per-token model, same as direct API providers. You load credits into your account and get charged per million input tokens and per million output tokens. Simple enough.
But there are three layers to understand:
- Base model cost – what the underlying provider (OpenAI, Anthropic, etc.) charges. OpenRouter passes this through at the exact same per-token rate – no per-token markup.
- Credit purchase fee – OpenRouter charges a flat 5.5% fee when you buy credits ($0.80 minimum per transaction). So if you add $100, you get roughly $94.50 in inference credits. This is how they make money.
- Provider routing premium – some providers on OpenRouter charge different rates depending on speed/reliability tiers
The fee structure changed in 2025. OpenRouter used to add a per-token markup on top of provider pricing. Now they charge a flat 5.5% on credit purchases and pass through provider token prices exactly. This is actually a better deal for heavy users – you know exactly what the overhead is upfront.
Complete Model Pricing Table (June 2026)
Here is the full breakdown of the most popular models on OpenRouter. Since OpenRouter now passes through provider pricing with no per-token markup (just the 5.5% credit fee), the OpenRouter price matches direct API pricing. All prices are per million tokens.
Anthropic (Claude)
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | 200K | Most powerful reasoning |
| Claude Opus 4.8 | $5.00 | $25.00 | 1M | NEW – 67% cheaper than Opus 4, 1M context |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | NEW – latest Sonnet, best value for most tasks |
| Claude Sonnet 4 | $3.00 | $15.00 | 1M | Extended thinking available at same rate |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K | Fast and cheap |
OpenAI
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $8.00 | 1M | Flagship model |
| GPT-4.1 Mini | $0.40 | $1.60 | 1M | Best budget OpenAI option |
| GPT-4.1 Nano | $0.10 | $0.40 | 1M | Cheapest OpenAI model |
| GPT-4o | $2.50 | $10.00 | 128K | Previous gen, still available |
| GPT-4o Mini | $0.15 | $0.60 | 128K | |
| o3 | $2.00 | $8.00 | 200K | Reasoning model – price dropped from $10/$40 |
| o3-mini | $1.10 | $4.40 | 200K | Budget reasoning |
| o4-mini | $1.10 | $4.40 | 200K | Latest reasoning model |
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Best value frontier reasoning |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Price increased from $0.075/$0.40 |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | 1M | NEW – ultra-cheap option |
Open Source and Others
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| Llama 4 Scout | $0.10 | $0.30 | 10M | 10M context – cheapest long-context option |
| Llama 4 Maverick | $0.15 | $0.60 | 1M | |
| Llama 3.3 70B | $0.10 | $0.32 | 131K | Also available free tier |
| DeepSeek V3 | $0.20 | $0.77 | 164K | Price dropped slightly |
| DeepSeek R1 | $0.70 | $2.50 | 164K | Open-source reasoning |
| Mistral Large 2 | $2.00 | $6.00 | 128K | |
| Mistral Medium 3 | $0.40 | $2.00 | 131K | NEW – solid mid-range option |
| Qwen3 235B | $0.46 | $1.82 | 131K | Chinese open-weight leader |
| Qwen3.7-Plus | $0.40 | $1.60 | 1M | NEW – 1M context |
| NVIDIA Nemotron 3 Super | $0.09 | $0.45 | 1M | NEW – ultra-cheap 1M context |
| Gemma 4 31B | $0.20 | $0.20 | 128K | Also available free tier |
Prices as of June 12, 2026. OpenRouter passes through provider pricing with no per-token markup – you pay the listed token price plus a 5.5% fee on credit purchases. Check their /models endpoint for real-time rates.
The Hidden Costs Nobody Talks About
1. The 5.5% Credit Fee Adds Up
OpenRouter charges a flat 5.5% fee when you purchase credits – not a per-token markup. This means the per-token prices you see on OpenRouter match what providers charge directly. The overhead is entirely in the credit purchase.
Let me put this in perspective. If you load $1,000 in credits, you get about $945 in actual inference spend. At $10,000/month, that is $550 going to OpenRouter. Better than the old 10% per-token markup, but still meaningful at scale. Enterprise customers can negotiate lower fees and use invoicing.
2. Credit Expiration
This one stings. OpenRouter credits do not last forever. If you prepay a large amount and do not use it within the specified window, you lose it. The current policy as of early 2026:
- Credits purchased directly expire after 12 months of account inactivity
- “Free” credits from promotions expire after 30 days
- There is no partial refund mechanism for unused credits
My advice: do not bulk-buy credits unless you have predictable usage patterns. Start with $20-50 and scale up as you understand your consumption.
3. Rate Limits Vary by Provider
OpenRouter does not give you a single rate limit. Each underlying provider has its own limits, and OpenRouter inherits them. This means:
- Claude models through OpenRouter are subject to Anthropic’s rate limits (which are more restrictive than OpenAI’s)
- Popular open-source models can get congested during peak hours
- Some providers on OpenRouter offer “priority” routing for higher prices
4. Provider Routing Is Not Always Transparent
When you request an open-source model like Llama 4, OpenRouter routes your request to one of several hosting providers. These providers have different performance characteristics and sometimes different prices. You can pin a specific provider, but the default “auto” routing may not always give you the cheapest option.
5. Context Window Costs
OpenRouter charges for context window usage the same way direct APIs do, but there is a catch. Some models on OpenRouter have reduced context windows compared to their direct API counterparts. For example, a model might support 200K tokens directly but only 128K through certain OpenRouter providers. You are paying the same per-token rate but getting less capacity.
OpenRouter vs Direct API: When Each Makes Sense
Choose OpenRouter When:
- You use multiple models regularly. If you bounce between Claude, GPT-4.1, Gemini, and open-source models, having one API key and one billing system is genuinely valuable. Managing 5 separate API accounts is a real pain.
- You are prototyping or experimenting. Testing which model works best for your use case is dramatically easier with OpenRouter. Switch models by changing one parameter instead of rewriting integration code.
- You want automatic failover. If Claude goes down, OpenRouter can automatically route to GPT-4.1. This reliability layer is worth the markup for production applications.
- You are spending under $500/month. At this scale, the markup is small in absolute terms and the convenience is real.
Choose Direct APIs When:
- You primarily use one provider. If 90% of your calls go to Claude, just use Anthropic’s API directly and save 10%.
- You are spending over $2,000/month. The markup adds up. At $5K/month, you are paying $500+ for convenience you could replace with a simple routing layer.
- You need maximum rate limits. Direct API access typically gives you higher rate limits than going through OpenRouter.
- You need enterprise SLAs. OpenRouter does not offer the same enterprise agreements that OpenAI, Anthropic, or Google provide.
Real Cost Scenarios
Scenario 1: The Hobbyist ($10-30/month)
You are building a side project, experimenting with different models, maybe running a personal AI assistant.
| Usage | OpenRouter Cost | Direct API Cost | Difference |
|---|---|---|---|
| 5M input + 2M output tokens (Claude Sonnet 4.6) | $47.48 | $45.00 | $2.48 |
| 2M input + 1M output tokens (GPT-4.1 Mini) | $2.53 | $2.40 | $0.13 |
| 10M input + 5M output tokens (Gemini 2.5 Flash) | $16.43 | $15.50 | $0.93 |
Verdict: Use OpenRouter. The markup is literally a few dollars, and the convenience of one dashboard and easy model switching is worth way more than that.
Scenario 2: The Freelancer/Solo Developer ($100-500/month)
You are building client projects, running AI-powered features, maybe processing documents or generating content at moderate scale.
At $300/month in API spend, the OpenRouter credit fee costs you roughly $16.50. That is less than one lunch. If you are using multiple models (say Claude for reasoning, GPT-4.1 Mini for quick tasks, and Gemini for long context), OpenRouter is still the smart play.
Verdict: OpenRouter makes sense unless you are 90%+ on a single provider.
Scenario 3: The Agency ($500-2,000/month)
You are running AI features for multiple clients, processing significant volume, and need reliability.
At $1,500/month, the credit fee is about $82. That is starting to matter. But you also need failover, multi-model routing, and simplified billing. The question becomes: can you build a comparable routing layer for less than $82/month in engineering time?
Verdict: Borderline. Consider a hybrid approach – use direct APIs for your primary high-volume model, and OpenRouter for everything else.
Scenario 4: The Enterprise ($5,000+/month)
At this scale, the math is clear. A $5,000/month spend means $275+ going to OpenRouter’s credit fee. Over a year, that is $3,300. Enterprise plans can reduce this with custom pricing and invoicing. You can absolutely build an internal routing layer for less than that.
Join 2,400+ readers getting weekly AI insights
Free strategies, tool reviews, and money-making playbooks - straight to your inbox.
No spam. Unsubscribe anytime.
Verdict: Use direct APIs. Build a thin routing layer. The markup does not justify itself at scale.
OpenRouter’s Free Tier and Credits System
OpenRouter’s free tier has gotten significantly better in 2026. Here is what you need to know:
- 26 free models available. No credit card required – just create an account and get an API key. Free models include Qwen3 Coder, DeepSeek V4 Flash, Llama 3.3 70B, Google Gemma 4 31B, NVIDIA Nemotron 3 Super, and OpenAI GPT-OSS 120B. These are not toy models – several are genuinely useful for production work.
- Free tier rate limits. 20 requests/minute and 200 requests/day. Enough for testing and light personal use.
- Users with $10+ in credits unlock higher rate limits on paid models – worth doing even if you mainly use free models.
- Credit purchases are non-refundable. Once you buy, you are committed.
- Minimum purchase is $5. Reasonable for testing. Remember the 5.5% fee applies.
- Free models can be removed or adjusted without notice – do not build production apps that depend on a specific free model staying free.
The OpenRouter API Advantage: Model Routing
The killer feature of OpenRouter that justifies its existence is model routing. You can set up fallback chains like this:
“Try Claude Opus 4 first. If it is rate-limited or down, fall back to GPT-4.1. If that fails, use Gemini 2.5 Pro. If everything fails, use Llama 4 Maverick.”
You are reading BetOnAI
While everyone else is reacting to AI news, BetOnAI readers are getting ahead of it. We break down the signals that matter – before the mainstream catches up. Bookmark this. Share it with one person who needs to hear it. This is your edge.
This kind of resilience is genuinely hard to build yourself. You need health checks, latency monitoring, automatic rerouting, and unified error handling across four different API specifications. OpenRouter handles all of this for a 5.5% credit fee.
For production applications where downtime costs real money, this feature alone can justify the cost.
Comparing OpenRouter to Alternatives
| Feature | OpenRouter | Direct APIs | LiteLLM (Self-hosted) | Portkey |
|---|---|---|---|---|
| Model count | 300+ | Per provider | 100+ | 150+ |
| Markup | 5.5% credit fee | 0% | 0% (hosting costs) | ~5-15% |
| Auto-failover | Yes | No | Yes | Yes |
| Unified billing | Yes | No | No | Yes |
| Setup time | 5 minutes | Per provider | 1-2 hours | 15 minutes |
| Enterprise SLA | No | Yes (most) | N/A | Yes |
LiteLLM deserves special mention here. It is an open-source proxy that does much of what OpenRouter does but runs on your own infrastructure. If you are technical enough to deploy it, you get OpenRouter-style routing at zero markup – just your hosting costs. For teams spending over $2,000/month, it is worth evaluating seriously.
Tips for Minimizing OpenRouter Costs
- Use the cheapest model that works. Do not default to Claude Opus 4 for everything. Most tasks work fine with Sonnet 4 or GPT-4.1 Mini at a fraction of the cost.
- Pin providers for open-source models. Check which provider offers the best rate for Llama/Mistral models and pin to them explicitly.
- Monitor your usage dashboard. OpenRouter’s usage analytics show you exactly where your money goes. Check it weekly.
- Use streaming wisely. Streaming responses cost the same per token but can reduce perceived latency, which means you might be less tempted to retry failed requests.
- Cache when possible. If you are making similar requests, cache the results on your end. OpenRouter does not cache for you.
- Set spending limits. OpenRouter lets you set daily and monthly spending caps. Use them. A runaway loop can burn through $500 in hours.
OpenRouter for Specific Use Cases
AI App Development
If you are building an AI-powered application – a chatbot, a document processor, a writing assistant – OpenRouter gives you one specific advantage that is hard to replicate: A/B testing models without engineering overhead.
I tested this with a document summarization tool I was building in February. Using OpenRouter, I routed 50% of requests to Claude Sonnet 4 and 50% to GPT-4.1 Mini, then compared output quality and cost. The result: GPT-4.1 Mini was 85% as good at 13% of the cost for my specific use case. That experiment took 30 minutes to set up through OpenRouter. Doing it with direct APIs would have required integrating two separate SDKs, handling two different error formats, and managing two billing accounts.
For prototyping and model evaluation, OpenRouter is genuinely the fastest path from idea to data.
Content Generation at Scale
If you are running a content operation – generating product descriptions, social media posts, email drafts, or article outlines – the model choice matters less than the volume economics. At this scale, the 10% markup becomes a real line item.
Let me walk through real numbers. A content agency generating 500 blog outlines per month (roughly 1,000 tokens input, 2,000 tokens output each) would spend:
- Using GPT-4.1 Mini on OpenRouter: $0.20 (input) + $0.80 (output) = $1.00/month + $0.06 fee = $1.06/month
- Using GPT-4.1 Mini directly: $0.20 + $0.80 = $1.00/month
- Using Gemini 2.5 Flash Lite on OpenRouter: $0.05 + $0.20 = $0.25/month + pennies in fees
At these volumes, the markup is negligible. But scale this to 50,000 articles per month and the differences start compounding into meaningful budget items.
RAG (Retrieval-Augmented Generation) Pipelines
RAG workloads are input-heavy – you are stuffing retrieved documents into the context window. This makes input token pricing critical. For a typical RAG pipeline processing 10,000 queries per day with 5,000 input tokens and 500 output tokens each:
- Claude Sonnet 4.6 on OpenRouter: $150/day input + $75/day output = $225/day ($6,750/month in tokens + ~$371/month credit fee)
- Claude Sonnet 4.6 directly: $150/day + $75/day = $225/day ($6,750/month)
- OpenRouter overhead: ~$371/month (5.5% credit fee)
At $371/month in credit fees for a single RAG pipeline, you should seriously consider using the direct API. The failover benefits do not justify that cost when you can implement basic retry logic yourself.
OpenRouter OAuth and Third-Party Apps
One feature that does not get enough attention is OpenRouter’s OAuth system. Third-party apps can integrate OpenRouter so users bring their own API credits. This means developers can build AI-powered tools without paying for API costs themselves – users authenticate with their OpenRouter account and pay per usage.
This is a smart model for indie developers. Instead of charging a subscription and eating unpredictable API costs, you let users pay their own token costs through OpenRouter. Several popular open-source projects have adopted this approach, including some AI chat interfaces and coding tools.
The trade-off: your users pay the OpenRouter markup, which makes your app slightly more expensive to use than a self-hosted alternative. But for many users, the convenience of a single OpenRouter account across multiple apps outweighs the cost.
What Changed in 2026 (Updated June)
Several significant shifts have happened in OpenRouter pricing this year:
- Fee model shifted to flat 5.5% credit purchase fee. OpenRouter moved away from per-token markups. You now pay the exact same per-token price as direct APIs, with the fee baked into credit purchases. This is more transparent and slightly cheaper for most users.
- Claude Opus 4.8 dropped – and it is a game changer. At $5/$25 per million tokens with 1M context, it is 67% cheaper than Opus 4 ($15/$75) while offering a larger context window. If you were avoiding Opus pricing, Opus 4.8 changes the math entirely.
- Claude Sonnet 4.6 is the new default. Latest Sonnet-class model at the same $3/$15 pricing. No reason not to upgrade.
- o3 pricing collapsed. Went from $10/$40 to $2/$8 per million tokens – an 80% price drop. OpenAI’s reasoning models are now priced identically to GPT-4.1.
- Gemini 2.5 Flash got more expensive. Jumped from $0.075/$0.40 to $0.30/$2.50. Still cheap, but not the insane deal it was. Gemini 2.5 Flash Lite ($0.10/$0.40) fills the old price point.
- Gemini 2.0 Flash deprecated June 1. If your code still references it, switch to 2.5 Flash or Flash Lite.
- 26 free models now available. Including Llama 3.3 70B, Gemma 4 31B, and NVIDIA Nemotron 3 Super. Serious models, not just toy demos.
- Chinese open-weight models dominate volume. Qwen3, DeepSeek, and similar models now account for 45%+ of OpenRouter traffic by token volume. The ecosystem has gone truly global.
- Every price tier now has a 1M+ context option. You are no longer forced into expensive models just because you need large context windows. Llama 4 Scout offers 10M context at $0.10/$0.30.
The Bottom Line
OpenRouter is not a scam and it is not overpriced. It is a convenience layer with a reasonable markup that makes sense for certain usage patterns and does not make sense for others.
The decision framework is simple:
If you use multiple models, spend under $500/month, or need production failover – use OpenRouter. If you are single-provider and spending over $2,000/month – go direct. Everything in between is a judgment call based on how much you value your time versus your money.
For most individual developers and small teams in 2026, OpenRouter is still the best way to access the AI model ecosystem without drowning in API key management. The switch from per-token markup to a flat 5.5% credit fee actually makes it a better deal than it was six months ago. Just go in with your eyes open about the fee structure.
Frequently Asked Questions
Does OpenRouter store my prompts or responses?
OpenRouter’s privacy policy states they may log requests for abuse prevention and debugging but do not use your data for model training. However, your prompts are forwarded to the underlying model provider (OpenAI, Anthropic, etc.), and those providers have their own data policies. If privacy is critical, check both OpenRouter’s and the model provider’s terms.
Can I use OpenRouter for commercial applications?
Yes. OpenRouter does not restrict commercial use. Your usage is governed by the terms of the underlying model providers. Most commercial models (GPT-4.1, Claude, Gemini) allow commercial use through their APIs, and OpenRouter passes that through.
What happens if a provider goes down?
If you have not pinned a specific provider, OpenRouter automatically routes to another available provider for the same model. If all providers for a model are down, you get an error. You can configure fallback models to handle this scenario.
Is there a self-hosted version of OpenRouter?
No. OpenRouter is a hosted service only. If you want a self-hosted equivalent, look at LiteLLM or build a custom routing layer. Several open-source projects provide similar functionality.
Sources and References
- OpenRouter API documentation and /models endpoint (openrouter.ai/docs)
- Anthropic API pricing page (docs.anthropic.com)
- OpenAI API pricing page (platform.openai.com/pricing)
- Google AI Studio pricing (ai.google.dev/pricing)
- LiteLLM documentation (docs.litellm.ai)
- Portkey pricing page (portkey.ai/pricing)
- Author’s personal OpenRouter usage data, January-June 2026
- OpenRouter pricing page (openrouter.ai/pricing) – verified June 12, 2026
You just read something most people will not find for months.
BetOnAI tracks the real shifts in AI – the pricing moves, the tool wars, the career pivots – so you can act while others are still reading headlines. New deep dives drop daily.
Enjoyed this? There's more where that came from.
Get the AI Playbook - 50 ways AI is making people money in 2026.
Free for a limited time.
Join 2,400+ subscribers. No spam ever.