AI Inference, Routed Intelligently

Every model.
One endpoint.
Lower cost.

Route requests across 20+ open-source AI models. Intelligent failover, 99.99% uptime, and 40% lower costs. Drop-in replacement for the OpenAI SDK.

No credit card · Cancel anytime · OpenAI-compatible
Request Router · Live
→ POST /v1/chat/completions
   model: "deepseek-v2"
   messages: [...]
Routing to best provider...
   Prov A · 12ms · selected
   Prov B · 28ms
   Prov C · 45ms
← 200 OK
   id: "chatcmpl-routed"
   model: "deepseek-v2"
   latency: 12ms
20+ Open-Source Models · 99.99% Uptime SLA · 40% Lower Cost · 1 API Key

Why routed.sh

The advantage of aggregation.

No single provider can offer this.

No Single Point of Failure
One provider goes down and your app breaks. We route across multiple. You only have downtime if all of them are down simultaneously — that's never happened.
Provider A · Provider B · Provider C · Provider D
More Models Than Any Single Provider
Each provider offers a few models. We give you all of them from all providers. One endpoint for Llama, Mistral, DeepSeek, Qwen, and more.
llama-3 · mistral-7b · deepseek-v2 · qwen-72b · codestral · phi-3
Model Fallback, Built In
Set your priority — "try this model first, fall back to that one." Your app never returns a 503 because one model is unavailable.
Fallback chain
deepseek-v2 → llama-3 → mistral-7b
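What configuring a chain could look like from the client side, as a minimal sketch using the OpenAI Python SDK's extra_body passthrough. The fallback_models field and the ROUTED_API_KEY variable are illustrative assumptions, not documented parameters; check the routed.sh docs for the actual configuration.

import os
from openai import OpenAI

# Standard OpenAI SDK pointed at routed.sh; the key variable name is an assumption.
client = OpenAI(
    base_url="https://api.routed.sh/v1",
    api_key=os.environ["ROUTED_API_KEY"],
)

# Hypothetical: declare a fallback chain alongside the primary model.
# "fallback_models" is an assumed field name, passed through via extra_body.
response = client.chat.completions.create(
    model="deepseek-v2",
    messages=[{"role": "user", "content": "Summarize this changelog."}],
    extra_body={"fallback_models": ["llama-3", "mistral-7b"]},
)
print(response.choices[0].message.content)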
One Price, Any Model
Other providers charge more for premium models. We don't. One request is one request regardless of which model you use.
OpenAI-Compatible, Zero Lock-In
Change your base URL to api.routed.sh, keep your existing SDK and code. Switch models with one parameter. Leave anytime.
base_url="https://api.routed.sh/v1"
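With the OpenAI Python SDK, for example, switching models is nothing more than the model argument; a minimal sketch, where the ROUTED_API_KEY variable name is an assumption.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routed.sh/v1",   # the only change from a stock OpenAI setup
    api_key=os.environ["ROUTED_API_KEY"],  # key variable name is an assumption
)

# Same client, same call shape: switch models with one parameter.
for model in ["llama-3", "mistral-7b", "deepseek-v2"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)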
Built by Devs, for Devs
We managed five API keys and got hit with outages one too many times. So we built what we wished existed. Small team, fast responses; you talk to us, not a ticket queue.

How it works

Three steps. Two minutes.

1. Point your SDK

Change your base URL. That's it.

Before
base_url = "https://api.openai.com/v1"
After
base_url = "https://api.routed.sh/v1"
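In the OpenAI Python SDK that one-line change looks like this (a minimal sketch; the ROUTED_API_KEY variable name is an assumption). The SDK also reads OPENAI_BASE_URL from the environment, which is what the export at the bottom of this page relies on.

import os
from openai import OpenAI

# Before: straight to OpenAI
# client = OpenAI(base_url="https://api.openai.com/v1", api_key=os.environ["OPENAI_API_KEY"])

# After: same SDK, routed through routed.sh
client = OpenAI(
    base_url="https://api.routed.sh/v1",
    api_key=os.environ["ROUTED_API_KEY"],  # variable name is an assumption
)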
2. We route it

Smart router selects the best provider for your model.

Routing decision
Parsing request...
Checking provider availability...
Selecting optimal route...
Forwarding request
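There is nothing to implement on your side. Purely as a conceptual sketch (not routed.sh's actual code), the selection step boils down to filtering providers that serve the requested model and are healthy, then taking the lowest-latency one:

from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set[str]
    healthy: bool
    latency_ms: float  # e.g. a rolling median from recent probes

def select_provider(providers: list[Provider], model: str) -> Provider:
    """Illustrative only: pick the healthiest, fastest provider for a model."""
    candidates = [p for p in providers if model in p.models and p.healthy]
    if not candidates:
        raise RuntimeError(f"no available provider for {model}")
    return min(candidates, key=lambda p: p.latency_ms)

# Mirrors the demo above: Prov A (12ms) wins over B (28ms) and C (45ms).
providers = [
    Provider("Prov A", {"deepseek-v2"}, True, 12),
    Provider("Prov B", {"deepseek-v2"}, True, 28),
    Provider("Prov C", {"deepseek-v2"}, True, 45),
]
print(select_provider(providers, "deepseek-v2").name)  # Prov A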
3. Get a response

Same model, same output. Lower cost, better uptime.

deepseek-v2 · 200 OK
via Provider A · 12ms
Routed: auto · Fallback: ready
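Reading it is the same as reading any OpenAI-style response. A quick self-contained sketch (the key variable name is an assumption):

import os
from openai import OpenAI

client = OpenAI(base_url="https://api.routed.sh/v1", api_key=os.environ["ROUTED_API_KEY"])

response = client.chat.completions.create(
    model="deepseek-v2",
    messages=[{"role": "user", "content": "One-line status update, please."}],
)

# The familiar fields are all there.
print(response.id)                          # e.g. "chatcmpl-routed"
print(response.model)                       # "deepseek-v2"
print(response.choices[0].message.content)  # the completion text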

Compare

routed.sh vs. going direct.

Feature                 | routed.sh             | Going Direct
Uptime                  | 99.99% (aggregated)   | 99.5% (single provider)
Model choice            | 20+ models, one key   | Few models per provider
Failover                | Yes                   | No
Pricing                 | Per-request, flat     | Often monthly commit
API                     | OpenAI-compatible     | Provider-specific
Multi-provider routing  | Yes                   | No
Vendor lock-in          | None                  | Yes
Setup time              | < 2 minutes           | Hours per provider
Model fallback          | Yes                   | No

Pricing

One subscription. Every model.

Per-request pricing. No token math. No surprise bills.

Monthly · Annual (save 20%)
Pro
$8/mo billed annually ($10/mo billed monthly)
For individual developers and hobbyists.
100 requests/mo
Get Started
  • 20+ open-source models
  • OpenAI-compatible API
  • Intelligent routing
  • Model fallback
  • 99.9% uptime
  • Community support
Ultra
POPULAR
$24/mo billed annually ($30/mo billed monthly)
For teams shipping production AI features.
1,000 requests/mo
Get Started
  • 20+ open-source models
  • OpenAI-compatible API
  • Priority routing
  • Model fallback
  • 99.99% uptime SLA
  • Email support
  • Usage analytics
Max
$64/mo billed annually ($80/mo billed monthly)
For organizations with serious AI workloads.
2,500 requests/mo
Contact Sales
  • 20+ open-source models
  • OpenAI-compatible API
  • Dedicated routing priority
  • Model fallback
  • 99.99% uptime SLA
  • Dedicated support
  • Advanced analytics
  • Audit logs
  • SLA credits
5 models going direct: $25+/mo · 5 models on routed.sh: $10/mo

FAQ

Questions & answers.

What does 'intelligent routing' mean?
When you make a request, we find the provider with the best availability and lowest latency for your model. If that provider is slow or down, we reroute to the next one — instantly, invisibly. You don't write fallback code. You don't handle errors that aren't yours.
Is it really OpenAI-compatible?
Yes. Change your base URL to api.routed.sh, keep your existing SDK and code. Same API format, same response structure. Zero migration effort.
What happens when a provider goes down?
Your request is automatically routed to another provider running the same model. You don't change anything. You don't notice anything. That's the point of aggregation.
How does per-request pricing work?
You pick a plan with a fixed number of requests per month. A request counts the same regardless of which model serves it, and pricing is documented transparently. No token counting, no surprise bills.
Which models are available?
We carry the leading open-source models — Llama, Mistral, DeepSeek, Qwen, and more. As we add infrastructure partners, the catalog grows, and you get access automatically.
Can I use my existing OpenAI SDK?
Yes. Swap the base URL parameter, everything else stays the same. OpenAI Python SDK, Node SDK, HTTP calls — all work without changes.
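It is plain HTTP underneath, so a raw request works too. A sketch with Python's requests library, assuming the standard chat-completions body shape (the key variable name is an assumption):

import os
import requests

resp = requests.post(
    "https://api.routed.sh/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['ROUTED_API_KEY']}"},
    json={
        "model": "qwen-72b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])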
What's the difference between Pro, Ultra, and Max?
Pro gives you 100 requests per month for individual use. Ultra gives you 1,000 requests with priority routing and model fallback for production workloads. Max gives you 2,500 requests with dedicated support, advanced analytics, and SLA credits for organizations.

One API key. Every model.

Stop managing multiple provider accounts. Route everything through routed.sh.

$ export OPENAI_BASE_URL="https://api.routed.sh/v1"