AI Inference, Routed Intelligently

Every model.
One endpoint.
Lower cost.

Route requests across 20+ open-source AI models. Intelligent failover, 99.99% uptime, and 40% lower costs. Drop-in replacement for the OpenAI SDK.

No credit card · Cancel anytime · OpenAI-compatible
Request Router · Live
→ POST /v1/chat/completions
   model: "deepseek-v2"
   messages: [...]
Routing to best provider...
   Prov A · 12ms · selected
   Prov B · 28ms
   Prov C · 45ms
← 200 OK
   id: "chatcmpl-routed"
   model: "deepseek-v2"
   latency: 12ms
20+ Open-Source Models · 99.99% Uptime SLA · 40% Lower Cost · 1 API Key

Why routed.sh

The advantage of aggregation.

No single provider can offer this.

No Single Point of Failure
One provider goes down and your app breaks. We route across multiple. You only have downtime if all of them are down simultaneously — that's never happened.
Provider A · Provider B · Provider C · Provider D
More Models Than Any Single Provider
Each provider offers a few models. We give you all of them from all providers. One endpoint for Llama, Mistral, DeepSeek, Qwen, and more.
llama-3 · mistral-7b · deepseek-v2 · qwen-72b · codestral · phi-3
Model Fallback, Built In
Set your priority — "try this model first, fall back to that one." Your app never returns a 503 because one model is unavailable.
Fallback chain
deepseek-v2 → llama-3 → mistral-7b
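What configuring a chain could look like from the client side, as a minimal sketch using the OpenAI Python SDK's extra_body passthrough. The fallback_models field and the ROUTED_API_KEY variable are illustrative assumptions, not documented parameters; check the routed.sh docs for the actual configuration.

import os
from openai import OpenAI

# Standard OpenAI SDK pointed at routed.sh; the key variable name is an assumption.
client = OpenAI(
    base_url="https://api.routed.sh/v1",
    api_key=os.environ["ROUTED_API_KEY"],
)

# Hypothetical: declare a fallback chain alongside the primary model.
# "fallback_models" is an assumed field name, passed through via extra_body.
response = client.chat.completions.create(
    model="deepseek-v2",
    messages=[{"role": "user", "content": "Summarize this changelog."}],
    extra_body={"fallback_models": ["llama-3", "mistral-7b"]},
)
print(response.choices[0].message.content)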
One Price, Any Model
Other providers charge more for premium models. We don't. One request is one request regardless of which model you use.
OpenAI-Compatible, Zero Lock-In
Change your base URL to api.routed.sh, keep your existing SDK and code. Switch models with one parameter. Leave anytime.
base_url="https://api.routed.sh/v1"
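With the OpenAI Python SDK, for example, switching models is nothing more than the model argument; a minimal sketch, where the ROUTED_API_KEY variable name is an assumption.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routed.sh/v1",   # the only change from a stock OpenAI setup
    api_key=os.environ["ROUTED_API_KEY"],  # key variable name is an assumption
)

# Same client, same call shape: switch models with one parameter.
for model in ["llama-3", "mistral-7b", "deepseek-v2"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)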
Built by Devs, for Devs
We managed five API keys and got hit with outages one too many times. So we built what we wished existed. Small team, fast responses; you talk to us, not a ticket queue.

How it works

Three steps. Two minutes.

1. Point your SDK

Change your base URL. That's it.

Before
base_url = "https://api.openai.com/v1"
After
base_url = "https://api.routed.sh/v1"
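In the OpenAI Python SDK that one-line change looks like this (a minimal sketch; the ROUTED_API_KEY variable name is an assumption). The SDK also reads OPENAI_BASE_URL from the environment, which is what the export at the bottom of this page relies on.

import os
from openai import OpenAI

# Before: straight to OpenAI
# client = OpenAI(base_url="https://api.openai.com/v1", api_key=os.environ["OPENAI_API_KEY"])

# After: same SDK, routed through routed.sh
client = OpenAI(
    base_url="https://api.routed.sh/v1",
    api_key=os.environ["ROUTED_API_KEY"],  # variable name is an assumption
)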
2. We route it

Smart router selects the best provider for your model.

Routing decision
Parsing request...
Checking provider availability...
Selecting optimal route...
Forwarding request
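There is nothing to implement on your side. Purely as a conceptual sketch (not routed.sh's actual code), the selection step boils down to filtering providers that serve the requested model and are healthy, then taking the lowest-latency one:

from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set[str]
    healthy: bool
    latency_ms: float  # e.g. a rolling median from recent probes

def select_provider(providers: list[Provider], model: str) -> Provider:
    """Illustrative only: pick the healthiest, fastest provider for a model."""
    candidates = [p for p in providers if model in p.models and p.healthy]
    if not candidates:
        raise RuntimeError(f"no available provider for {model}")
    return min(candidates, key=lambda p: p.latency_ms)

# Mirrors the demo above: Prov A (12ms) wins over B (28ms) and C (45ms).
providers = [
    Provider("Prov A", {"deepseek-v2"}, True, 12),
    Provider("Prov B", {"deepseek-v2"}, True, 28),
    Provider("Prov C", {"deepseek-v2"}, True, 45),
]
print(select_provider(providers, "deepseek-v2").name)  # Prov A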
3. Get a response

Same model, same output. Lower cost, better uptime.

deepseek-v2 · 200 OK
via Provider A · 12ms
Routed: auto · Fallback: ready
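Reading it is the same as reading any OpenAI-style response. A quick self-contained sketch (the key variable name is an assumption):

import os
from openai import OpenAI

client = OpenAI(base_url="https://api.routed.sh/v1", api_key=os.environ["ROUTED_API_KEY"])

response = client.chat.completions.create(
    model="deepseek-v2",
    messages=[{"role": "user", "content": "One-line status update, please."}],
)

# The familiar fields are all there.
print(response.id)                          # e.g. "chatcmpl-routed"
print(response.model)                       # "deepseek-v2"
print(response.choices[0].message.content)  # the completion text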

Compare

routed.sh vs. going direct.

Feature                 | routed.sh             | Going Direct
Uptime                  | 99.99% (aggregated)   | 99.5% (single provider)
Model choice            | 20+ models, one key   | Few models per provider
Failover                | Yes                   | No
Pricing                 | Per-request, flat     | Often monthly commit
API                     | OpenAI-compatible     | Provider-specific
Multi-provider routing  | Yes                   | No
Vendor lock-in          | None                  | Yes
Setup time              | < 2 minutes           | Hours per provider
Model fallback          | Yes                   | No

Pricing

One subscription. Every model.

Per-request pricing. No token math. No surprise bills.

Monthly · Annual (save 20%)
Pro
$8/mo billed annually ($10/mo billed monthly)
For individual developers and hobbyists.
100 requests/mo
Get Started
  • 20+ open-source models
  • OpenAI-compatible API
  • Intelligent routing
  • Model fallback
  • 99.9% uptime
  • Community support
Ultra
POPULAR
$24/mo billed annually ($30/mo billed monthly)
For teams shipping production AI features.
1,000 requests/mo
Get Started
  • 20+ open-source models
  • OpenAI-compatible API
  • Priority routing
  • Model fallback
  • 99.99% uptime SLA
  • Email support
  • Usage analytics
Max
$64/mo billed annually ($80/mo billed monthly)
For organizations with serious AI workloads.
2,500 requests/mo
Contact Sales
  • 20+ open-source models
  • OpenAI-compatible API
  • Dedicated routing priority
  • Model fallback
  • 99.99% uptime SLA
  • Dedicated support
  • Advanced analytics
  • Audit logs
  • SLA credits
5 models going direct: $25+/mo · 5 models on routed.sh: $10/mo

FAQ

Questions & answers.

What does 'intelligent routing' mean?
When you make a request, we find the provider with the best availability and lowest latency for your model. If that provider is slow or down, we reroute to the next one — instantly, invisibly. You don't write fallback code. You don't handle errors that aren't yours.
Is it really OpenAI-compatible?
Yes. Change your base URL to api.routed.sh, keep your existing SDK and code. Same API format, same response structure. Zero migration effort.
What happens when a provider goes down?
Your request is automatically routed to another provider running the same model. You don't change anything. You don't notice anything. That's the point of aggregation.
How does per-request pricing work?
You pick a plan with a fixed number of requests per month. A request counts the same regardless of which model serves it, and pricing is documented transparently. No token counting, no surprise bills.
Which models are available?
We carry the leading open-source models — Llama, Mistral, DeepSeek, Qwen, and more. As we add infrastructure partners, the catalog grows, and you get access automatically.
Can I use my existing OpenAI SDK?
Yes. Swap the base URL parameter, everything else stays the same. OpenAI Python SDK, Node SDK, HTTP calls — all work without changes.
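It is plain HTTP underneath, so a raw request works too. A sketch with Python's requests library, assuming the standard chat-completions body shape (the key variable name is an assumption):

import os
import requests

resp = requests.post(
    "https://api.routed.sh/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['ROUTED_API_KEY']}"},
    json={
        "model": "qwen-72b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])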
What's the difference between Pro, Ultra, and Max?
Pro gives you 100 requests per month for individual use. Ultra gives you 1,000 requests with priority routing and model fallback for production workloads. Max gives you 2,500 requests with dedicated support, advanced analytics, and SLA credits for organizations.

One API key. Every model.

Stop managing multiple provider accounts. Route everything through routed.sh.

$ export OPENAI_BASE_URL="https://api.routed.sh/v1"