NEXUS-Bench v0 Wave-9 · Production Ready

Multi-LLM routing that actually measures quality.

The only production API with Brier-calibrated accuracy scores on every response. LinUCB bandit routing learns from your calls. Most requests cost $0 via Groq.

1.000 peak quality score (Groq Qwen3)
$0 cost for most classification calls
6 providers, 16 models
900+ tests in production infra
POST /api/route
// One call. NEXUS picks the optimal model.
const res = await fetch('https://nexus-api.vercel.app/api/route', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer nxs_your_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    task_type: 'classification',
    prompt: 'Is this email spam? ...',
  }),
});
// Parse the JSON body of the response.
const data = await res.json();
// Response:
// {
//   content: "Not spam.",
//   model_used: "groq-qwen",
//   cost_cents: 0,
//   quality_score: 1.000,
//   routing_path: "classification → groq-qwen (Q=1.00, cost=FREE)"
// }

Why NEXUS routes better than OpenRouter

Brier-calibrated quality

Every model has an empirical quality score from NEXUS-Bench. Not marketing claims — actual accuracy on 1,000+ test tasks.
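
For readers new to the metric: a Brier score is the mean squared difference between a predicted probability and the actual 0/1 outcome, so 0 is perfect and lower is better. The sketch below shows the calculation in JavaScript. The `brierScore` helper and the sample data are illustrative assumptions, and this page does not specify how NEXUS-Bench converts Brier scores into the quality scores it reports.

```javascript
// Brier score: mean squared error between forecast probabilities and outcomes.
// predictions: probabilities in [0, 1]; outcomes: 0 or 1 for each prediction.
function brierScore(predictions, outcomes) {
  if (predictions.length === 0 || predictions.length !== outcomes.length) {
    throw new Error('predictions and outcomes must be non-empty and equal length');
  }
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    const diff = predictions[i] - outcomes[i];
    sum += diff * diff;
  }
  return sum / predictions.length;
}

// Hypothetical data: a model's confidence that each email is spam, and the truth.
const confidences = [0.9, 0.2, 0.8, 0.1];
const labels = [1, 0, 1, 0];
const score = brierScore(confidences, labels); // (0.01 + 0.04 + 0.04 + 0.01) / 4
console.log(score.toFixed(3));
```

A model that is confident and right scores near 0; one that is confident and wrong is penalized heavily, which is why the metric rewards calibration and not just accuracy.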

LinUCB bandit routing

LinUCB uses upper-confidence-bound exploration to improve model selection with each API call, so routing gets smarter over time.
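
To make the idea concrete, here is a minimal sketch of the textbook disjoint LinUCB algorithm, assuming one linear reward model per arm. Each arm is scored as predicted reward plus `alpha` times an uncertainty term, and the highest score wins. The class name, arm names, context vector, and reward values are hypothetical; this page does not describe the features or reward signal NEXUS actually uses.

```javascript
// Small vector helpers (function declarations so they are hoisted).
function dot(u, v) {
  return u.reduce((s, ui, i) => s + ui * v[i], 0);
}
function matVec(M, v) {
  return M.map((row) => dot(row, v));
}

// Disjoint LinUCB: each arm keeps A^-1 (dim x dim) and b (dim).
class LinUCB {
  constructor(arms, dim, alpha = 1.0) {
    this.alpha = alpha; // exploration strength
    this.arms = new Map();
    for (const arm of arms) {
      // A starts as the identity, so its inverse is also the identity.
      const Ainv = Array.from({ length: dim }, (_, i) =>
        Array.from({ length: dim }, (_, j) => (i === j ? 1 : 0)));
      this.arms.set(arm, { Ainv, b: new Array(dim).fill(0) });
    }
  }

  // Score = theta . x + alpha * sqrt(x^T A^-1 x); return the best arm.
  select(x) {
    let best = null;
    let bestScore = -Infinity;
    for (const [arm, { Ainv, b }] of this.arms) {
      const theta = matVec(Ainv, b); // ridge-regression estimate
      const uncertainty = Math.sqrt(dot(x, matVec(Ainv, x)));
      const score = dot(theta, x) + this.alpha * uncertainty;
      if (score > bestScore) {
        bestScore = score;
        best = arm;
      }
    }
    return best;
  }

  // After observing a reward, update A^-1 with the Sherman-Morrison formula
  // (equivalent to A += x x^T) and accumulate b += reward * x.
  update(arm, x, reward) {
    const state = this.arms.get(arm);
    const Ax = matVec(state.Ainv, x);
    const denom = 1 + dot(x, Ax);
    for (let i = 0; i < x.length; i++) {
      for (let j = 0; j < x.length; j++) {
        state.Ainv[i][j] -= (Ax[i] * Ax[j]) / denom;
      }
      state.b[i] += reward * x[i];
    }
  }
}

// Hypothetical usage: the context is a one-hot task type [classification, research].
const router = new LinUCB(['model-a', 'model-b'], 2, 0.5);
for (let i = 0; i < 50; i++) {
  router.update('model-a', [1, 0], 1.0); // model-a scores well on classification
  router.update('model-b', [1, 0], 0.2); // model-b scores poorly
}
const pick = router.select([1, 0]);
console.log(pick);
```

As more rewards arrive for a given context, the uncertainty term shrinks and the router increasingly exploits the arm with the best observed reward.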

Groq models are FREE

Classification (Q=1.000) and research (Q=0.920) via Groq cost $0. Most use cases never touch paid models.

Task-type routing

Tell us the task type (classification, research, reasoning, or code-edit) and we pick the cheapest model that meets your quality threshold.
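
The selection rule described here can be sketched in a few lines: keep the models whose quality for the task meets the threshold, then take the cheapest. This is an illustration under stated assumptions, not NEXUS's actual implementation; the `pickModel` function, the field names, and every model name and number in the catalog are placeholders.

```javascript
// Pick the cheapest candidate whose quality for the task meets the threshold.
// Returns null when no candidate qualifies.
function pickModel(candidates, taskType, minQuality) {
  const eligible = candidates.filter(
    (m) => (m.quality[taskType] ?? 0) >= minQuality
  );
  if (eligible.length === 0) return null;
  // Cheapest first; break cost ties in favor of higher quality.
  eligible.sort(
    (a, b) =>
      a.costPer1K - b.costPer1K || b.quality[taskType] - a.quality[taskType]
  );
  return eligible[0];
}

// Placeholder catalog: names and numbers are invented for illustration only.
const candidates = [
  { name: 'free-model', costPer1K: 0, quality: { classification: 0.98, reasoning: 0.70 } },
  { name: 'mid-model', costPer1K: 0.004, quality: { classification: 0.95, reasoning: 0.88 } },
  { name: 'premium-model', costPer1K: 0.07, quality: { classification: 0.97, reasoning: 0.93 } },
];

const forClassification = pickModel(candidates, 'classification', 0.9);
const forReasoning = pickModel(candidates, 'reasoning', 0.85);
const tooStrict = pickModel(candidates, 'reasoning', 0.99);
```

With this placeholder data, classification goes to the free model, reasoning at a 0.85 threshold goes to the mid-priced model, and a 0.99 threshold returns `null` because nothing qualifies.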

Live Brier Leaderboard

NEXUS-Bench v0 Wave-9 — empirical quality, not vendor claims

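| Rank | Model | Quality Score | Cost/1K | Tier | Badge |
|------|-------|---------------|---------|------|-------|
| #1 | Groq Qwen3-32B | 1.000 | $0 | FREE | FASTEST |
| #2 | Claude Sonnet 4.6 | 0.927 | $0.071 | SCALE | BEST QUALITY |
| #3 | Claude Haiku 4.5 | 0.887 | $0.004 | STARTER | |
| #4 | DeepSeek Chat | 0.881 | $0.0005 | STARTER | BEST CODE |
| #5 | DeepSeek R1 | 0.874 | $0.004 | PRO | BEST REASON |
| #6 | Groq Llama 4 | 0.920 | $0 | FREE | |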

Simple pricing. Cheaper than AWS Bedrock.

500 free calls to start. No credit card required.

Starter
$49/mo
1,000 calls/mo
  • All task types (classification, research, reasoning, code-edit)
  • Groq FREE models included
  • Usage dashboard
  • Brier quality scores in every response
  • JSON response format
Most Popular
Professional
$149/mo
10,000 calls/mo
  • Everything in Starter
  • DeepSeek R1 reasoning access
  • Priority routing queue
  • Team API key management
  • Usage analytics export
  • Overage: $0.05 per 100 calls
Scale
$499/mo
100,000 calls/mo
  • Everything in Professional
  • Claude Sonnet 4.6 access
  • SLA: 99.9% uptime
  • Dedicated routing queue
  • Custom quality thresholds
  • White-label option available
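
To compare tiers for a given volume, the arithmetic can be sketched as below. Plan prices and quotas are taken from the tiers above; the `PLANS` table and `estimateMonthlyUsd` function are hypothetical helpers. Only Professional lists an overage rate, so the other plans return `null` beyond their quota, and billing overage per started block of 100 calls is an assumption.

```javascript
// Figures from the pricing tiers on this page. overagePer100Usd is null where
// no overage rate is published, so the estimator reports "unknown", not a guess.
const PLANS = {
  starter: { monthlyUsd: 49, includedCalls: 1000, overagePer100Usd: null },
  professional: { monthlyUsd: 149, includedCalls: 10000, overagePer100Usd: 0.05 },
  scale: { monthlyUsd: 499, includedCalls: 100000, overagePer100Usd: null },
};

// Estimated monthly bill in USD, or null when usage exceeds the quota on a
// plan with no published overage rate.
function estimateMonthlyUsd(planName, calls) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan: ${planName}`);
  const extraCalls = Math.max(0, calls - plan.includedCalls);
  if (extraCalls === 0) return plan.monthlyUsd;
  if (plan.overagePer100Usd === null) return null;
  // Assumption: overage is billed per started block of 100 calls.
  return plan.monthlyUsd + Math.ceil(extraCalls / 100) * plan.overagePer100Usd;
}

const withinQuota = estimateMonthlyUsd('professional', 8000);
const withOverage = estimateMonthlyUsd('professional', 12000); // 2,000 extra calls = 20 blocks
const unknownOverage = estimateMonthlyUsd('starter', 1500);
```

Under these assumptions, 12,000 calls on Professional comes to $149 plus 20 overage blocks at $0.05, or about $150 for the month.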