NEXUS-Bench v0 Wave-9 · Production Ready

Multi-LLM routing that actually measures quality.

The only production API with Brier-calibrated accuracy scores on every response. LinUCB bandit routing learns from your calls. Most requests cost $0 via Groq.


1.000 · Peak quality score (Groq Qwen3)

$0 · Cost for most classification calls

16 · Models across several providers

900+ · Tests in production infrastructure

POST /api/route

// One call. NEXUS picks the optimal model.
// (Uses top-level await: run this as an ES module.)
const res = await fetch('https://nexus-api.vercel.app/api/route', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer nxs_your_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    task_type: 'classification',
    prompt: 'Is this email spam? ...',
  }),
});
if (!res.ok) throw new Error(`NEXUS request failed: HTTP ${res.status}`);
const data = await res.json();

// Response:
// {
//   "content": "Not spam.",
//   "model_used": "groq-qwen",
//   "cost_cents": 0,
//   "quality_score": 1.000,
//   "routing_path": "classification → groq-qwen (Q=1.00, cost=FREE)"
// }
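Because every response carries a `quality_score`, a client can refuse answers that fall below its own bar. The guard below is a hypothetical client-side helper. It is not part of the NEXUS API, and `acceptRouted` and `minQuality` are illustrative names. It relies only on the field names in the sample response above.

```javascript
// Hypothetical client-side guard (not part of the NEXUS API): reject a routed
// answer whose reported quality_score is below minQuality.
// Field names follow the sample response body shown above.
function acceptRouted(data, minQuality = 0.9) {
  const q = data.quality_score;
  if (typeof q !== 'number' || Number.isNaN(q)) {
    throw new TypeError('response has no numeric quality_score');
  }
  if (q < minQuality) {
    throw new Error(`quality_score ${q} from ${data.model_used} is below ${minQuality}`);
  }
  return {
    content: data.content,
    model: data.model_used,
    quality: q,
    costCents: data.cost_cents,
  };
}

// Usage with the sample response body:
const sample = {
  content: 'Not spam.',
  model_used: 'groq-qwen',
  cost_cents: 0,
  quality_score: 1.0,
  routing_path: 'classification → groq-qwen (Q=1.00, cost=FREE)',
};
console.log(acceptRouted(sample, 0.95).model); // prints "groq-qwen"
```

Throwing keeps the failure explicit. A caller could catch the error and retry with a different `task_type` or prompt.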

Why NEXUS routes better than OpenRouter

⚡

Brier-calibrated quality

Every model has an empirical quality score from NEXUS-Bench: not marketing claims, but measured accuracy on 1,000+ test tasks.
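This page does not say how NEXUS-Bench turns Brier scores into the 0 to 1 quality scores shown here. It is clear that the two are not the same number: quality is higher-is-better, while a Brier score of 0 is perfect. For reference, below is the standard binary Brier score, the mean squared gap between a forecast probability and the 0/1 outcome. `brierScore` is an illustrative name, not a NEXUS function.

```javascript
// Standard binary Brier score: mean of (forecast - outcome)^2.
// 0 is a perfect forecaster; always forecasting 0.5 scores 0.25.
function brierScore(forecasts, outcomes) {
  if (forecasts.length !== outcomes.length || forecasts.length === 0) {
    throw new RangeError('need equal-length, non-empty arrays');
  }
  let sum = 0;
  for (let i = 0; i < forecasts.length; i++) {
    const p = forecasts[i];
    const o = outcomes[i];
    if (p < 0 || p > 1 || (o !== 0 && o !== 1)) {
      throw new RangeError('forecasts must be in [0, 1] and outcomes 0 or 1');
    }
    sum += (p - o) ** 2;
  }
  return sum / forecasts.length;
}

// A model that is 90% confident on two correct answers
// and 80% confident on one wrong answer:
console.log(brierScore([0.9, 0.9, 0.8], [1, 1, 0]));
```

The confidently wrong answer dominates the result, contributing 0.64 of the 0.66 total before averaging. That is why the Brier score rewards calibration rather than raw accuracy alone.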

🧠

LinUCB bandit routing

Uses Upper Confidence Bound (UCB) exploration to refine model selection with every API call, so routing improves over time.
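The page does not describe NEXUS's context features, reward signal, or exploration weight. The following is therefore a minimal sketch of textbook disjoint LinUCB, not NEXUS's router. The arm names, the one-hot context `[classification, research]`, `alpha = 0.5`, and using graded quality as the reward are all hypothetical. Each model gets a linear reward estimate plus an uncertainty bonus that shrinks as that model is tried in similar contexts.

```javascript
// Minimal disjoint LinUCB sketch (Li et al., 2010), not NEXUS's implementation.
// Each arm (model) keeps A^-1 (d x d) and b (d); A^-1 is updated with the
// Sherman-Morrison formula, so no explicit matrix inversion is needed.
class LinUCB {
  constructor(arms, dim, alpha = 1.0) {
    this.alpha = alpha; // exploration weight
    this.dim = dim;
    this.state = new Map();
    for (const arm of arms) {
      // A starts as the identity, so A^-1 does too; b starts at zero.
      const Ainv = Array.from({ length: dim }, (_, i) =>
        Array.from({ length: dim }, (_, j) => (i === j ? 1 : 0)));
      this.state.set(arm, { Ainv, b: new Array(dim).fill(0) });
    }
  }

  static matVec(M, v) {
    return M.map(row => row.reduce((s, m, j) => s + m * v[j], 0));
  }

  static dot(u, v) {
    return u.reduce((s, ui, i) => s + ui * v[i], 0);
  }

  // UCB for one arm: theta^T x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
  score(arm, x) {
    const { Ainv, b } = this.state.get(arm);
    const theta = LinUCB.matVec(Ainv, b);
    const Ainvx = LinUCB.matVec(Ainv, x);
    return LinUCB.dot(theta, x) + this.alpha * Math.sqrt(LinUCB.dot(x, Ainvx));
  }

  // Choose the arm with the highest UCB; ties go to the arm listed first.
  select(x) {
    let best = null;
    let bestScore = -Infinity;
    for (const arm of this.state.keys()) {
      const s = this.score(arm, x);
      if (s > bestScore) {
        best = arm;
        bestScore = s;
      }
    }
    return best;
  }

  // Record reward r for `arm` in context x: A += x x^T, b += r x.
  update(arm, x, r) {
    const st = this.state.get(arm);
    const Ainvx = LinUCB.matVec(st.Ainv, x);
    const denom = 1 + LinUCB.dot(x, Ainvx);
    // (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x); A^-1 is symmetric.
    for (let i = 0; i < this.dim; i++) {
      for (let j = 0; j < this.dim; j++) {
        st.Ainv[i][j] -= (Ainvx[i] * Ainvx[j]) / denom;
      }
    }
    st.b = st.b.map((bi, i) => bi + r * x[i]);
  }
}

// Hypothetical usage: one-hot context [classification, research], reward = graded quality.
const router = new LinUCB(['groq-qwen', 'claude-haiku'], 2, 0.5);
const ctx = [1, 0];
const chosen = router.select(ctx);
router.update(chosen, ctx, 1.0);
```

Untried arms keep a large bonus, so they still get sampled. This exploration is what lets the bandit "learn from your calls" instead of locking onto its first pick.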

Groq models are FREE

Classification (Q=1.000) and research (Q=0.920) via Groq cost $0. Most use cases never touch paid models.

🎯

Task-type routing

Tell us the task type (classification, research, reasoning, or code-edit), and we pick the cheapest model that meets your quality threshold.
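The selection rule above can be sketched with the leaderboard figures. One caveat: the page lists only overall quality scores, while the real router presumably uses per-task scores, so treat this as an illustration of the rule, not NEXUS's code. `LEADERBOARD` and `pickCheapest` are hypothetical names, and `cost` is the leaderboard's Cost/1K column in dollars.

```javascript
// Leaderboard rows (overall quality; cost = Cost/1K column in dollars).
const LEADERBOARD = [
  { model: 'Groq Qwen3-32B', quality: 1.0, cost: 0 },
  { model: 'Claude Sonnet 4.6', quality: 0.927, cost: 0.071 },
  { model: 'Claude Haiku 4.5', quality: 0.887, cost: 0.004 },
  { model: 'DeepSeek Chat', quality: 0.881, cost: 0.0005 },
  { model: 'DeepSeek R1', quality: 0.874, cost: 0.004 },
  { model: 'Groq Llama 4', quality: 0.92, cost: 0 },
];

// Hypothetical rule: cheapest model whose quality meets the threshold;
// ties on cost go to the higher-quality model. Returns null if none qualify.
function pickCheapest(models, minQuality) {
  const eligible = models.filter(m => m.quality >= minQuality);
  if (eligible.length === 0) return null;
  eligible.sort((a, b) => a.cost - b.cost || b.quality - a.quality);
  return eligible[0].model;
}

console.log(pickCheapest(LEADERBOARD, 0.9)); // "Groq Qwen3-32B"

// Restricting to paid models shows the price/quality trade-off:
const paid = LEADERBOARD.filter(m => m.cost > 0);
console.log(pickCheapest(paid, 0.88)); // "DeepSeek Chat"
console.log(pickCheapest(paid, 0.9)); // "Claude Sonnet 4.6"
```

Raising the threshold from 0.88 to 0.9 on paid models jumps from $0.0005 to $0.071 per 1K. That gap is why a per-request quality threshold matters.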

Live Brier Leaderboard

NEXUS-Bench v0 Wave-9: empirical quality, not vendor claims

Rank	Model	Quality Score	Cost/1K	Tier	Badge
#1	Groq Qwen3-32B	1.000	$0	FREE	FASTEST
#2	Claude Sonnet 4.6	0.927	$0.071	SCALE	BEST QUALITY
#3	Claude Haiku 4.5	0.887	$0.004	STARTER
#4	DeepSeek Chat	0.881	$0.0005	STARTER	BEST CODE
#5	DeepSeek R1	0.874	$0.004	PRO	BEST REASONING
#6	Groq Llama 4	0.920	$0	FREE

Simple pricing. Cheaper than AWS Bedrock.

500 free calls to start. No credit card required.

Starter

$49/mo

1,000 calls/mo

✓ All task types (classify, research, code, reason)
✓ Groq FREE models included
✓ Usage dashboard
✓ Brier quality scores in every response
✓ JSON response format
