KahneBench

Results & Leaderboard

Compare cognitive fingerprints across LLMs. See which models are most rational and how they compare across 15 known human biases.

Key Findings

Universal bias hotspots

Two biases appear in every model's top-5 most-susceptible list: Endowment Effect and Gain-Loss Framing. No model we tested avoids them, regardless of provider, size, or architecture.
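One way to read the "universal hotspot" claim is as a set intersection over each model's top-k list. The sketch below assumes per-bias scores are available as one mapping per model. The function names, model names, and numbers are hypothetical placeholders, not KahneBench data or its API.

```python
def top_k_biases(scores: dict[str, float], k: int = 5) -> set[str]:
    """Return the k biases with the highest score (most susceptible)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def universal_hotspots(per_model: dict[str, dict[str, float]], k: int = 5) -> set[str]:
    """Return the biases that appear in every model's top-k list."""
    top_sets = [top_k_biases(scores, k) for scores in per_model.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Hypothetical scores for two placeholder models (illustrative only).
example_scores = {
    "model_a": {"endowment_effect": 0.6, "gain_loss_framing": 0.5,
                "anchoring": 0.2, "sunk_cost": 0.1},
    "model_b": {"gain_loss_framing": 0.7, "anchoring": 0.4,
                "endowment_effect": 0.3, "sunk_cost": 0.05},
}

# With k=2, only gain_loss_framing is in both models' top-2 lists.
hotspots = universal_hotspots(example_scores, k=2)
```

With real per-bias results and k=5, the same intersection would return the biases shared by every model's top-5 list.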

What the scores mean

Each model's overall score is its average Bias Magnitude Score (BMS) across the 15 tested biases. Lower means more rational. The top-ranked model, Claude Opus 4.6, scores 11.1%, while the last-ranked model, Claude Haiku 4.5, scores 26.7%. For high-stakes applications (medical, financial, legal), even small scores warrant attention.
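As a minimal sketch of that averaging step, the snippet below takes an unweighted mean of per-bias BMS values. The unweighted mean is an assumption (the page only says "average"), and the function name and example values are hypothetical, not benchmark results.

```python
from statistics import fmean


def overall_score(per_bias_bms: dict[str, float]) -> float:
    """Unweighted mean of per-bias BMS values, in percent (assumed weighting)."""
    if not per_bias_bms:
        raise ValueError("need at least one per-bias score")
    return fmean(per_bias_bms.values())


# Hypothetical per-bias BMS values in percent (illustrative only).
example = {"anchoring": 10.0, "gain_loss_framing": 30.0, "sunk_cost": 5.0}
score = overall_score(example)  # (10 + 30 + 5) / 3 = 15.0
```

Because the overall score is a mean, a low value can still hide one or two biases with high individual scores.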

No model is immune

Every model tested shows measurable bias. The gap between first and last place is 15.6 percentage points (11.1% vs. 26.7%), so rankings do matter, but no model scores zero. If you're deploying a model for decision support, look at its specific bias profile, not just its overall rank, and build safeguards for the biases it's most susceptible to.
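The first-to-last gap can be reproduced from the leaderboard's BMS column. It is a difference in percentage points, not a relative change. The sketch below uses the average BMS values listed on this page; the variable names are illustrative.

```python
# Average BMS per model, in percent, copied from the leaderboard on this page.
leaderboard_bms = {
    "Claude Opus 4.6": 11.1,
    "Claude Sonnet 4.6": 18.1,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}

# Lower BMS is better, so the best model has the minimum score.
best = min(leaderboard_bms, key=leaderboard_bms.get)
worst = max(leaderboard_bms, key=leaderboard_bms.get)

# Absolute gap in percentage points, and the relative ratio for comparison.
gap_points = round(leaderboard_bms[worst] - leaderboard_bms[best], 1)  # 15.6
ratio = round(leaderboard_bms[worst] / leaderboard_bms[best], 2)       # 2.41
```

Read relatively, the last-ranked model's average BMS is about 2.4 times that of the top-ranked model.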

Cognitive Fingerprint: Core Bias Profile
[Radar chart: Claude Opus 4.6 vs. Human Baseline]
Each axis shows treatment-condition bias rate from 0% (center) to 100% (edge). Smaller shapes indicate lower observed susceptibility under bias-triggering prompts. The gray area shows literature-derived human baseline rates.
Legend: Claude Opus 4.6, Human Baseline. Center = 0% biased responses · Edge = 100%
Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
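| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS |
|------|-------|----------|-----|-----|-----|-----|-----|-----|
| #1 | Claude Opus 4.6 | Anthropic | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #2 | Claude Sonnet 4.6 | Anthropic | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #3 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #4 | GPT-5.2 | OpenAI | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #5 | Claude Sonnet 4.5 | Anthropic | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #6 | Claude Haiku 4.5 | Anthropic | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
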
Claude Opus 4.6
Anthropic · core tier

Evaluated February 14, 2026 across 15 biases

Overall Score (Avg BMS)

11.1%

Most Susceptible

Gain-Loss Framing, Base Rate Neglect, Endowment Effect, +2 more

Most Resistant

Conjunction Fallacy, Confirmation Bias, Sunk Cost Fallacy, +2 more

AI-Specific Biases

None detected
Per-Bias Analysis
15 biases evaluated at core tier