Results & Leaderboard
Compare cognitive fingerprints across LLMs. See which models are most rational and how they compare across 15 known human biases.
Universal bias hotspots
Two biases appear in every model's top-5 most-susceptible list: Endowment Effect and Gain-Loss Framing. No model we tested is free of them, regardless of provider, size, or architecture.
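One way to find hotspots like these is to take each model's top-k list and intersect the lists. The sketch below is illustrative only: the function names (`top_k`, `universal_hotspots`), the model names, and all per-bias numbers are made up, and apart from the two biases named above, the bias labels are placeholders rather than confirmed members of the 15 tested biases.

```python
def top_k(scores: dict[str, float], k: int = 5) -> set[str]:
    """Return the k biases with the highest score (most susceptible)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def universal_hotspots(per_model: dict[str, dict[str, float]], k: int = 5) -> set[str]:
    """Return the biases that appear in every model's top-k list."""
    top_sets = [top_k(scores, k) for scores in per_model.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Hypothetical per-bias scores; NOT taken from the leaderboard.
example = {
    "model_a": {"Endowment Effect": 0.60, "Gain-Loss Framing": 0.50,
                "Anchoring": 0.40, "Sunk Cost": 0.10},
    "model_b": {"Endowment Effect": 0.55, "Gain-Loss Framing": 0.45,
                "Anchoring": 0.05, "Sunk Cost": 0.30},
}

# With only four biases in the toy data, use k=3 instead of 5.
hotspots = universal_hotspots(example, k=3)
print(sorted(hotspots))
```

In the toy data, the two models disagree on their third-ranked bias, so only the two shared entries survive the intersection.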
What the scores mean
Each model's overall score is its average Bias Magnitude Score (BMS) across the 15 tested biases. Lower means more rational. The top-ranked model, Claude Opus 4.6, scores 11.1%, while the last-ranked model, Claude Haiku 4.5, scores 26.7%. For high-stakes applications (medical, financial, legal), even small scores warrant attention.
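As a minimal sketch of that averaging step, assuming per-bias BMS values are available as percentages: the helper name `overall_score`, the bias labels, and the numbers below are hypothetical, and only three biases are shown instead of 15.

```python
from statistics import mean


def overall_score(bms_by_bias: dict[str, float]) -> float:
    """Average the per-bias BMS values (in percent), rounded to one decimal."""
    return round(mean(bms_by_bias.values()), 1)


# Hypothetical per-bias BMS values in percent; not real leaderboard data.
sample_bms = {"bias_1": 10.0, "bias_2": 20.0, "bias_3": 30.0}
score = overall_score(sample_bms)
print(f"Overall score: {score}%")
```

Because the overall score is a plain average, a model can rank well overall while still scoring high on one or two individual biases.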
No model is immune
Every model tested shows measurable bias. The gap between first and last place is 15.6 percentage points, so rankings do matter, but no model scores zero. If you're deploying a model for decision support, look at its specific bias profile, not just its overall rank, and build safeguards for the biases it's most susceptible to.
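The first-to-last gap can be checked directly from the overall scores in the leaderboard table. The snippet below copies those six values by hand; the variable names are illustrative.

```python
# Overall scores (percent) copied from the leaderboard table.
overall = {
    "Claude Opus 4.6": 11.1,
    "Claude Sonnet 4.6": 18.1,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}

best = min(overall, key=overall.get)    # lowest score = most rational
worst = max(overall, key=overall.get)   # highest score = most biased

# The difference of two percentages is measured in percentage points.
gap = round(overall[worst] - overall[best], 1)
print(f"{best} vs {worst}: gap of {gap} percentage points")
```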
| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 11.1% | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #2 | Claude Sonnet 4.6 | Anthropic | 18.1% | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #3 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #4 | GPT-5.2 | OpenAI | 21.0% | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #5 | Claude Sonnet 4.5 | Anthropic | 21.5% | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #6 | Claude Haiku 4.5 | Anthropic | 26.7% | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
Evaluated February 14, 2026 across 15 biases
Overall Score (Avg BMS): 11.1%
Most Susceptible
Most Resistant
AI-Specific Biases