Cognitive Bias Benchmark for LLMs
How Biased Is Your AI?
KahneBench evaluates large language models for the cognitive biases documented in Kahneman and Tversky's research. Lower scores mean more rational thinking.
Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.7 | Anthropic | 8.3% | 8.3% | 87.9% | 39.6% | 30.4% | 60.0% | 57.6% |
| #2 | Claude Opus 4.8 | Anthropic | 8.7% | 8.7% | 83.9% | 50.2% | 41.0% | 42.4% | 54.6% |
| #3 | Claude Opus 4.6 | Anthropic | 11.1% | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #4 | GPT-5.5 | OpenAI | 13.6% | 13.6% | 93.9% | 50.8% | 29.4% | 69.7% | 55.2% |
| #5 | Claude Sonnet 4.6 | Anthropic | 18.1% | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #6 | GPT-5.4 | OpenAI | 18.4% | 18.4% | 82.0% | 55.4% | 37.8% | 56.0% | 55.9% |
| #7 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #8 | GPT-5.2 | OpenAI | 21.0% | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #9 | Claude Sonnet 4.5 | Anthropic | 21.5% | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #10 | Claude Haiku 4.5 | Anthropic | 26.7% | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
Every model tested shows measurable cognitive bias. Overall scores range from 8.3% to 26.7% (lower is better).
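As a rough illustration of the ranking rule, the sketch below sorts the models by their BMS value in ascending order and reports the range. The BMS figures are copied from the leaderboard; the `rank_by_bms` helper and the dictionary layout are assumptions made for this example, not part of KahneBench itself.

```python
# BMS values (percent) copied from the leaderboard table.
bms = {
    "Claude Opus 4.7": 8.3,
    "Claude Opus 4.8": 8.7,
    "Claude Opus 4.6": 11.1,
    "GPT-5.5": 13.6,
    "Claude Sonnet 4.6": 18.1,
    "GPT-5.4": 18.4,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}


def rank_by_bms(scores):
    """Hypothetical helper: return (rank, model, score) tuples, lowest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(rank, model, score) for rank, (model, score) in enumerate(ordered, start=1)]


ranking = rank_by_bms(bms)
lowest, highest = ranking[0][2], ranking[-1][2]

for rank, model, score in ranking:
    print(f"#{rank} {model}: {score:.1f}%")
print(f"Range: {lowest:.1f}% to {highest:.1f}%")
```

Sorting ascending reflects the "lower is better" convention, so the first entry is the least biased model by this metric.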