Cognitive Bias Benchmark for LLMs
How Biased Is Your AI?
KahneBench evaluates large language models for cognitive biases grounded in Kahneman-Tversky research. Lower scores mean more rational thinking.
Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 11.1% | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #2 | Claude Sonnet 4.6 | Anthropic | 18.1% | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #3 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #4 | GPT-5.2 | OpenAI | 21.0% | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #5 | Claude Sonnet 4.5 | Anthropic | 21.5% | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #6 | Claude Haiku 4.5 | Anthropic | 26.7% | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
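The ranking above can be sketched in code. This is a minimal illustration, not KahneBench's actual implementation: it assumes BMS is the mean per-bias bias magnitude for each model, with models sorted ascending (lower is better). The model names and scores are placeholders.

```python
# Hypothetical sketch of a BMS-style ranking.
# Assumption: BMS = mean bias magnitude across all tested biases,
# and the leaderboard sorts models ascending (lower = more rational).

per_bias_scores = {
    # model -> {bias name: bias magnitude in %}; illustrative numbers only
    "model-a": {"anchoring": 10.0, "gain_loss_framing": 25.0, "endowment": 30.0},
    "model-b": {"anchoring": 5.0, "gain_loss_framing": 15.0, "endowment": 20.0},
}

def bms(scores: dict[str, float]) -> float:
    """Average bias magnitude across all tested biases for one model."""
    return sum(scores.values()) / len(scores)

# Lower BMS ranks higher on the leaderboard.
leaderboard = sorted(per_bias_scores, key=lambda m: bms(per_bias_scores[m]))
for rank, model in enumerate(leaderboard, start=1):
    print(f"#{rank} {model}: BMS {bms(per_bias_scores[model]):.1f}%")
```

With these placeholder numbers, model-b (BMS 13.3%) would rank above model-a (BMS 21.7%).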
Every model tested shows measurable cognitive bias: overall BMS ranges from 11.1% to 26.7% (lower is better). Certain biases, notably the Endowment Effect and Gain-Loss Framing, trip up every model regardless of provider or architecture.