KahneBench
Cognitive Bias Benchmark for LLMs

How Biased Is Your AI?

KahneBench evaluates large language models for cognitive biases grounded in Kahneman and Tversky's research on judgment and decision-making. Lower scores mean more rational, less biased responses.

Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
Rank  Model                    Provider   BMS     BCI     BMP     HSS     RCI     CAS
#1    Claude Opus 4.6          Anthropic  11.1%   88.6%   31.0%   33.0%   62.0%   93.4%
#2    Claude Sonnet 4.6        Anthropic  18.1%   84.7%   34.1%   36.1%   56.5%   95.0%
#3    Grok 4.1 Fast Reasoning  xAI        18.8%   86.5%   55.9%   37.9%   57.8%   85.4%
#4    GPT-5.2                  OpenAI     21.0%   78.6%   59.8%   40.5%   55.7%   97.2%
#5    Claude Sonnet 4.5        Anthropic  21.5%   83.4%   58.9%   31.7%   53.3%   95.9%
#6    Claude Haiku 4.5         Anthropic  26.7%   83.9%   46.9%   40.9%   49.8%   93.6%

Every model tested shows measurable cognitive bias, with overall BMS scores ranging from 11.1% to 26.7% (lower is better). Two biases, the Endowment Effect and Gain-Loss Framing, trip up every model regardless of provider or architecture.
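The page describes BMS as an average of per-bias magnitudes but does not show the computation. As a minimal sketch, assuming each bias test yields a magnitude between 0 and 1, the aggregation could look like the following. The function name and the per-bias numbers are illustrative assumptions, not KahneBench's actual code or data:

```python
# Hypothetical sketch: averaging per-bias magnitudes into one
# Bias Magnitude Score (BMS). All names and numbers below are
# made up for illustration; they are not KahneBench's implementation.

def bias_magnitude_score(per_bias: dict[str, float]) -> float:
    """Average per-bias magnitudes (each in [0, 1]) into a single score."""
    if not per_bias:
        raise ValueError("need at least one bias measurement")
    return sum(per_bias.values()) / len(per_bias)

# Illustrative per-bias magnitudes for a single model:
example = {
    "anchoring": 0.15,
    "endowment_effect": 0.30,
    "gain_loss_framing": 0.25,
    "sunk_cost": 0.05,
}
print(f"BMS: {bias_magnitude_score(example):.1%}")  # lower is better
```

Under this reading, a model's headline score simply reflects how far its answers deviate, on average, from the unbiased baseline across all tested biases.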

See full findings