KahneBench
Cognitive Bias Benchmark for LLMs

How Biased Is Your AI?

KahneBench evaluates large language models for cognitive biases grounded in Kahneman-Tversky research. Lower scores mean more rational thinking.

Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
Rank  Model                    Provider   BMS    BCI    BMP    HSS    RCI    CAS
#1    Claude Opus 4.7          Anthropic  8.3%   87.9%  39.6%  30.4%  60.0%  57.6%
#2    Claude Opus 4.8          Anthropic  8.7%   83.9%  50.2%  41.0%  42.4%  54.6%
#3    Claude Opus 4.6          Anthropic  11.1%  88.6%  31.0%  33.0%  62.0%  93.4%
#4    GPT-5.5                  OpenAI     13.6%  93.9%  50.8%  29.4%  69.7%  55.2%
#5    Claude Sonnet 4.6        Anthropic  18.1%  84.7%  34.1%  36.1%  56.5%  95.0%
#6    GPT-5.4                  OpenAI     18.4%  82.0%  55.4%  37.8%  56.0%  55.9%
#7    Grok 4.1 Fast Reasoning  xAI        18.8%  86.5%  55.9%  37.9%  57.8%  85.4%
#8    GPT-5.2                  OpenAI     21.0%  78.6%  59.8%  40.5%  55.7%  97.2%
#9    Claude Sonnet 4.5        Anthropic  21.5%  83.4%  58.9%  31.7%  53.3%  95.9%
#10   Claude Haiku 4.5         Anthropic  26.7%  83.9%  46.9%  40.9%  49.8%  93.6%

Every model tested shows measurable cognitive bias. Overall scores range from 8.3% to 26.7% (lower is better).
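
As a quick sanity check on the ranking rule and the stated range, the sketch below sorts the BMS values from the leaderboard in ascending order. It assumes only the BMS figures shown in the table; the function name `rank_by_bms` is hypothetical and is not part of KahneBench itself.

```python
# BMS values (percent) copied from the leaderboard; lower is better.
BMS = {
    "Claude Opus 4.7": 8.3,
    "Claude Opus 4.8": 8.7,
    "Claude Opus 4.6": 11.1,
    "GPT-5.5": 13.6,
    "Claude Sonnet 4.6": 18.1,
    "GPT-5.4": 18.4,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}


def rank_by_bms(scores):
    """Return (rank, model, score) tuples ordered by ascending BMS.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)]


ranking = rank_by_bms(BMS)
lowest = ranking[0][2]
highest = ranking[-1][2]

for rank, model, score in ranking:
    print(f"#{rank:<3} {model:<24} {score:.1f}%")
print(f"Range: {lowest:.1f}% to {highest:.1f}%")
```

Sorting ascending reproduces the leaderboard order, and the first and last entries give the overall range quoted above.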
