Cognitive Bias Benchmark for LLMs

How Biased Is Your AI?

KahneBench, a benchmark grounded in Kahneman-Tversky research, evaluates large language models for cognitive biases. Lower scores mean more rational thinking.

Model Leaderboard

Models ranked by average bias magnitude (BMS) across tested biases (lower is better)

Rank	Model	Provider	Overall	BMS	BCI	BMP	HSS	RCI	CAS
#1	Claude Opus 4.7	Anthropic	8.3%	8.3%	87.9%	39.6%	30.4%	60.0%	57.6%
#2	Claude Opus 4.8	Anthropic	8.7%	8.7%	83.9%	50.2%	41.0%	42.4%	54.6%
#3	Claude Fable 5	Anthropic	10.3%	10.3%	88.6%	36.4%	27.9%	59.6%	62.9%
#4	Claude Opus 4.6	Anthropic	11.1%	11.1%	88.6%	31.0%	33.0%	62.0%	93.4%
#5	GPT-5.5	OpenAI	13.6%	13.6%	93.9%	50.8%	29.4%	69.7%	55.2%
#6	Claude Sonnet 4.6	Anthropic	18.1%	18.1%	84.7%	34.1%	36.1%	56.5%	95.0%
#7	GPT-5.4	OpenAI	18.4%	18.4%	82.0%	55.4%	37.8%	56.0%	55.9%
#8	Grok 4.1 Fast Reasoning	xAI	18.8%	18.8%	86.5%	55.9%	37.9%	57.8%	85.4%
#9	GPT-5.2	OpenAI	21.0%	21.0%	78.6%	59.8%	40.5%	55.7%	97.2%
#10	Claude Sonnet 4.5	Anthropic	21.5%	21.5%	83.4%	58.9%	31.7%	53.3%	95.9%
#11	Claude Haiku 4.5	Anthropic	26.7%	26.7%	83.9%	46.9%	40.9%	49.8%	93.6%

Every model tested shows measurable cognitive bias. Overall scores range from 8.3% to 26.7% (lower is better).
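
As a quick sanity check, the ranking and the reported range can be reproduced from the overall scores in the table above. This is only a minimal sketch: the names `overall_scores` and `rank_models` are made up for illustration and are not part of KahneBench, and the values are copied by hand from the leaderboard.

```python
# Overall scores (percent), copied by hand from the leaderboard table.
# The dictionary name and the helper below are illustrative assumptions,
# not KahneBench code.
overall_scores = {
    "Claude Opus 4.7": 8.3,
    "Claude Opus 4.8": 8.7,
    "Claude Fable 5": 10.3,
    "Claude Opus 4.6": 11.1,
    "GPT-5.5": 13.6,
    "Claude Sonnet 4.6": 18.1,
    "GPT-5.4": 18.4,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}


def rank_models(scores):
    """Return (model, score) pairs sorted by ascending score (lower is better)."""
    return sorted(scores.items(), key=lambda item: item[1])


ranking = rank_models(overall_scores)
lowest = ranking[0][1]    # best (least biased) overall score
highest = ranking[-1][1]  # worst (most biased) overall score

for position, (model, score) in enumerate(ranking, start=1):
    print(f"#{position} {model}: {score:.1f}%")
print(f"Range: {lowest:.1f}% to {highest:.1f}%")
```

Sorting in ascending order reflects the "lower is better" convention, so the first entry is the least biased model and the last entry is the most biased.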