Cognitive Bias Benchmark for LLMs
How Biased Is Your AI?
KahneBench evaluates large language models for cognitive biases grounded in Kahneman-Tversky research. Lower scores mean more rational thinking.
Model Leaderboard
Models ranked by average bias magnitude (BMS) across tested biases (lower is better)
| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 11.1% | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #2 | Claude Sonnet 4.6 | Anthropic | 18.1% | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #3 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #4 | GPT-5.2 | OpenAI | 21.0% | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #5 | Claude Sonnet 4.5 | Anthropic | 21.5% | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #6 | Claude Haiku 4.5 | Anthropic | 26.7% | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
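The ranking above can be sketched in code. This is a minimal illustration, not KahneBench's actual implementation: it assumes BMS is the mean per-bias bias magnitude for each model, with models sorted ascending (lower is better). The model names and scores are placeholders.

```python
# Hypothetical sketch of a BMS-style ranking.
# Assumption: BMS = mean bias magnitude across all tested biases,
# and the leaderboard sorts models ascending (lower = more rational).

per_bias_scores = {
    # model -> {bias name: bias magnitude in %}; illustrative numbers only
    "model-a": {"anchoring": 10.0, "gain_loss_framing": 25.0, "endowment": 30.0},
    "model-b": {"anchoring": 5.0, "gain_loss_framing": 15.0, "endowment": 20.0},
}

def bms(scores: dict[str, float]) -> float:
    """Average bias magnitude across all tested biases for one model."""
    return sum(scores.values()) / len(scores)

# Lower BMS ranks higher on the leaderboard.
leaderboard = sorted(per_bias_scores, key=lambda m: bms(per_bias_scores[m]))
for rank, model in enumerate(leaderboard, start=1):
    print(f"#{rank} {model}: BMS {bms(per_bias_scores[model]):.1f}%")
```

With these placeholder numbers, model-b (BMS 13.3%) would rank above model-a (BMS 21.7%).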
Every model tested shows measurable cognitive bias: overall BMS ranges from 11.1% to 26.7% (lower is better). Certain biases, notably the Endowment Effect and Gain-Loss Framing, trip up every model regardless of provider or architecture.