Benchmark Results

Results & Leaderboard

Compare cognitive fingerprints across LLMs. See which models are most rational and how they compare across 15 known human biases.

Key Findings

Universal bias hotspots

No single bias appears in every model's top-5, which means models struggle with different biases. Check individual profiles to see where each model is weakest.
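
The "no universal hotspot" claim amounts to an empty set intersection over the models' top-5 lists. A minimal sketch of that check follows, assuming each model's five highest-scoring biases are available as a list. The model names and lists below are hypothetical placeholders, not benchmark data, and `universal_hotspots` and `hotspot_counts` are illustrative helper names.

```python
from collections import Counter

# Hypothetical top-5 lists (illustrative only; NOT the benchmark's data).
top5_by_model = {
    "model_a": ["Loss Aversion", "Sunk Cost Fallacy", "Endowment Effect",
                "Status Quo Bias", "Certainty Effect"],
    "model_b": ["Loss Aversion", "Overconfidence Effect", "Hindsight Bias",
                "Confirmation Bias", "Availability Bias"],
    "model_c": ["Gain-Loss Framing", "Sunk Cost Fallacy", "Overconfidence Effect",
                "Anchoring Effect", "Certainty Effect"],
}


def universal_hotspots(top5):
    """Return the biases that appear in every model's top-5 list."""
    lists = [set(biases) for biases in top5.values()]
    return set.intersection(*lists) if lists else set()


def hotspot_counts(top5):
    """Count how many models list each bias in their top-5, most common first."""
    counts = Counter(b for biases in top5.values() for b in set(biases))
    return counts.most_common()


if __name__ == "__main__":
    print(universal_hotspots(top5_by_model))
    print(hotspot_counts(top5_by_model))
```

An empty result from `universal_hotspots` corresponds to the finding above, while `hotspot_counts` still shows which biases recur across several models.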

What the scores mean

Each model's overall score is its average Bias Magnitude Score (BMS) across the 15 tested biases. Lower means more rational. The top-ranked model, Claude Opus 4.7, scores 8.3%, while Claude Haiku 4.5 scores 26.7%. For high-stakes applications (medical, financial, legal), even small scores warrant attention.
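
As a rough illustration of how an overall score could be derived, the sketch below averages per-bias BMS values over the 15 tested biases. The per-bias numbers are hypothetical (the page does not list them), and `overall_score` is an illustrative name rather than the benchmark's own code.

```python
from statistics import mean

# Hypothetical per-bias BMS values in percent (NOT published benchmark data).
per_bias_bms = {
    "Anchoring Effect": 2.0,
    "Availability Bias": 4.0,
    "Base Rate Neglect": 3.0,
    "Conjunction Fallacy": 5.0,
    "Gain-Loss Framing": 8.0,
    "Loss Aversion": 30.0,
    "Endowment Effect": 22.0,
    "Status Quo Bias": 9.0,
    "Certainty Effect": 6.0,
    "Overconfidence Effect": 12.0,
    "Confirmation Bias": 7.0,
    "Sunk Cost Fallacy": 20.0,
    "Present Bias/Hyperbolic Discounting": 15.0,
    "Hindsight Bias": 4.0,
    "Gambler's Fallacy": 3.0,
}


def overall_score(bms_by_bias):
    """Unweighted mean of per-bias BMS values; lower means more rational."""
    return mean(bms_by_bias.values())


if __name__ == "__main__":
    print(f"Overall score (avg BMS): {overall_score(per_bias_bms):.1f}%")
```

Because the mean is unweighted, a model can post a low overall score while still showing one or two large per-bias values, which is why the per-bias profile matters.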

No model is immune

Every model tested shows measurable bias. The gap between first and last place is 18.3 percentage points, so rankings do matter, but no model scores zero. If you're deploying a model for decision support, look at its specific bias profile, not just its overall rank, and build safeguards for the biases it's most susceptible to.

Cognitive Fingerprint: Core Bias Profile

Each axis shows treatment-condition bias rate from 0% (center) to 100% (edge). Smaller shapes indicate lower observed susceptibility under bias-triggering prompts. The gray area shows literature-derived human baseline rates.

[Figure: radar chart comparing Claude Opus 4.7 with the Human Baseline. Center = 0% biased responses · Edge = 100%]

2 of 15 biases exceed human baseline
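
The "exceeds human baseline" count can be read as a per-bias comparison of the model's treatment-condition bias rate against the baseline rate. The sketch below shows one way to compute it. All rates are hypothetical placeholders, only four biases are shown for brevity, and `biases_exceeding_baseline` is an illustrative name.

```python
# Hypothetical bias rates in percent (NOT benchmark or literature values).
model_rates = {
    "Loss Aversion": 62.0,
    "Overconfidence Effect": 55.0,
    "Anchoring Effect": 5.0,
    "Gambler's Fallacy": 2.0,
}
human_rates = {
    "Loss Aversion": 50.0,
    "Overconfidence Effect": 50.0,
    "Anchoring Effect": 60.0,
    "Gambler's Fallacy": 40.0,
}


def biases_exceeding_baseline(model, human):
    """Return the biases where the model's rate is strictly above the human rate."""
    return sorted(bias for bias, rate in model.items() if rate > human[bias])


if __name__ == "__main__":
    exceeding = biases_exceeding_baseline(model_rates, human_rates)
    print(f"{len(exceeding)} of {len(model_rates)} biases exceed human baseline")
```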

Model Leaderboard

Models ranked by average bias magnitude (BMS) across tested biases (lower is better)

Rank	Model	Provider	BMS	BCI	BMP	HSS	RCI	CAS
#1	Claude Opus 4.7	Anthropic	8.3%	87.9%	39.6%	30.4%	60.0%	57.6%
#2	Claude Opus 4.8	Anthropic	8.7%	83.9%	50.2%	41.0%	42.4%	54.6%
#3	Claude Opus 4.6	Anthropic	11.1%	88.6%	31.0%	33.0%	62.0%	93.4%
#4	GPT-5.5	OpenAI	13.6%	93.9%	50.8%	29.4%	69.7%	55.2%
#5	Claude Sonnet 4.6	Anthropic	18.1%	84.7%	34.1%	36.1%	56.5%	95.0%
#6	GPT-5.4	OpenAI	18.4%	82.0%	55.4%	37.8%	56.0%	55.9%
#7	Grok 4.1 Fast Reasoning	xAI	18.8%	86.5%	55.9%	37.9%	57.8%	85.4%
#8	GPT-5.2	OpenAI	21.0%	78.6%	59.8%	40.5%	55.7%	97.2%
#9	Claude Sonnet 4.5	Anthropic	21.5%	83.4%	58.9%	31.7%	53.3%	95.9%
#10	Claude Haiku 4.5	Anthropic	26.7%	83.9%	46.9%	40.9%	49.8%	93.6%
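
To make the ranking rule concrete, the sketch below sorts the leaderboard's published average BMS values in ascending order (lower is better) and computes the spread between first and last place. The variable and function names are illustrative. Working from the rounded table values gives a gap of 18.4 points, so the 18.3 quoted in the key findings presumably comes from unrounded scores, which is an assumption here.

```python
# Average BMS (percent) as shown in the leaderboard table, rounded to one decimal.
leaderboard_bms = {
    "Claude Opus 4.7": 8.3,
    "Claude Opus 4.8": 8.7,
    "Claude Opus 4.6": 11.1,
    "GPT-5.5": 13.6,
    "Claude Sonnet 4.6": 18.1,
    "GPT-5.4": 18.4,
    "Grok 4.1 Fast Reasoning": 18.8,
    "GPT-5.2": 21.0,
    "Claude Sonnet 4.5": 21.5,
    "Claude Haiku 4.5": 26.7,
}


def rank_models(bms_by_model):
    """Return (model, bms) pairs sorted by ascending BMS, so rank 1 comes first."""
    return sorted(bms_by_model.items(), key=lambda item: item[1])


def first_to_last_gap(ranked):
    """Spread in percentage points between the best and worst average BMS."""
    return round(ranked[-1][1] - ranked[0][1], 1)


if __name__ == "__main__":
    ranked = rank_models(leaderboard_bms)
    for position, (model, bms) in enumerate(ranked, start=1):
        print(f"#{position}\t{model}\t{bms:.1f}%")
    print(f"Gap: {first_to_last_gap(ranked)} percentage points")
```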

Claude Opus 4.7

Anthropic · core tier

Evaluated April 17, 2026 across 15 biases

Overall Score (Avg BMS)

8.3%

Most Susceptible

Endowment Effect, Sunk Cost Fallacy, Present Bias/Hyperbolic Discounting, +2 more

Most Resistant

Anchoring Effect, Base Rate Neglect, Conjunction Fallacy, +2 more

AI-Specific Biases

Loss Aversion

Per-Bias Analysis

15 biases evaluated at core tier

Bias	Category	Rating vs. human
Anchoring Effect	Anchoring	Superhuman
Availability Bias	Availability	Superhuman
Base Rate Neglect	Representativeness	Superhuman
Conjunction Fallacy	Representativeness	Superhuman
Gain-Loss Framing	Framing	Superhuman
Loss Aversion	Loss Aversion	Worse than Human
Endowment Effect	Loss Aversion	Superhuman
Status Quo Bias	Loss Aversion	Superhuman
Certainty Effect	Probability Distortion	Superhuman
Overconfidence Effect	Overconfidence	Human
Confirmation Bias	Confirmation	Superhuman
Sunk Cost Fallacy	Loss Aversion	Superhuman
Present Bias/Hyperbolic Discounting	Temporal Bias	Superhuman
Hindsight Bias	Overconfidence	Superhuman
Gambler's Fallacy	Representativeness	Superhuman
