Compare cognitive fingerprints across LLMs. See which models are most rational and how they compare across 15 known human biases.
No single bias appears in every model's top-5, which means models struggle with different biases. Check individual profiles to see where each model is weakest.
Each model's overall score is its average Bias Magnitude Score (BMS) across the 15 tested biases. Lower means more rational. The top-ranked model, Claude Opus 4.7, scores 8.3%, while the lowest-ranked, Claude Haiku 4.5, scores 26.7%. For high-stakes applications (medical, financial, legal), even small scores warrant attention.
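As a rough illustration of the averaging described above, the sketch below computes an overall score as the mean of per-bias BMS values and lists the biases a model is most susceptible to. The bias names and numbers are made-up placeholders, not results from this evaluation, and `overall_score` and `most_susceptible` are hypothetical helper names.

```python
from statistics import mean

# Hypothetical per-bias BMS values (percent) for one model.
# Placeholder numbers for illustration only, not measured results.
example_bms = {
    "anchoring": 12.0,
    "framing": 6.0,
    "sunk_cost": 9.0,
}


def overall_score(bms_by_bias):
    """Return the mean BMS across all supplied biases (lower = more rational)."""
    if not bms_by_bias:
        raise ValueError("need at least one bias score")
    return mean(bms_by_bias.values())


def most_susceptible(bms_by_bias, n=5):
    """Return the names of the n biases with the highest BMS, highest first."""
    return sorted(bms_by_bias, key=bms_by_bias.get, reverse=True)[:n]


print(f"Overall score: {overall_score(example_bms):.1f}%")
print("Most susceptible:", most_susceptible(example_bms, n=2))
```

With real data the dictionary would hold one entry per tested bias, so the mean would run over all 15 values.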
Every model tested shows measurable bias. The gap between first and last place is roughly 18 percentage points, so rankings do matter, but no model scores zero. If you're deploying a model for decision support, look at its specific bias profile, not just its overall rank, and build safeguards for the biases it's most susceptible to.
| Rank | Model | Provider | BMS | BCI | BMP | HSS | RCI | CAS | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.7 | Anthropic | 8.3% | 8.3% | 87.9% | 39.6% | 30.4% | 60.0% | 57.6% |
| #2 | Claude Opus 4.8 | Anthropic | 8.7% | 8.7% | 83.9% | 50.2% | 41.0% | 42.4% | 54.6% |
| #3 | Claude Opus 4.6 | Anthropic | 11.1% | 11.1% | 88.6% | 31.0% | 33.0% | 62.0% | 93.4% |
| #4 | GPT-5.5 | OpenAI | 13.6% | 13.6% | 93.9% | 50.8% | 29.4% | 69.7% | 55.2% |
| #5 | Claude Sonnet 4.6 | Anthropic | 18.1% | 18.1% | 84.7% | 34.1% | 36.1% | 56.5% | 95.0% |
| #6 | GPT-5.4 | OpenAI | 18.4% | 18.4% | 82.0% | 55.4% | 37.8% | 56.0% | 55.9% |
| #7 | Grok 4.1 Fast Reasoning | xAI | 18.8% | 18.8% | 86.5% | 55.9% | 37.9% | 57.8% | 85.4% |
| #8 | GPT-5.2 | OpenAI | 21.0% | 21.0% | 78.6% | 59.8% | 40.5% | 55.7% | 97.2% |
| #9 | Claude Sonnet 4.5 | Anthropic | 21.5% | 21.5% | 83.4% | 58.9% | 31.7% | 53.3% | 95.9% |
| #10 | Claude Haiku 4.5 | Anthropic | 26.7% | 26.7% | 83.9% | 46.9% | 40.9% | 49.8% | 93.6% |
Evaluated April 17, 2026 across 15 biases
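The comparison across the table can be sketched in a few lines. The snippet below copies the BMS column from the leaderboard and computes the spread between the best and worst model, plus the lowest-BMS model per provider. The helper names `spread` and `best_per_provider` are hypothetical, and because the inputs are the rounded values shown in the table, results may differ slightly from figures derived from unrounded scores.

```python
# (model, provider, BMS in percent), copied from the leaderboard table.
leaderboard = [
    ("Claude Opus 4.7", "Anthropic", 8.3),
    ("Claude Opus 4.8", "Anthropic", 8.7),
    ("Claude Opus 4.6", "Anthropic", 11.1),
    ("GPT-5.5", "OpenAI", 13.6),
    ("Claude Sonnet 4.6", "Anthropic", 18.1),
    ("GPT-5.4", "OpenAI", 18.4),
    ("Grok 4.1 Fast Reasoning", "xAI", 18.8),
    ("GPT-5.2", "OpenAI", 21.0),
    ("Claude Sonnet 4.5", "Anthropic", 21.5),
    ("Claude Haiku 4.5", "Anthropic", 26.7),
]


def spread(rows):
    """Gap in percentage points between the highest and lowest BMS."""
    scores = [bms for _, _, bms in rows]
    return round(max(scores) - min(scores), 1)


def best_per_provider(rows):
    """Map each provider to its (model, BMS) pair with the lowest BMS."""
    best = {}
    for model, provider, bms in rows:
        if provider not in best or bms < best[provider][1]:
            best[provider] = (model, bms)
    return best


print("Spread (percentage points):", spread(leaderboard))
for provider, (model, bms) in best_per_provider(leaderboard).items():
    print(f"{provider}: {model} ({bms}%)")
```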