About KahneBench

A comprehensive cognitive bias benchmark for evaluating large language models (LLMs), grounded in Nobel Prize-winning research.

KahneBench was created by Ryan Hartman to measure and compare how frontier language models respond to the cognitive biases that shape human decision-making.

Key Contributions

  • Bias Taxonomy for AI: A systematic classification of which human cognitive biases transfer to LLMs and which do not, creating a foundational taxonomy for AI psychology.
  • Debiasing Strategies: An empirical testbed for validating the effectiveness of various mitigation techniques, from simple prompting strategies to complex interventions.
  • Model-Specific Profiles: Unique "cognitive fingerprints" for frontier models from Anthropic, OpenAI, and xAI, highlighting their specific strengths and weaknesses to guide responsible deployment.

Theoretical Foundation

KahneBench is grounded in the "two-system" view of cognition popularized by Nobel laureate Daniel Kahneman, building on the heuristics-and-biases research he carried out with his collaborator Amos Tversky. This dual-process theory distinguishes between:

  • System 1: Fast, automatic, and intuitive operations
  • System 2: Slow, serial, and deliberately controlled operations

Human judgment often relies on the heuristics of System 1, which, while efficient, can lead to predictable errors or "cognitive biases." KahneBench measures LLM susceptibility to these same cognitive illusions and their capacity for deliberate, corrective thought.
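As an illustration of what "measuring susceptibility" can mean in practice, the sketch below scores one well-known bias, anchoring, by comparing a model's numeric answers with and without an irrelevant anchor in the prompt. This is only a minimal sketch under stated assumptions: the function name `anchoring_shift`, its inputs, and the scoring rule are hypothetical and are not taken from KahneBench's actual methodology.

```python
from statistics import mean


def anchoring_shift(control, anchored, anchor):
    """Hypothetical susceptibility score for the anchoring bias.

    control  -- numeric answers to the neutral prompt
    anchored -- numeric answers to the same prompt with an anchor added
    anchor   -- the irrelevant number shown in the anchored prompt

    Returns the fraction of the gap between the control mean and the
    anchor that the anchored answers close: 0.0 means no shift toward
    the anchor, 1.0 means the answers moved all the way to it.
    """
    baseline = mean(control)
    gap = anchor - baseline
    if gap == 0:
        raise ValueError("anchor equals the control mean; shift is undefined")
    return (mean(anchored) - baseline) / gap


# Illustrative, made-up numbers (not benchmark results).
score = anchoring_shift(
    control=[100, 110, 90],
    anchored=[150, 140, 160],
    anchor=200,
)
print(f"anchoring shift: {score:.2f}")
```

A score near zero would suggest the model ignored the anchor, while a score near one would suggest a strong System 1 style pull toward it. Other biases would need their own paired prompts and scoring rules.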

Academic References

Thinking, Fast and Slow

Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.

Judgment Under Uncertainty

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.

Prospect Theory

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.

Resources

Get Started

Ready to evaluate your models for cognitive biases? KahneBench profiles frontier models across Anthropic, OpenAI, and xAI. Check out our documentation to get started.

View Getting Started Guide