KahneBench
Evaluation

Metrics

KahneBench uses 6 advanced metrics to capture a comprehensive picture of an LLM's decision-making profile.

Beyond Simple Accuracy

Traditional benchmarks often use binary accuracy metrics. KahneBench goes further, measuring not just whether a model is biased, but how strongly, how consistently, and whether it can self-correct.

The 6 Metrics

Bias Magnitude Score (BMS)

Quantifies the strength of a given bias by measuring the degree of deviation between the model's response in a treatment condition and the rational baseline established in the control condition.

Measures

How strongly the model exhibits a bias

Interpretation

0 = no bias, 1 = maximum bias. Scores are weighted by trigger intensity: bias elicited by a weak trigger counts more heavily (2.0x) than bias elicited by a strong one (0.67x).

Example value: 45.0%
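
As a concrete illustration, BMS can be sketched as the absolute deviation of the treatment response from the control baseline, scaled by an intensity weight. This is a minimal sketch under stated assumptions, not the benchmark's actual implementation: the `bias_magnitude` helper and the normalization to [0, 1] are hypothetical; only the weight values come from the Trigger Intensity Weighting section below.

```python
# Hypothetical sketch of a weighted Bias Magnitude Score. The deviation
# measure and the clipping to [0, 1] are assumptions, not KahneBench's code.
def bias_magnitude(control: float, treatment: float, weight: float = 1.0) -> float:
    """Absolute deviation of the treatment response from the rational
    control baseline, scaled by a trigger-intensity weight and clipped
    so that 0 = no bias and 1 = maximum bias."""
    deviation = abs(treatment - control)
    return min(1.0, deviation * weight)
```

With identical control and treatment responses the score is 0; a weak trigger (weight 2.0) doubles the raw deviation, so even small shifts under weak pressure register strongly.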

Bias Consistency Index (BCI)

Measures how consistently a model exhibits a particular bias across different domains and contexts, indicating whether the bias is a sporadic error or a systematic flaw.

Measures

Cross-domain consistency of the bias

Interpretation

Higher values indicate more consistent bias across domains. A bias is considered 'systematic' if it appears in >70% of domains with score >0.5.

Example value: 72.0%
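
The "systematic" rule quoted above lends itself to a simple check. The sketch below is hypothetical; the thresholds (>70% of domains, score >0.5) come from the interpretation text, while the `is_systematic` helper and its input format are assumptions.

```python
# Hypothetical check for the 'systematic' rule: the bias must appear
# (per-domain score > 0.5) in more than 70% of evaluated domains.
def is_systematic(domain_scores: dict) -> bool:
    """True when more than 70% of domains show a bias score above 0.5."""
    hits = sum(1 for score in domain_scores.values() if score > 0.5)
    return hits / len(domain_scores) > 0.70
```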

Bias Mitigation Potential (BMP)

Assesses the model's ability to overcome a demonstrated bias when provided with explicit debiasing prompts or chain-of-thought instructions.

Measures

System 2 override capacity with debiasing prompts

Interpretation

Higher values indicate better debiasing capability. Measures how much bias is reduced when the model is warned or asked to reason carefully.

Example value: 65.0%
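
One natural way to formalize "how much bias is reduced" is the relative reduction against the baseline. This formula and the `mitigation_potential` helper are illustrative assumptions, not the benchmark's definition.

```python
# Hypothetical sketch: mitigation potential as the fraction of baseline
# bias removed once a debiasing prompt is applied.
def mitigation_potential(baseline_bias: float, debiased_bias: float) -> float:
    """Relative reduction in bias after a debiasing prompt; 0 when the
    prompt does not help, 1 when the bias disappears entirely."""
    if baseline_bias == 0.0:
        return 0.0  # nothing to mitigate
    return max(0.0, (baseline_bias - debiased_bias) / baseline_bias)
```

Under this sketch, a model whose bias falls from 0.5 to 0.175 when warned would score 0.65, matching the example value above.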

Human Alignment Score (HAS)

Compares the LLM's pattern of biases to established patterns in human cognition from the Kahneman-Tversky research literature.

Measures

How closely model biases match human patterns

Interpretation

Values near 1.0 indicate human-like bias patterns. 'Over' means more biased than humans, 'under' means less biased, 'aligned' means similar.

Example value: 83.0%
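
The 'over' / 'under' / 'aligned' categories described above suggest a labeling rule like the sketch below. The `alignment_label` helper and the tolerance band are illustrative assumptions; the benchmark's actual categorization may differ.

```python
# Hypothetical labeling rule for the over/under/aligned categories.
# The 0.1 tolerance band is an assumption made for illustration.
def alignment_label(model_bias: float, human_bias: float, tol: float = 0.1) -> str:
    """'over' = more biased than the human reference, 'under' = less
    biased, 'aligned' = within the tolerance band."""
    if model_bias > human_bias + tol:
        return "over"
    if model_bias < human_bias - tol:
        return "under"
    return "aligned"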

Response Consistency Index (RCI)

Measures the variance in model responses across multiple identical trials of the same test case, distinguishing systematic bias from stochastic noise.

Measures

Trial-to-trial variance (noise vs systematic bias)

Interpretation

Higher values indicate more consistent (stable) responses. A model showing 50% bias with high RCI is systematically biased; low RCI suggests noise.

Example value: 91.0%
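
A minimal way to turn trial-to-trial variance into a "higher is more stable" score is one minus the scaled spread of per-trial bias scores. The normalization below is an assumption (for scores in [0, 1] the population standard deviation is at most 0.5, so scaling by 2 keeps the result in [0, 1]); it is not the benchmark's formula.

```python
from statistics import pstdev

# Hypothetical sketch: consistency as one minus the scaled spread of
# per-trial bias scores. The exact normalization is an assumption.
def response_consistency(trial_scores: list) -> float:
    """1.0 for identical trials, 0.0 for maximally dispersed trials."""
    return 1.0 - 2.0 * pstdev(trial_scores)
```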

Calibration Awareness Score (CAS)

Measures whether a model recognizes when it is being influenced by a cognitive bias, comparing stated confidence against actual susceptibility.

Measures

Metacognitive accuracy (confidence vs actual performance)

Interpretation

Higher values indicate better self-awareness. A model that is 50% biased but 90% confident is more concerning than one that acknowledges uncertainty.

Example value: 58.0%
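
A highly simplified reading of "confidence vs actual susceptibility" is to compare stated confidence against bias-free performance (1 - bias). The formula and the `calibration_awareness` helper are assumptions made purely for illustration.

```python
# Hypothetical calibration sketch: agreement between stated confidence
# and actual bias-free performance. Not the benchmark's formula.
def calibration_awareness(stated_confidence: float, actual_bias: float) -> float:
    """1.0 when confidence matches performance exactly; lower when the
    model is overconfident or underconfident about its susceptibility."""
    return 1.0 - abs(stated_confidence - (1.0 - actual_bias))
```

Under this sketch, a model that is 50% biased but 90% confident scores 0.6, while one reporting 50% confidence on the same performance scores 1.0, matching the intuition in the interpretation above.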

Metric Relationships

The metrics work together to provide a complete picture:

  • BMS + BCI: High magnitude (BMS) with high consistency (BCI) indicates a systematic, deeply-rooted bias. High BMS with low BCI suggests context-dependent bias.
  • BMS + RCI: If BMS is high but RCI is low, the apparent bias might be stochastic noise rather than systematic error.
  • BMS + BMP: High bias that drops significantly with debiasing (high BMP) suggests the model can engage System 2 when prompted.
  • BMS + HAS: A model might be highly biased (high BMS) but in human-like ways (high HAS), which has different implications than AI-specific biases.
  • CAS + BMS: A model that is biased (high BMS) but unaware (low CAS) poses greater risks than one that acknowledges its uncertainty.
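
The BMS + RCI pairing in particular reduces to a simple decision rule. The sketch below uses illustrative thresholds that are assumptions, not benchmark values:

```python
# Hypothetical decision rule combining BMS and RCI. The 0.5 threshold
# is an illustrative assumption.
def classify_bias(bms: float, rci: float, threshold: float = 0.5) -> str:
    """High magnitude with stable responses suggests systematic bias;
    high magnitude with unstable responses may be stochastic noise."""
    if bms < threshold:
        return "low bias"
    return "systematic bias" if rci >= threshold else "possibly noise"
```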

Trigger Intensity Weighting

BMS uses weighted scoring based on trigger intensity:

  • Weak (2.0x): high susceptibility signal
  • Moderate (1.0x): baseline weight
  • Strong (0.67x): expected deviation
  • Adversarial (0.5x): compound triggers

This weighting reflects susceptibility, not trigger strength. A model vulnerable to weak anchors is more biased than one requiring strong pressure.
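
Applied as a lookup, the weighting above might look like the following sketch. The weights come from the table; the `weighted_bms` helper and the clipping to [0, 1] are assumptions.

```python
# Intensity weights from the table above; weighted_bms is a hypothetical helper.
INTENSITY_WEIGHTS = {"weak": 2.0, "moderate": 1.0, "strong": 0.67, "adversarial": 0.5}

def weighted_bms(raw_deviation: float, intensity: str) -> float:
    """Scale a raw deviation by its trigger-intensity weight, clipped to [0, 1]."""
    return min(1.0, raw_deviation * INTENSITY_WEIGHTS[intensity])
```

The same raw deviation of 0.3 scores 0.6 under a weak trigger but only about 0.2 under a strong one, so susceptibility to weak pressure dominates the final score.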