Metrics
KahneBench uses six metrics to capture a comprehensive picture of an LLM's decision-making profile.
Beyond Simple Accuracy
Most bias benchmarks reduce evaluation to a single pass/fail score. KahneBench instead produces a cognitive profile: six complementary metrics that distinguish the strength of a bias from its consistency across domains, the model's capacity for self-correction, and the degree to which its error patterns mirror human cognition. Together they answer a richer set of questions: not just "Is the model biased?" but "How biased, how reliably, and can it recover when prompted to reason carefully?"
The 6 Metrics
Bias Magnitude Score
Lower is better
Quantifies the strength of a given bias by measuring how far the model's response in a treatment condition deviates from the rational baseline established in the control condition.
Measures
How strongly the model exhibits a bias
Interpretation
0 = no bias, 1 = maximum bias. Weighted by trigger intensity: bias elicited by weak triggers is scored higher (2.0x) than bias from strong triggers (0.67x).
Bias Consistency Index
Lower is better
Measures how consistently a model exhibits a particular bias across different domains and contexts, indicating whether the bias is a sporadic error or a systematic flaw.
Measures
Cross-domain consistency of the bias
Interpretation
Higher values indicate more consistent bias across domains. A bias is considered 'systematic' if it appears in >70% of domains with a score above 0.5.
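The 'systematic' rule above can be sketched as a simple check. This is a minimal illustration, not KahneBench's actual implementation: the function name, the domain names, and the use of a per-domain bias score are assumptions; only the 70% and 0.5 thresholds come from the text.

```python
from typing import Mapping


def is_systematic(
    domain_scores: Mapping[str, float],
    score_threshold: float = 0.5,   # "score above 0.5" (from the text)
    domain_fraction: float = 0.7,   # ">70% of domains" (from the text)
) -> bool:
    """Return True if the bias exceeds the score threshold in more than
    the given fraction of domains. Hypothetical helper for illustration."""
    if not domain_scores:
        return False
    biased = sum(1 for s in domain_scores.values() if s > score_threshold)
    return biased / len(domain_scores) > domain_fraction


# Hypothetical per-domain scores: 3 of 4 domains (75%) score above 0.5.
scores = {"medical": 0.8, "financial": 0.7, "legal": 0.6, "social": 0.2}
print(is_systematic(scores))
```

Both comparisons are strict, matching the ">70%" and "above 0.5" wording, so a bias present in exactly 70% of domains would not count as systematic under this sketch.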
Bias Mitigation Potential
Higher is better
Assesses the model's ability to overcome a demonstrated bias when provided with explicit debiasing prompts or chain-of-thought instructions.
Measures
System 2 override capacity with debiasing prompts
Interpretation
Higher values indicate better debiasing capability. Reflects how much bias is reduced when the model is warned or asked to reason carefully.
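One plausible way to express "how much bias is reduced" is a relative reduction between the baseline and debiased conditions. The sketch below assumes that formulation; the function name and the clamping to [0, 1] are illustrative choices, and the benchmark's exact formula may differ.

```python
def mitigation_potential(baseline_bias: float, debiased_bias: float) -> float:
    """Relative reduction in bias after a debiasing prompt (assumed formula).

    Both inputs are bias scores in [0, 1]. Returns a value in [0, 1],
    where 1 means the bias disappeared and 0 means no improvement.
    """
    if baseline_bias <= 0:
        return 0.0  # no demonstrated bias, so nothing to mitigate
    reduction = (baseline_bias - debiased_bias) / baseline_bias
    # Clamp: a debiasing prompt that makes things worse scores 0 here.
    return max(0.0, min(1.0, reduction))


# Hypothetical scores: bias falls from 0.8 to 0.2 once the model is warned.
print(round(mitigation_potential(0.8, 0.2), 2))
```

Dividing by the baseline keeps the score comparable across biases of different strength: removing half of a weak bias and half of a strong one both yield 0.5.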
Human Similarity Score
Closer to 1.0 = more human-like
Compares the LLM's pattern of biases to established patterns in human cognition from the Kahneman-Tversky research literature.
Measures
How closely model biases match human patterns
Interpretation
Values near 1.0 indicate human-like bias patterns. Results are labeled 'super-human' (less biased than humans), 'human' (similarly biased), or 'worse than human' (more biased).
Response Consistency Index
Higher is better
Measures the variance in model responses across multiple identical trials of the same test case, distinguishing systematic bias from stochastic noise.
Measures
Trial-to-trial variance (noise vs systematic bias)
Interpretation
Higher values indicate more consistent (stable) responses. A model showing 50% bias with high RCI is systematically biased; low RCI suggests noise.
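The distinction between systematic bias and noise can be illustrated with a variance-based sketch. It assumes each trial yields a bias score in [0, 1] and maps the standard deviation onto a 0 to 1 consistency scale; the function name and the normalization are assumptions, not the benchmark's documented formula.

```python
from statistics import mean, pstdev
from typing import Sequence


def response_consistency(trial_scores: Sequence[float]) -> float:
    """Consistency across identical trials (assumed formula).

    For scores in [0, 1] the largest possible population standard
    deviation is 0.5 (half the trials at 0, half at 1), so dividing by
    0.5 maps the spread onto [0, 1]. Returns 1.0 for identical responses.
    """
    if len(trial_scores) < 2:
        raise ValueError("need at least two trials to measure consistency")
    spread = pstdev(trial_scores)
    return 1.0 - min(spread / 0.5, 1.0)


# Two hypothetical models, both averaging 50% bias over four trials.
steady = [0.5, 0.5, 0.5, 0.5]   # same moderately biased answer every time
erratic = [0.0, 1.0, 0.0, 1.0]  # flips between unbiased and fully biased

for name, trials in (("steady", steady), ("erratic", erratic)):
    print(name, mean(trials), response_consistency(trials))
```

Both models show the same mean bias, but only the steady one would count as systematically biased under this reading; the erratic one looks like noise.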
Calibration Awareness Score
Higher is better
Measures confidence calibration by comparing stated confidence against actual rational-answer accuracy under bias-testing prompts.
Measures
Confidence calibration (confidence vs rational-answer accuracy)
Interpretation
Higher values indicate tighter confidence-accuracy alignment. CAS does not directly measure explicit bias recognition.
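As a rough sketch of confidence-accuracy alignment, one can compare the model's mean stated confidence with the fraction of rational answers it actually gave. The function name and the "one minus the gap" formula are assumptions for illustration; the benchmark may use a finer-grained calibration measure.

```python
from typing import Sequence


def calibration_awareness(
    confidences: Sequence[float], rational: Sequence[bool]
) -> float:
    """One minus the gap between mean confidence and accuracy (assumed formula).

    confidences: stated confidence per item, each in [0, 1].
    rational: whether each answer matched the rational baseline.
    """
    if not confidences or len(confidences) != len(rational):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(rational) / len(rational)  # True counts as 1
    return 1.0 - abs(mean_confidence - accuracy)


# Hypothetical overconfident model: 90% confident, right 1 time in 4.
print(round(calibration_awareness([0.9] * 4, [True, False, False, False]), 2))
```

Note that a model can score well here while still being biased, as long as its confidence drops in step with its accuracy, which is consistent with CAS not measuring explicit bias recognition.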
Metric Relationships
The metrics work together to provide a complete picture:
- BMS + BCI: High magnitude (BMS) with high consistency (BCI) indicates a systematic, deeply-rooted bias. High BMS with low BCI suggests context-dependent bias.
- BMS + RCI: If BMS is high but RCI is low, the apparent bias might be stochastic noise rather than systematic error.
- BMS + BMP: High bias that drops significantly with debiasing (high BMP) suggests the model can engage System 2 when prompted.
- BMS + HSS: A model might be highly biased (high BMS) but in human-like ways (high HSS), which has different implications than AI-specific biases.
- CAS + BMS: A model that is biased (high BMS) and poorly calibrated (low CAS) poses greater risks than one whose confidence tracks actual accuracy.
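The combinations above can be read as a small decision procedure. The sketch below is illustrative only: the `Profile` class, the `interpret` function, and the single 0.5 cutoff separating "high" from "low" are all assumptions, since the text does not state thresholds for these comparisons.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Hypothetical container for one model's six metric values."""
    bms: float
    bci: float
    bmp: float
    hss: float
    rci: float
    cas: float


def interpret(p: Profile, high: float = 0.5) -> List[str]:
    """Turn a profile into the readings listed above (assumed cutoff)."""
    if p.bms < high:
        return ["low bias magnitude"]
    notes = [
        "systematic, deeply rooted bias" if p.bci >= high
        else "context-dependent bias"
    ]
    if p.rci < high:
        notes.append("may be stochastic noise rather than systematic error")
    if p.bmp >= high:
        notes.append("can engage System 2 when prompted")
    if p.hss >= high:
        notes.append("biased in human-like ways")
    if p.cas < high:
        notes.append("poorly calibrated, so the bias poses greater risk")
    return notes


example = Profile(bms=0.8, bci=0.7, bmp=0.6, hss=0.9, rci=0.9, cas=0.3)
for note in interpret(example):
    print("-", note)
```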
Trigger Intensity Weighting
BMS weights each score by trigger intensity: bias elicited by weak triggers is weighted 2.0x, and bias from strong triggers is weighted 0.67x.
This weighting reflects susceptibility, not trigger strength. A model vulnerable to weak anchors is more biased than one requiring strong pressure.
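The effect of the weighting can be sketched as follows. Only the 2.0x (weak) and 0.67x (strong) multipliers come from the text; the function name, the dictionary, the idea of a raw deviation in [0, 1], and the clipping of the result to the 0 to 1 range are assumptions made for illustration.

```python
# Multipliers stated in the text; other intensity levels are not covered here.
INTENSITY_WEIGHTS = {"weak": 2.0, "strong": 0.67}


def weighted_bias(raw_deviation: float, intensity: str) -> float:
    """Scale a raw treatment-vs-control deviation by trigger intensity.

    raw_deviation is assumed to be normalized to [0, 1]. The result is
    clipped to [0, 1] to match the stated score range (an assumption).
    """
    weight = INTENSITY_WEIGHTS[intensity]
    return min(1.0, max(0.0, raw_deviation * weight))


# The same raw deviation counts for more when a weak trigger produced it.
print(weighted_bias(0.3, "weak"))
print(round(weighted_bias(0.3, "strong"), 3))
```

Under this sketch, a model that shifts by 0.3 in response to a weak anchor scores roughly three times higher than one that shifts by the same amount only under strong pressure.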