KahneBench

Getting Started

Set up KahneBench and run your first cognitive bias evaluation in minutes.

Prerequisites

  • Python 3.10 or later
  • uv package manager (recommended) or pip
  • API key for your LLM provider (OpenAI, Anthropic, etc.)

Installation

Using uv (recommended)

# Clone the repository
git clone https://github.com/ryanhartman4/KahneBench.git
cd KahneBench/bench

# Install dependencies
uv sync

Using pip

# After cloning the repository (as above)
cd KahneBench/bench
pip install -e .

# For development
pip install -e ".[dev]"
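
Either way, you can sanity-check the install by invoking the CLI entry point. The kahne-bench info command is described under CLI Commands below; if you installed with uv, run it through uv run so it resolves inside the project environment:

# Verify the CLI is available (see CLI Commands below)
kahne-bench info

# Or, inside a uv-managed environment
uv run kahne-bench info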

Quick Start: Basic Demo

Run the basic usage demo to see KahneBench in action with a mock LLM provider:

PYTHONPATH=src python examples/basic_usage.py

This demonstrates the complete workflow:

  • Taxonomy exploration (69 biases across 16 categories)
  • Test case generation for specific biases
  • Compound (meso-scale) test generation for bias interactions
  • Evaluation execution with mock responses
  • Metrics calculation and cognitive fingerprint generation
  • Debiasing prompt generation
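
The same taxonomy information is also exposed through the CLI, so you can browse biases and categories from the shell before or after running the demo (full command reference under CLI Commands below):

# Browse the taxonomy from the shell
kahne-bench list-categories
kahne-bench describe anchoring_effect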

Evaluate with OpenAI

# Set your API key
export OPENAI_API_KEY="your-api-key"

# Run core tier evaluation (15 foundational biases)
PYTHONPATH=src python examples/openai_evaluation.py \
  --model gpt-4o \
  --tier core \
  --trials 3

# Run extended evaluation (all 69 biases)
PYTHONPATH=src python examples/openai_evaluation.py \
  --model gpt-4o \
  --tier extended \
  --domains professional individual

CLI Options

  • --model, -m: Model name (default: gpt-4o)
  • --tier, -t: Benchmark tier - core (15 biases), extended (69 biases), or interaction (compound tests)
  • --domains, -d: Domains to test - individual, professional, social, temporal, risk
  • --trials, -n: Trials per condition (default: 3)
  • --output, -o: Output file prefix for results
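
These options can be combined in a single run. For example, an extended-tier evaluation over two domains with a custom output prefix might look like this (the values are illustrative; the flags are the ones listed above):

PYTHONPATH=src python examples/openai_evaluation.py \
  --model gpt-4o \
  --tier extended \
  --domains professional social \
  --trials 5 \
  --output my_eval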

CLI Commands

Show framework info

kahne-bench info

List all 69 biases

kahne-bench list-biases

List categories or biases in a category

kahne-bench list-categories
kahne-bench list-categories anchoring

Get detailed bias information

kahne-bench describe anchoring_effect

Generate test cases

kahne-bench generate \
  --bias anchoring_effect loss_aversion \
  --domain professional individual \
  --instances 3 \
  --output test_cases.json

Generate compound (meso-scale) tests

kahne-bench generate-compound \
  --bias anchoring_effect \
  --domain professional \
  --output compound_tests.json

Run evaluation

# With mock provider (for testing)
kahne-bench evaluate \
  -i test_cases.json \
  -p mock

# With OpenAI
kahne-bench evaluate \
  -i test_cases.json \
  -p openai \
  -m gpt-4o \
  --trials 3

# With Anthropic
kahne-bench evaluate \
  -i test_cases.json \
  -p anthropic \
  -m claude-sonnet-4-20250514

Generate report from fingerprint

kahne-bench report fingerprint.json
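
Taken together, a typical end-to-end run chains generate, evaluate, and report. The fingerprint filename below is a placeholder: substitute the *_fingerprint.json path the evaluate step actually writes (see Output Files below).

# 1. Generate test cases
kahne-bench generate \
  --bias anchoring_effect loss_aversion \
  --domain professional \
  --output test_cases.json

# 2. Evaluate them (mock provider shown; swap in openai or anthropic as above)
kahne-bench evaluate -i test_cases.json -p mock

# 3. Report on the resulting cognitive fingerprint
kahne-bench report fingerprint.json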

Output Files

After running an evaluation, you'll get two output files:

  • *_results.json - Raw evaluation results with all responses
  • *_fingerprint.json - Cognitive fingerprint with computed metrics (BMS, BCI, BMP, HAS, RCI, CAS)
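
Both files are plain JSON, so any standard tooling works for a first look. A minimal check, assuming an output prefix of my_eval (substitute whatever prefix your run used):

# Pretty-print the outputs to inspect their structure and metric values
python -m json.tool my_eval_results.json
python -m json.tool my_eval_fingerprint.json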

Next Steps