Benchmarks
How to reproduce internal and external AIXI comparisons.
External baselines in scope
The current benchmark documentation explicitly tracks two external reference implementations:
- PyAIXI, used as a Python baseline for MC-AIXI-style experiments [pyaixi_repo].
- The older MC-AIXI C++ codebase, used as a native AC-CTW reference point [mcaixi_cpp_repo].
These are baselines, not bundled dependencies. The repository keeps them external and drives them through harness scripts, so the comparisons remain inspectable rather than relying on silently vendored copies.
1. In-repo AIQI vs MC-AIXI
For quick relative comparisons inside this repository, use:
scripts/bench_aiqi_vs_aixi.sh
This harness is intentionally narrow. It compares the two planners in this repository; it is not a reproduction of A Model-Free Universal AI [kim2026_aiqi] and it is not an external cross-implementation benchmark.
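For repeatable relative comparisons it can help to run the harness several times and keep per-trial logs. The sketch below assumes only that scripts/bench_aiqi_vs_aixi.sh exists and prints its results to stdout; the trial count and log-directory layout are illustrative choices, not documented behavior.

```shell
# run_trials CMD N DIR : run CMD N times, writing each trial's output
# to DIR/trial-<i>.log. CMD is any single executable (no arguments).
run_trials() {
    cmd=$1; trials=$2; logdir=$3
    mkdir -p "$logdir"
    i=1
    while [ "$i" -le "$trials" ]; do
        "$cmd" > "$logdir/trial-$i.log" 2>&1
        i=$((i + 1))
    done
}

# Example (assumes the harness exists in the working tree):
# run_trials scripts/bench_aiqi_vs_aixi.sh 3 target/aiqi-vs-aixi-logs
```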
2. Compare against PyAIXI
If you already have a PyAIXI clone, run:
scripts/bench_aixi_vs_pyaixi.sh --pyaixi-root /path/to/pyaixi
This script is useful when you want a direct comparison between the current infotheory MC-AIXI implementation and PyAIXI without bringing in the full Guix-pinned multi-implementation harness.
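A quick sanity check on the clone path can save a confusing harness failure. The sketch below assumes that a real PyAIXI checkout contains a top-level pyaixi/ package directory; the --pyaixi-root flag comes from the documented invocation above, but this check itself is not part of the repository's scripts.

```shell
# check_pyaixi_root DIR : succeed if DIR plausibly contains a PyAIXI clone.
# Assumption: a PyAIXI checkout ships a top-level pyaixi/ package directory.
check_pyaixi_root() {
    root=$1
    if [ ! -d "$root/pyaixi" ]; then
        echo "error: $root does not look like a PyAIXI checkout" >&2
        return 1
    fi
}

# Usage:
# check_pyaixi_root /path/to/pyaixi && \
#     scripts/bench_aixi_vs_pyaixi.sh --pyaixi-root /path/to/pyaixi
```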
3. Reproducible multi-implementation runs
For the heavier, more reproducible comparison across:
- the infotheory Rust CLI,
- the infotheory Python bindings,
- PyAIXI,
- and MC-AIXI C++,
use the Guix wrapper:
scripts/bench_aixi_competitors_guix.sh --trials 3 --profile default
The script writes plot-ready TSV files under target/aixi-competitors/<timestamp>/, records the commands used, and pins the external competitor repositories by commit for the run.
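Since the outputs are plot-ready TSVs, simple aggregation can be done directly in the shell. The sketch below is a minimal example, assuming a hypothetical results.tsv with a header row; the actual file names and column layout inside target/aixi-competitors/<timestamp>/ are not documented here.

```shell
# mean_column FILE COL : average a numeric column of a TSV, skipping the
# header row. FILE and COL refer to a hypothetical results layout.
mean_column() {
    file=$1; col=$2
    awk -F'\t' -v c="$col" \
        'NR > 1 { sum += $c; n++ } END { if (n) printf "%.4f\n", sum / n }' \
        "$file"
}

# Usage against a hypothetical run directory:
# mean_column target/aixi-competitors/<timestamp>/results.tsv 2
```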
Comparison notes
The runner distinguishes between the directly comparable AC-CTW path and the repository's FAC-CTW variants. That distinction matters: FAC-CTW is informative and often useful, but it is not a like-for-like drop-in for older AC-CTW reference implementations [veness2011_mcaixi].
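When plotting, it can be useful to restrict a results table to the directly comparable rows only. The filter below is a sketch assuming a hypothetical TSV with a "model" column in field 2 whose values include "ac-ctw" and "fac-ctw"; the real column names used by the runner are not documented here.

```shell
# filter_ac_ctw FILE : keep the header plus rows whose second field is
# "ac-ctw", dropping FAC-CTW rows that are not like-for-like comparable.
filter_ac_ctw() {
    file=$1
    awk -F'\t' 'NR == 1 || $2 == "ac-ctw"' "$file"
}
```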
The benchmark runner also normalizes reward reporting for Kuhn Poker because some older implementations print encoded reward symbols rather than native-domain rewards. That normalization step is part of the comparison logic rather than an after-the-fact spreadsheet fix.
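The normalization idea itself is simple: if an implementation prints the encoded reward symbol (a nonnegative integer) rather than the native-domain reward, subtract the encoding offset before comparing. The sketch below assumes an additive offset encoding; the specific offset value is an illustrative assumption, not the documented Kuhn Poker constant.

```shell
# normalize_reward ENCODED OFFSET : map an encoded reward symbol back to
# the native-domain reward, assuming native = encoded - offset.
normalize_reward() {
    encoded=$1; offset=$2
    echo $((encoded - offset))
}

# Usage: with an assumed offset of 2, encoded symbol 0 maps to -2.
# normalize_reward 0 2   # -> -2
```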