Getting Started

From install to first experiment in minutes.

CLI-first design. This UI mirrors common CLI workflows.

1. Requirements

The project targets x86_64 and is tested on Linux. It can work on other OSes, but you may need minor configuration.

Linux

Install Rust via rustup, plus clang++ and lld from your package manager.

FreeBSD

pkg install rust

OpenBSD

pkg_add rust

NetBSD

pkg_add rust clang lld

Then disable LTO in .cargo/config.toml.

If your OS enforces W^X or you are not on x86_64, set CARGO_FEATURE_NOJIT=1 to disable the ZPAQ JIT. Otherwise NCD can be inaccurate at runtime.
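
One way to apply this rule before building is to check the machine architecture in a small wrapper script. The sketch below is an assumption about how you might wire it up, not a script shipped with the project. It covers only the architecture check; whether your OS enforces W^X has to be decided per system.

```shell
#!/bin/sh
# Disable the ZPAQ JIT when the build machine is not x86_64.
arch="$(uname -m)"
case "$arch" in
  x86_64|amd64) ;;                      # x86_64 (reported as amd64 on some BSDs): leave the default
  *) export CARGO_FEATURE_NOJIT=1 ;;     # any other architecture: turn the JIT off
esac

# On a W^X-enforcing OS, export CARGO_FEATURE_NOJIT=1 unconditionally instead.
echo "arch=$arch CARGO_FEATURE_NOJIT=${CARGO_FEATURE_NOJIT:-unset}"
```

Run it in the same shell session as the build in step 2 so the exported variable is visible to cargo.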

2. Build the CLI

Clone the repo and build the binary:

git clone --recursive https://github.com/turtle261/infotheory
cd infotheory
cargo build --release

The CLI binary will be available at ./target/release/infotheory.

3. Basic CLI usage

# Mutual information with ROSAPlus (order 8)
./target/release/infotheory mi file1.txt file2.txt 8

# NTE using CTW
./target/release/infotheory nte file1.txt file2.txt --rate-backend ctw

# NCD with custom ZPAQ method
./target/release/infotheory ncd file1.txt file2.txt 5

# Mixture rate backend (spec file)
./target/release/infotheory h file1.txt --rate-backend mixture --method mixture.json

Inputs are file-based, so you can put any preprocessing pipeline in front of the CLI as long as it emits files or byte streams.
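
Because the commands take file paths, a preprocessing step only needs to write its result to disk before the CLI runs. The sketch below lowercases two text files with tr and then compares them. The raw1.txt and raw2.txt names, their contents, and the lowercasing step are illustrative assumptions; the mi invocation is the one shown above.

```shell
#!/bin/sh
# Hypothetical raw inputs; replace these with your own data.
printf 'Hello World\n' > raw1.txt
printf 'HELLO THERE\n' > raw2.txt

# Preprocess: lowercase each file and write it to the path the CLI will read.
tr '[:upper:]' '[:lower:]' < raw1.txt > file1.txt
tr '[:upper:]' '[:lower:]' < raw2.txt > file2.txt

# Run the CLI only if it has already been built (see step 2).
if [ -x ./target/release/infotheory ]; then
  ./target/release/infotheory mi file1.txt file2.txt 8
fi
```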

External reference: ROSAPlus.

4. Use as a Rust library

Add the crate via a path or git dependency:

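[dependencies]
infotheory = { path = "." }

Then call it from Rust:

use infotheory::*;

// `data` stands for input bytes you have already loaded.
let h = entropy_rate_bytes(data, 8);

// Set the default context: CTW rate backend at depth 32, default NCD backend.
set_default_ctx(InfotheoryCtx::new(
    RateBackend::Ctw { depth: 32 },
    NcdBackend::default()
));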

For multivariate use cases that need high performance, the Rust API provides the low-level control needed for optimized pipelines.

5. Optional components

RWKV7 (via rwkvzip)

This repository includes rwkvzip: an RWKV7-based byte-level neural compressor and world-model backend. RWKV7 inference is implemented in this project and can be used locally for rate estimation and MC-AIXI planning.

Training lives in rwkvzip (not in the infotheory crate) and is gated behind the training feature; the rwkvzip binary will reject training if built without it.

# Build rwkvzip (inference + CLI)
cargo build -p rwkvzip --release

# Compress/decompress with arithmetic coding (default) or rANS
./target/release/rwkvzip compress input.bin output.rwz --model model.safetensors
./target/release/rwkvzip compress input.bin output.rwz --model model.safetensors -c rans
./target/release/rwkvzip decompress output.rwz restored.bin --model model.safetensors

# Use RWKVZip as an infotheory NCD compressor backend (requires RWKV7 model path)
export RWKV7_MODEL_PATH=./rwkvzip/rwkv-10m.safetensors
./target/release/infotheory ncd a.bin b.bin --ncd-backend rwkv7 --method ac
./target/release/infotheory ncd a.bin b.bin --ncd-backend rwkv7 --method rans

# Train (requires rwkvzip built with its training feature; see rwkvzip/Cargo.toml)
./target/release/rwkvzip train --help
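
For intuition about what the ncd commands above report, the normalized compression distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size in bytes. The sketch below computes that quantity with gzip as a stand-in compressor. It does not use the project's ZPAQ or rwkv7 backends, and the helper names csize and ncd are hypothetical.

```shell
#!/bin/sh
# Compressed size in bytes of stdin, using gzip as a stand-in compressor.
csize() { gzip -c -9 | wc -c | tr -d ' '; }

# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
ncd() {
  cx=$(csize < "$1")
  cy=$(csize < "$2")
  cxy=$(cat "$1" "$2" | csize)
  awk -v cx="$cx" -v cy="$cy" -v cxy="$cxy" 'BEGIN {
    min = (cx < cy) ? cx : cy
    max = (cx > cy) ? cx : cy
    printf "%.4f\n", (cxy - min) / max
  }'
}

# Demo inputs: two identical repetitive files and one file of random bytes.
tmp=$(mktemp -d)
yes 'the quick brown fox' | head -n 200 > "$tmp/a.bin"
yes 'the quick brown fox' | head -n 200 > "$tmp/b.bin"
head -c 4000 /dev/urandom > "$tmp/c.bin"

ncd "$tmp/a.bin" "$tmp/b.bin"   # identical inputs: comparatively small value
ncd "$tmp/a.bin" "$tmp/c.bin"   # unrelated inputs: value near 1
```

A stronger compressor gives a tighter estimate of shared structure, which is the reason to pick the backend and method deliberately.
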

VM feature (VM-backed AIXI)

Enable with cargo build --release --features vm and provide a kernel image and rootfs for Firecracker.

VM mode uses nyx-lite + Firecracker (Linux/KVM) with snapshot-based resets and a shared-memory protocol (OBS/REW/DONE framing). Performance depends on hardware and guest behavior. Some tests require /dev/kvm and VM image artifacts under nyx-lite/vm_image (see ./projman.sh init-vm).

Lean validation (ite-bench)

Run lake build and lake exe runner inside ./ite-bench to validate estimators against oracle truths and identities.