Getting Started
From install to first experiment in minutes.
1. Requirements
The project targets x86_64 and is tested on Linux; other OSes may work with minor configuration.
Install Rust via rustup, plus clang++ and lld from your package manager. For example:

pkg install rust          # e.g. FreeBSD
pkg_add rust clang lld    # e.g. OpenBSD; also disable LTO in .cargo/config.toml
If your OS enforces W^X or you are not on x86_64, set CARGO_FEATURE_NOJIT=1
to disable the ZPAQ JIT; otherwise NCD results can be inaccurate at runtime.
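For the LTO note above, a minimal .cargo/config.toml sketch (Cargo accepts profile settings in its config files; adjust to your setup):

```toml
# .cargo/config.toml
[profile.release]
lto = false
```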
2. Build the CLI
Clone the repo and build the binary:
git clone --recursive https://github.com/turtle261/infotheory
cd infotheory
cargo build --release
The CLI binary will be available at ./target/release/infotheory.
3. Basic CLI usage
# Mutual information with ROSAPlus (order 8)
./target/release/infotheory mi file1.txt file2.txt 8
# NTE using CTW
./target/release/infotheory nte file1.txt file2.txt --rate-backend ctw
# NCD with custom ZPAQ method
./target/release/infotheory ncd file1.txt file2.txt 5
# Mixture rate backend (spec file)
./target/release/infotheory h file1.txt --rate-backend mixture --method mixture.json
Inputs are file-based, so you can feed in any preprocessing pipeline as long as it emits files or byte streams.
External reference: ROSAPlus.
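The ncd subcommand above computes the normalized compression distance, NCD(x,y) = (C(x·y) − min(C(x),C(y))) / max(C(x),C(y)), where C is a compressed size. A self-contained sketch of that formula, with a toy distinct-bigram "compressor" standing in for ZPAQ purely for illustration (real NCD accuracy depends on a strong compressor):

```rust
use std::collections::HashSet;

// Toy "compressor": compressed size = number of distinct byte bigrams.
// A stand-in for ZPAQ used only to make the NCD formula concrete.
fn c(data: &[u8]) -> usize {
    data.windows(2).collect::<HashSet<_>>().len().max(1)
}

/// NCD(x, y) = (C(x·y) - min(C(x), C(y))) / max(C(x), C(y))
fn ncd(x: &[u8], y: &[u8]) -> f64 {
    let (cx, cy) = (c(x), c(y));
    let xy: Vec<u8> = x.iter().chain(y.iter()).copied().collect();
    (c(&xy) as f64 - cx.min(cy) as f64) / cx.max(cy) as f64
}

fn main() {
    let x = b"abcdefgh".repeat(32);
    let y = b"ABCDEFGH".repeat(32);
    println!("NCD(x,x) = {:.3}", ncd(&x, &x)); // near 0: everything shared
    println!("NCD(x,y) = {:.3}", ncd(&x, &y)); // near 1: nothing shared
}
```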
4. Use as a Rust library
Add the crate via a path or git dependency:
[dependencies]
# Path dependency (from inside a checkout):
infotheory = { path = "." }
# Or a git dependency:
# infotheory = { git = "https://github.com/turtle261/infotheory" }
use infotheory::*;

// Estimate the entropy rate of a byte buffer with an order-8 model.
let h = entropy_rate_bytes(data, 8);

// Install a process-wide default context: CTW rate backend (depth 32)
// plus the default NCD backend.
set_default_ctx(InfotheoryCtx::new(
    RateBackend::Ctw { depth: 32 },
    NcdBackend::default(),
));
For multivariate use cases that need high performance, the Rust API provides the low-level control needed for optimized pipelines.
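To make concrete what a rate estimator like entropy_rate_bytes computes, here is a self-contained order-0 sketch. This is only the k = 0 special case: the crate's backends (CTW, ROSAPlus) condition on the previous k bytes instead of using marginal counts.

```rust
/// Order-0 empirical entropy of a byte stream, in bits per byte.
/// Illustrative only: order-k estimators condition on context.
fn entropy0(data: &[u8]) -> f64 {
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    let uniform: Vec<u8> = (0..=255).collect();
    let constant = vec![0u8; 256];
    println!("H(uniform)  = {:.2} bits/byte", entropy0(&uniform));  // 8.00
    println!("H(constant) = {:.2} bits/byte", entropy0(&constant)); // 0.00
}
```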
5. Optional components
This repository includes rwkvzip: an RWKV7-based byte-level neural
compressor and world-model backend. RWKV7 inference is implemented in this project
and can be used locally for rate estimation and MC-AIXI planning.
Training lives in rwkvzip (not in the infotheory crate) and is
gated behind the training feature; the rwkvzip binary will
reject training if built without it.
# Build rwkvzip (inference + CLI)
cargo build -p rwkvzip --release
# Compress/decompress with arithmetic coding (default) or rANS
./target/release/rwkvzip compress input.bin output.rwz --model model.safetensors
./target/release/rwkvzip compress input.bin output.rwz --model model.safetensors -c rans
./target/release/rwkvzip decompress output.rwz restored.bin --model model.safetensors
# Use RWKVZip as an infotheory NCD compressor backend (requires RWKV7 model path)
export RWKV7_MODEL_PATH=./rwkvzip/rwkv-10m.safetensors
./target/release/infotheory ncd a.bin b.bin --ncd-backend rwkv7 --method ac
./target/release/infotheory ncd a.bin b.bin --ncd-backend rwkv7 --method rans
# Train (requires rwkvzip built with its training feature; see rwkvzip/Cargo.toml)
./target/release/rwkvzip train --help
VM mode is enabled with cargo build --release --features vm; provide a kernel
image and rootfs for Firecracker.
VM mode uses nyx-lite + Firecracker (Linux/KVM) with snapshot-based resets and a
shared-memory protocol (OBS/REW/DONE framing). Performance depends on hardware and
guest behavior. Some tests require /dev/kvm and VM image artifacts under
nyx-lite/vm_image (see ./projman.sh init-vm).
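The wire format of the OBS/REW/DONE framing is not documented here. As a purely hypothetical sketch of tag-plus-length framing, where every constant (tag values, header layout, field widths) is invented for illustration and does not describe the actual nyx-lite protocol:

```rust
// HYPOTHETICAL frame layout: 1-byte tag + 4-byte little-endian length + payload.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Obs = 0x01,  // observation from the guest
    Rew = 0x02,  // reward signal
    Done = 0x03, // episode boundary
}

fn encode(tag: Tag, payload: &[u8]) -> Vec<u8> {
    let mut frame = vec![tag as u8];
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

fn decode(buf: &[u8]) -> Option<(Tag, &[u8])> {
    if buf.len() < 5 {
        return None; // too short for tag + length header
    }
    let tag = match buf[0] {
        0x01 => Tag::Obs,
        0x02 => Tag::Rew,
        0x03 => Tag::Done,
        _ => return None, // unknown tag
    };
    let len = u32::from_le_bytes(buf[1..5].try_into().unwrap()) as usize;
    buf.get(5..5 + len).map(|payload| (tag, payload))
}

fn main() {
    let frame = encode(Tag::Obs, b"pixels");
    let (tag, payload) = decode(&frame).unwrap();
    println!("{:?} {:?}", tag, payload);
}
```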
To validate the estimators against oracle truths and information-theoretic
identities, run lake build and lake exe runner inside ./ite-bench.
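One such identity is I(X;Y) = H(X) + H(Y) − H(X,Y). A small self-contained check of that identity on a toy joint distribution (not the benchmark's Lean code, just the underlying arithmetic):

```rust
// Check I(X;Y) = H(X) + H(Y) - H(X,Y) on a toy 2x2 joint distribution,
// using exact probabilities rather than estimates.

/// Shannon entropy (bits) of a probability vector.
fn h(ps: &[f64]) -> f64 {
    ps.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.log2()).sum()
}

fn main() {
    // Joint distribution p(x, y) over {0,1} x {0,1}.
    let joint = [[0.4, 0.1], [0.1, 0.4]];
    let px: Vec<f64> = joint.iter().map(|row| row.iter().sum()).collect();
    let py: Vec<f64> = (0..2).map(|j| joint[0][j] + joint[1][j]).collect();
    let flat: Vec<f64> = joint.iter().flatten().copied().collect();

    // Direct mutual information: sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    let mut mi = 0.0;
    for i in 0..2 {
        for j in 0..2 {
            let p = joint[i][j];
            if p > 0.0 {
                mi += p * (p / (px[i] * py[j])).log2();
            }
        }
    }

    let identity = h(&px) + h(&py) - h(&flat);
    println!("I(X;Y)            = {:.6}", mi);
    println!("H(X)+H(Y)-H(X,Y)  = {:.6}", identity);
    assert!((mi - identity).abs() < 1e-12);
}
```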