API
This document describes the current experimental API. The API is not stable yet.
Random
mar.Random is a thin wrapper around Zig's default PRNG that forces callers
to provide a seed.
var rng = mar.Random.init(42);
const random = rng.random();
const value = random.int(u64);
The same seed produces the same stream within a single Zig version.
Clock
Clock implementations are selected at comptime:
const ProdClock = mar.Clock(.production);
const SimClock = mar.Clock(.simulation);
mar.Clock(.production) returns mar.ProductionClock, which reads host time
through Zig's host IO clock.
mar.Clock(.simulation) returns mar.SimClock, which advances only when the
caller explicitly ticks or sleeps it.
All timestamps and durations are nanoseconds:
pub const Timestamp = u64;
pub const Duration = u64;
Env
Application code should receive explicit authorities from its caller instead of
constructing them itself. Storage-oriented code should usually take std.Io,
a root std.Io.Dir, and a narrow mar.Recorder; code that needs Marionette's
clock, random hooks, or other simulator capabilities can take mar.Env:
fn service(env: anytype) !void {
const now = env.clock.now();
const jitter = try env.random.intLessThan(mar.Duration, 1_000);
if (try env.buggify(.slow_path, .oneIn(10))) {
try env.clock.sleep(jitter);
}
_ = .{ now, jitter };
}
mar.Env is the concrete harness-facing capability bundle. Its disk, clock,
random, and tracer authorities are fields, not lazy accessors. env.io()
returns the backing std.Io: host I/O in production envs, and Marionette's
current deterministic backend in simulation envs. env.recorder() returns a
narrow structured recording capability for code that should not depend on all
of Env.
Production-shaped libraries should prefer taking the smallest capabilities they
need. For example, code that only needs I/O and trace events can accept
std.Io plus mar.Recorder:
fn put(io: std.Io, recorder: mar.Recorder, key: u64, value: u64) !void {
_ = io;
try recorder.record("kv.put key={} value={}", .{ key, value });
}
Simulation builds app and harness views together through World.simulate:
fn scenario(world: *mar.World) !void {
const sim = try world.simulate(.{});
try service(sim.env);
}
sim.env supplies the handles passed to application code. sim.control is
kept by the harness for simulator-only actions such as advancing time or
crashing disk.
env.buggify draws through the env's random capability only when the env was
built by simulation; production envs construct the same composition bundle with
production adapters such as mar.RealDisk.
World
mar.World owns Phase 0 simulation engine state:
- One
SimClock. - One seeded
Random. - One trace log.
Application code should receive explicit handles from the composition root, not
World directly.
Scenarios and harnesses use World to construct simulations, drive time, and
inspect trace bytes.
Create a world with an explicit allocator:
const ns_per_ms: mar.Duration = 1_000_000;
var world = try mar.World.init(std.testing.allocator, .{
.seed = 0xC0FFEE,
.tick_ns = ns_per_ms,
});
defer world.deinit();
Advance simulated time:
try world.tick();
try world.runFor(10 * ns_per_ms);
Record service-level trace events:
try world.record("request.accepted id={}", .{42});
Use structured fields when a value comes from user text, paths, or other runtime bytes that may contain spaces or separators:
try world.recordFields("disk.open", &.{
mar.traceField("path", .{ .text = "/tmp/a b" }),
mar.traceField("mode", .{ .literal = "read" }),
});
The text field is written as path=/tmp/a%20b; raw World.record remains
strict and returns error.InvalidTracePayload for ambiguous formatted values.
Read the trace:
const trace = world.traceBytes();
The returned trace slice is invalidated by later trace writes.
Phase 0 traces start with marionette.trace format=text version=0. Every
later World.record line is prefixed with a global event=<u64> index.
Random Choices In A World
world.unsafeUntracedRandom() returns a raw std.Random view over the
world's seeded PRNG. Raw draws are deterministic, but they are not
automatically traced. The unsafe name is intentional: simulator decisions
should usually use traced helpers.
Use traced helpers when the random choice should appear in the replay trace:
const value = try world.randomU64();
const enabled = try world.randomBool();
const index = try world.randomIntLessThan(u64, 1_000_000);
randomIntLessThan uses Zig's rejection-sampling bounded integer helper, so it
does not teach modulo bias.
Application code should usually use env.random instead of receiving the
whole World:
const latency_ns = try env.random.intLessThan(u64, 1_000_000);
Event Queue
mar.UnstableEventQueue is a fixed-capacity deterministic event queue. It is
a scheduler sketch for examples, not the final scheduler API. It currently
uses a linear scan on pop; the real scheduler should use a heap once queues get
hot.
const Event = struct {
ready_at: u64,
id: u64,
};
fn lessThan(a: Event, b: Event) bool {
return a.ready_at < b.ready_at or (a.ready_at == b.ready_at and a.id < b.id);
}
const Queue = mar.UnstableEventQueue(Event, 64, lessThan);
var queue = Queue.init();
try queue.push(.{ .ready_at = 10, .id = 1 });
Callers provide the ordering function explicitly. For distributed simulation,
that ordering should be based on stable fields such as (ready_at, event_id),
not pointer identity or hash-map iteration.
Disk
mar.Disk is the lower-level disk capability beneath Marionette's std.Io
backend. It is a concrete, storable handle with sector-oriented read,
write, and sync, plus path-level stat, EOF-aware readSome,
setLength, delete, and rename.
mar.SimDisk is the deterministic
in-memory simulator behind that handle: logical files, sector-aligned
reads/writes, sparse sectors, deterministic latency, operation ids, trace
events, replayable read/write/corruption faults, and crash/restart behavior
for pending writes. mar.RealDisk is the production adapter backed by a real
root directory. mar.Disk.unavailable() remains the honest null-object for
envs without storage.
Construct a world-owned simulator bundle, then hand app code either std.Io
for ordinary file code or the lower-level sector disk capability when a test
needs that explicit surface:
const sim = try world.simulate(.{ .disk = .{
.sector_size = 4096,
.min_latency_ns = 1_000_000,
.latency_jitter_ns = 2_000_000,
} });
const io = sim.env.io();
const disk = sim.env.disk; // low-level sector API
If DiskOptions.min_latency_ns is omitted, it defaults to the world's tick
duration. Passing a concrete value keeps that exact value and validates it
against the tick size.
Write and read logical paths:
try disk.write(.{
.path = "wal.log",
.offset = 0,
.bytes = sector_bytes,
});
try disk.read(.{
.path = "wal.log",
.offset = 0,
.buffer = sector_buffer,
});
try disk.sync(.{ .path = "wal.log" });
const stat = try disk.stat(.{ .path = "wal.log" });
const read_len = try disk.readSome(.{
.path = "wal.log",
.offset = 0,
.buffer = wal_buffer,
});
try disk.setLength(.{ .path = "wal.log", .len = 0 });
try disk.rename(.{ .old_path = "compact.tmp", .new_path = "data.db" });
try disk.delete(.{ .path = "wal.log" });
Construct a production capability bundle by scoping it to a root directory:
var production = try mar.Production.init(.{
.root_dir = root_dir,
.io = io,
.disk = .{ .sector_size = 4096 },
});
defer production.deinit();
const env = production.env();
Production owns the production capability adapters and exposes the same
Env shape that simulation returns. Its disk adapter accepts relative paths,
creates parent directories on write, reads missing or short files as
zero-filled sectors, and uses the same sector alignment checks as SimDisk.
The io argument is the production host I/O backend used to perform
filesystem calls and provide host randomness. production.env().io() returns
that same host std.Io. Simulation envs return Marionette's current
deterministic std.Io backend; it supports deterministic clock/random
operations, synchronous async, immediate Io.Queue operations, and an
in-memory TCP stream subset today. It also supports a flat file subset over
SimDisk: create/open, access/statFile, positional and streaming read/write,
length/stat/setLength, sync, close, delete, and rename. Streaming cursor state
is per open file handle, advances only by bytes actually transferred, and is
left unchanged by failed streaming operations. Full
directory/filesystem behavior, process operations, datagrams, DNS, and real
external network access still fail
closed. See
std.Io Direction.
Simulated file stats report deterministic size, kind, and mutation-time
information. mtime updates on successful content mutations; access and change
timestamps remain zero because Marionette does not yet model them.
Simulated storage tests should prefer world.simulate(...).env.io() for code
that naturally uses std.Io.File. The Disk returned by
world.simulate(...).env.disk remains the low-level sector/file-lifecycle
surface for examples that intentionally test Marionette's disk model directly;
harness code keeps the matching DiskControl for faults, crash, restart, and
corruption.
Low-level disk-shaped code uses the attached Disk field and only the
app-facing operations:
const sim = try world.simulate(.{
.disk = .{ .sector_size = 4096 },
});
fn appendRecord(disk: mar.Disk, sector_bytes: []const u8) !void {
try disk.write(.{ .path = "wal.log", .offset = 0, .bytes = sector_bytes });
try disk.sync(.{ .path = "wal.log" });
try disk.syncDir(.{ .path = "." });
}
The Disk view exposes read, write, sync, syncDir, stat,
readSome, setLength, delete, and rename.
Simulator-control operations such as setFaults, crash, restart, and
corruptSector remain on mar.DiskControl, exposed through
sim.control.disk, and are kept by the harness or scenario state.
For sector-oriented read and write, offsets and lengths must be whole
multiples of sector_size. readSome and setLength are byte-oriented for
WAL iteration and file lifecycle code. Reads from unwritten sectors return
zero bytes; readSome returns the number of bytes copied and does not fill
past EOF. Logical paths are not host paths and are escaped through
World.recordFields in trace events:
disk.write op=0 path=wal.log offset=0 len=4096 status=ok latency_ns=1000000
disk.read op=1 path=wal.log offset=0 len=4096 status=ok latency_ns=1000000
disk.sync op=2 path=wal.log status=ok committed_writes=1 latency_ns=1000000
disk.sync_dir op=3 path=. status=ok committed_metadata=1 latency_ns=1000000
disk.stat op=4 path=wal.log status=ok size=4096 latency_ns=1000000
disk.read_some op=5 path=wal.log offset=0 requested_len=32 read_len=32 status=ok latency_ns=1000000
disk.set_length op=6 path=wal.log len=0 status=ok committed_writes=0 latency_ns=1000000
disk.rename op=7 path=compact.tmp new_path=data.db status=ok committed_writes=0 latency_ns=1000000
disk.delete op=8 path=wal.log status=ok committed_writes=0 latency_ns=1000000
File sync commits pending file contents. syncDir commits directory-entry
metadata for creates, deletes, and renames in that logical directory. Without
syncDir, a crash can keep file contents while losing the directory entry,
matching the classic parent-directory-fsync storage bug class. Cross-directory
renames require syncing both parent directories before the rename is fully
durable.
Faults are disabled by default. Enable them through mar.DiskControl:
const control = sim.control.disk;
try control.setFaults(.{
.read_error_rate = .oneIn(100),
.write_error_rate = .oneIn(100),
.corrupt_read_rate = .oneIn(1_000),
.crash_lost_write_rate = .oneIn(10),
.crash_torn_write_rate = .oneIn(10),
.crash_reordered_write_rate = .oneIn(10),
.crash_lost_metadata_rate = .oneIn(10),
});
Invalid rates return error.InvalidRate. Read and write errors return
error.ReadError and error.WriteError after deterministic latency. Fault
decisions are traced when their rate is non-zero:
disk.fault op=3 path=wal.log kind=write_error rate=1/100 roll=42 fired=false
disk.fault op=4 path=wal.log kind=read_error rate=1/100 roll=0 fired=true
disk.read op=4 path=wal.log offset=0 len=4096 status=io_error latency_ns=1000000
corrupt_read_rate corrupts only the returned buffer; it does not mutate the
durable in-memory model. Harnesses can inject persistent scripted sector
corruption with:
try control.corruptSector("wal.log", 0);
That simulator-control API records disk.fault ... kind=scripted_corruption;
later reads covering that sector return status=corrupt.
Writes are visible to later reads immediately, but they are pending until
sync. A crash processes pending writes according to the crash fault profile:
each pending write may land, be lost, be torn, or be applied out of issue
order. Synced writes are already committed and are not lost by crash.
try disk.write(.{ .path = "wal.log", .offset = 0, .bytes = sector_bytes });
try control.crash();
try control.restart();
While crashed, disk operations return error.DiskCrashed. Crash outcomes are
trace-visible:
disk.fault op=3 path=wal.log kind=crash_lost_write rate=1/10 roll=7 fired=false
disk.fault op=3 path=wal.log kind=crash_torn_write rate=1/10 roll=0 fired=true
disk.crash_write op=3 path=wal.log offset=0 len=4096 result=torn
disk.crash pending_writes=1 landed=0 lost=0 torn=1 reordered=0
disk.restart status=ok
Network
mar.Endpoint(Message) is the app-facing network handle. Simulation and
production setup both return this typed endpoint shape, while simulator-control
faults remain on control.network.
See Network Model for the design contract and current limits. See Network API Direction for the split between app-facing network authority and test-only simulator-control operations.
const Message = struct { value: u64 };
const sim = try world.simulate(.{ .network = .{
.nodes = 4,
.path_capacity = 64,
} });
const sender = try sim.endpoint(Message, 0);
const receiver = try sim.endpoint(Message, 1);
try sim.control.network.setLossiness(.{ .drop_rate = .percent(20) });
try sim.control.network.setLatency(.{
.min_latency_ns = 1_000_000,
.latency_jitter_ns = 2_000_000,
});
try sender.send(1, .{ .value = 42 });
while (try receiver.receive()) |envelope| {
_ = envelope.from;
_ = envelope.message;
}
send records network.send or network.drop. receive records
network.deliver, advances world time when the next queued packet is in the
future for that endpoint, and returns null when the endpoint has no pending
messages.
Latency values must align with the world's tick size because Phase 0 simulated time advances in whole ticks.
When a simulation owns time-evolved faults, advance time through simulation control:
try sim.control.tick();
try sim.control.runFor(10 * ns_per_ms);
This advances the backing world and then evolves network fault state.
Nodes are up by default. Mark one down or up with:
try sim.control.network.setNode(1, false);
try sim.control.network.setNode(1, true);
Directed links can be disabled and re-enabled:
try sim.control.network.setLink(0, 1, false);
try sim.control.network.setLink(0, 1, true);
Directed paths can also be clogged for a simulated duration:
try sim.control.network.clog(0, 1, 100 * ns_per_ms);
try sim.control.network.unclog(0, 1);
Partitions disable every directed link crossing between two groups:
const left = [_]mar.NodeId{0};
const right = [_]mar.NodeId{ 1, 2 };
try sim.control.network.partition(&left, &right);
try sim.control.network.heal();
Seeds
mar.parseSeed accepts decimal u64 seeds and 40-character Git hashes:
const seed = try mar.parseSeed("000000000000000000000000000000000000002a");
try std.testing.expectEqual(@as(u64, 42), seed);
Git hashes are parsed as u160 hexadecimal values and truncated to the low 64
bits. This is useful for CLI tools and CI jobs that want deterministic seed
variation by commit.
Trace Summary
mar.summarize(allocator, trace_bytes) builds an owned mar.Summary from a
Marionette trace. It is a debugging view, not a replay format.
var summary = try mar.summarize(allocator, trace);
defer summary.deinit();
try summary.writeSummary(writer);
The summary output is deterministic and line-oriented. It reports total event count, final simulated timestamp when present, replay context, subsystem and event counts, singleton events, network send/drop/delivery counts, drop reasons, and per-link network counts.
runCase And run
mar.runCase(opts) is the primary stateful scenario runner. It initializes
fresh state for each replay attempt, executes a scenario twice with the same
seed, runs named checks, and compares byte-identical traces.
mar.run(allocator, options, scenario) remains the lower-level world-only
runner for scenarios that do not need structured state.
fn scenario(world: *mar.World) !void {
try world.tick();
try world.record("scenario.done", .{});
}
var report = try mar.run(std.testing.allocator, .{ .seed = 0x1234 }, scenario);
defer report.deinit();
Runs can carry replay-visible tags and typed attributes:
const tags = [_][]const u8{ "example:replicated_register", "scenario:smoke" };
const attributes = [_]mar.RunAttribute{
mar.runAttribute("replicas", @as(u64, 3)),
mar.runAttribute("packet_loss_percent", @as(u8, 20)),
};
var report = try mar.run(std.testing.allocator, .{
.seed = 0x1234,
.name = "smoke",
.tags = &tags,
.attributes = &attributes,
}, scenario);
name, tags, and attributes are recorded into the trace before
scenario code runs and are included in failure summaries. Tags are loose
searchable labels. Attributes are stable scalar facts needed to reproduce the
run without forcing tools to parse presentation strings. Use
mar.runAttribute when writing exported metadata names directly.
mar.runAttributesFrom remains available for scalar-only config structs, but
it intentionally treats field names as exported attribute keys and emits fields
in declaration order. Runtime behavior should read from the config, not from
derived attributes.
World-only checks can be attached to the run options:
fn noBadState(world: *mar.World) !void {
if (std.mem.indexOf(u8, world.traceBytes(), "bad_state") != null) {
return error.BadState;
}
}
const checks = [_]mar.Check{
.{ .name = "no bad state", .check = noBadState },
};
var report = try mar.run(std.testing.allocator, .{
.seed = 0x1234,
.checks = &checks,
}, scenario);
defer report.deinit();
Stateful scenarios should usually use runCase. It infers the
state type from the initializer, and run metadata such as name, tags, and
attributes is optional:
const Model = struct {
env: mar.Env,
committed: bool = false,
fn init(world: *mar.World) Model {
const sim = world.simulate(.{}) catch unreachable;
return .{ .env = sim.env };
}
};
fn scenario(model: *Model) !void {
model.committed = true;
try model.env.record("model.commit", .{});
}
fn committed(model: *const Model) !void {
if (!model.committed) return error.NotCommitted;
}
const state_checks = [_]mar.StateCheck(Model){
.{ .name = "committed", .check = committed },
};
var report = try mar.runCase(.{
.allocator = std.testing.allocator,
.seed = 0x1234,
.name = "model-smoke",
.init = Model.init,
.scenario = scenario,
.checks = &state_checks,
});
defer report.deinit();
runCase initializes fresh state for each replay attempt and passes the
attempt's World into the initializer. Initializers may construct world-bound
simulator authorities, but should not record trace events. Stateful scenarios
and state checks receive only state; put environment authorities on the state
when they need to record or advance time. If state owns non-world resources,
provide .deinit = State.deinit; the deinitializer runs once per replay
attempt after scenario execution and checks.
Tests that only need pass/fail behavior can skip report handling:
try mar.expectPass(.{
.allocator = std.testing.allocator,
.seed = 0x1234,
.init = Model.init,
.scenario = scenario,
.checks = &state_checks,
});
try mar.expectFuzz(.{
.allocator = std.testing.allocator,
.seed = 0x1234,
.seeds = 1000,
.init = Model.init,
.scenario = scenario,
.checks = &state_checks,
});
Use mar.expectFailure when proving a checker catches a known-buggy scenario.
Use the lower-level mar.run for world-only scenarios. The older
runWithState* positional helpers are internal implementation details.
The return value is mar.RunReport:
.passedcontains the owned trace from the first successful run..failedcontains a failure report with seed, options, event counts, traces, failure kind, error name when available, and check name when a check failed.
RunFailure.writeSummary(writer) writes the compact failure line used by
RunFailure.print(). Prefer writeSummary in tests so failure output stays
stable.
See Run for details.
Error Policy
Marionette uses a small error policy:
- Invariant violations use
std.debug.assert. - Resource failures return standard Zig errors.
- Expected simulated faults will use domain-specific errors when Disk and Network exist.
Today, most fallible World methods fail only because trace logging can
allocate. That means standard allocator errors are the right surface for now.
Examples of assertions:
tick_nsmust be greater than zero.runFor(duration)must use a duration that is an exact multiple of the world's tick size.- Simulated timestamp arithmetic must not overflow.
Examples of returned errors:
- Trace allocation failure.
- Trace formatting allocation failure.
The project may add named aliases like TraceError once the trace API
settles, but it should not invent broad custom errors until there are real
domain failures to expose.
When mar.run catches a scenario error return, it preserves the partial trace
through the last completed event and includes that trace in the failure report.
Panics are harder because Zig's default panic path may abort before Marionette
can flush anything; users should prefer error-returning invariant checks for
simulated failures.
Build Support
src/build_support.zig exposes a helper for wiring marionette-tidy into a
build:
const marionette = @import("src/build_support.zig");
const tidy = marionette.addTidyStep(b, .{
.paths = &.{ "src", "examples", "tests" },
});
test_step.dependOn(&tidy.step);
The helper builds the marionette-tidy executable and creates a run step that
exits non-zero when banned non-deterministic calls are found. Projects can add
their own exact or prefix bans and file-level or pattern-level allow entries:
const tidy = marionette.addTidyStep(b, .{
.paths = &.{ "src", "examples", "tests" },
.extra_patterns = &.{
.{
.needle = "std.heap.page_allocator",
.reason = "pass an allocator explicitly",
},
.{
.needle = "std.posix",
.reason = "route host effects through explicit interfaces",
.match = .prefix,
},
},
.extra_allowed = &.{
.{ .path = "src/platform.zig", .needle = "std.posix" },
},
});
The current linter is AST-based: it ignores comments and string literals,
supports exact and prefix dotted-path bans, and catches simple const aliases
such as const time = std.time. It does not yet perform full semantic import
resolution.