Network Model

This is a design note for Marionette's current unstable network simulation work. It is not the final public network API yet.

For the intended production/simulation API split, see Network API Direction.

The goal is a deterministic network authority that can make distributed failures replayable from a seed. The first slice is intentionally small: messages can be delayed, dropped, queued, filtered by directed link state, clogged by directed path, partitioned, healed, stopped, restarted, and delivered in a stable order. Replay recording, node spawning, and the final scheduler API come later.

VOPR Comparison

TigerBeetle's VOPR network is built around PacketSimulator, a deterministic packet core with one link for every directed source-target path. Each link owns a queue, a command filter, an optional packet drop predicate, an optional recording filter, and path clog state. Global packet simulator options include node/client counts, seed, latency distribution, packet loss, packet replay, automatic partition settings, partition stability, unpartition stability, per-path capacity, and path clog probability/duration.

The portable lessons for Marionette are:

  • Treat the network as simulator-owned machinery, not real sockets.
  • Keep packet send/delivery separate from simulator-control faults.
  • Queue packets per directed path, not in one global network bucket.
  • Give every packet a replay-visible identity.
  • Make latency, loss, replay, clogs, and partitions seeded simulator decisions.
  • Evolve random network faults only from the simulator tick.
  • Trace sends, drops, deliveries, and state changes separately.
  • Layer advanced faults on top of a small packet core.

The current UnstableNetwork already follows the core shape: fixed topology, per-directed-path queues, path-local capacity, stable packet ids, seeded latency/drop decisions, explicit link filters, explicit partitions, path clogs, node up/down state, and delivery order by (deliver_at, packet_id).

The main differences are deliberate:

  • VOPR has broader tick-evolved automatic fault scheduling; Marionette now has a narrow version for per-path clogs and node-isolating partitions.
  • VOPR can replay packets and record selected command classes for later replay; Marionette has whole-run seed replay but no packet-sequence replay.
  • VOPR has command-aware link filters and optional per-link drop predicates; Marionette is payload-generic and does not know user protocol commands yet.
  • VOPR uses an exponential latency model with a minimum; Marionette currently uses uniform tick-aligned jitter.
  • VOPR can randomly drop an already queued packet when a path is over capacity; Marionette returns queue-capacity errors from the packet core.
  • VOPR's automatic partitions are replica/node-focused; Marionette's manual partition helper can target any configured process, including clients.

Marionette should not copy TigerBeetle's full harness. TigerBeetle has a production protocol, message pools, client/replica process identities, command classes, replay recording, and liveness-specific modes. Marionette needs the same discipline in a generic API, not the same product-specific surface.

Current API

The current app-facing type is mar.Endpoint(Message). Simulation setup creates the backing topology and returns node-scoped typed endpoints from the composition root:

const sim = try world.simulate(.{ .network = .{
    .nodes = 4,
    .service_nodes = 3,
    .path_capacity = 64,
} });
const sender = try sim.endpoint(Message, 0);
const receiver = try sim.endpoint(Message, 1);

nodes declares the total number of simulated process ids. With the example above, valid process ids are 0, 1, 2, and 3. path_capacity is the queue capacity of each directed link, not a global limit. service_nodes declares the prefix of process ids eligible for automatic node-isolating partitions (ids 0, 1, and 2 in the example); when omitted or zero, all processes are eligible. This is explicit because .nodes alone does not tell Marionette which ids are replicas/services and which ids are clients. Clients still participate in the topology and can be partitioned from services, but automatic node isolation chooses only from the service prefix.

Message is user-owned data. Marionette only schedules and traces the packet metadata:

const Message = struct {
    value: u64,
};

try sim.control.network.setLossiness(.{ .drop_rate = .percent(20) });
try sim.control.network.setLatency(.{
    .min_latency_ns = 1_000_000,
    .latency_jitter_ns = 2_000_000,
});
try sender.send(1, .{ .value = 42 });

Deliverable messages are consumed explicitly:

while (try receiver.receive()) |envelope| {
    try apply(envelope.from, envelope.message);
}

receive advances simulated time when needed and returns null when the endpoint has no pending messages.

Application-shaped code sends and drains through typed endpoints, while fault orchestration goes through sim.control.network. mar.UnstableNetwork remains the lower-level packet-core primitive for focused simulator work.

Topology

The topology is fixed when simulation is created:

.nodes = 4,

All node-shaped APIs reject ids outside the half-open range 0..nodes, that is, any id greater than or equal to nodes. That gives the simulator a known universe for partitions, per-link queues, node state, and future liveness cores. It also makes invalid topology use return InvalidNode instead of silently creating new processes by accident.

Each directed path owns its own queue and enabled/disabled state. The packet core scans path queue heads and picks the ready packet with the lowest (deliver_at, packet_id) for the receiving endpoint. The scan is acceptable for Phase 0 capacities; a later scheduler can add an index over active paths without changing the per-link model needed for clogging and path-local capacity.
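The scan described above can be sketched as follows. This is a minimal model, not the packet core itself: Packet, deliversBefore, and pickReadyPath are hypothetical names, and the sketch ignores link state, node state, and clogs.

```zig
const std = @import("std");

// Hypothetical packet metadata; field names follow the prose, not a confirmed API.
const Packet = struct {
    id: u64,
    deliver_at: u64,
};

// True when `a` sorts before `b`: simulated time first, stable id second.
fn deliversBefore(a: Packet, b: Packet) bool {
    if (a.deliver_at != b.deliver_at) return a.deliver_at < b.deliver_at;
    return a.id < b.id;
}

// Scan the head of each inbound path queue and return the index of the path
// whose head is ready at `now` and sorts lowest, or null when nothing is ready.
fn pickReadyPath(heads: []const ?Packet, now: u64) ?usize {
    var best: ?usize = null;
    for (heads, 0..) |maybe_head, path| {
        const head = maybe_head orelse continue; // empty queue
        if (head.deliver_at > now) continue; // not ready yet
        if (best) |b| {
            if (!deliversBefore(head, heads[b].?)) continue;
        }
        best = path;
    }
    return best;
}

test "equal deliver_at falls back to the lower packet id" {
    const heads = [_]?Packet{
        Packet{ .id = 7, .deliver_at = 20 },
        null,
        Packet{ .id = 9, .deliver_at = 10 },
        Packet{ .id = 3, .deliver_at = 10 },
    };
    try std.testing.expect(pickReadyPath(&heads, 10).? == 3);
}
```

The scan is linear in the number of inbound paths, which matches the note that it is acceptable only for small Phase 0 capacities.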

Node State

Nodes are up by default. Mark a simulated process down or up with:

try sim.control.network.setNode(1, false);
try sim.control.network.setNode(1, true);

A down source cannot submit new packets. send still consumes a stable packet id and records:

network.drop id=<id> from=1 to=2 reason=source_down

A down destination drops ready packets at delivery time:

network.drop id=<id> from=0 to=1 reason=destination_down

Queued packets are not removed when a node goes down. If the destination is restarted before delivery time, the packet can still be delivered. That keeps process state as another deterministic delivery gate, like directed link state, without trying to model full process-local storage or restart behavior yet.

Link State

Links are directed. A disabled link drops ready packets at delivery time:

try sim.control.network.setLink(0, 1, false);

If a packet from node 0 to node 1 is already queued when the link is disabled, it remains queued. When it becomes ready, endpoint receive drops it and records:

network.drop id=<id> from=0 to=1 reason=link_disabled

This mirrors the VOPR-style idea that the network's link state at delivery can decide whether an in-flight packet makes it through.
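The delivery-time gates for node and link state can be modelled in a few lines. The sketch below is illustrative only: Gates, Outcome, and atDelivery are hypothetical names, and checking node state before link state is an assumption, since this note does not specify which drop reason wins when both apply.

```zig
const std = @import("std");

// Outcome names mirror the drop reasons in the trace; the enum itself is hypothetical.
const Outcome = enum { deliver, destination_down, link_disabled };

// Hypothetical fixed-topology gate state for `n` processes.
fn Gates(comptime n: usize) type {
    return struct {
        node_up: [n]bool = [_]bool{true} ** n,
        link_enabled: [n][n]bool = [_][n]bool{[_]bool{true} ** n} ** n,

        const Self = @This();

        // Evaluate the gates for a packet that has just become ready.
        // Assumption: node state is checked before link state.
        fn atDelivery(self: *const Self, from: usize, to: usize) Outcome {
            std.debug.assert(from < n and to < n);
            if (!self.node_up[to]) return .destination_down;
            if (!self.link_enabled[from][to]) return .link_disabled;
            return .deliver;
        }
    };
}

test "a queued packet survives if the link is re-enabled before delivery" {
    var gates = Gates(3){};
    gates.link_enabled[0][1] = false; // packet stays queued
    try std.testing.expect(gates.atDelivery(0, 1) == .link_disabled);
    gates.link_enabled[0][1] = true; // re-enabled before the packet is ready
    try std.testing.expect(gates.atDelivery(0, 1) == .deliver);
}
```

Because the gates are evaluated only when a packet becomes ready, toggling state while a packet is in flight changes nothing until delivery time.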

Re-enable a directed link with:

try sim.control.network.setLink(0, 1, true);

Path Clogging

Clogs are directed path faults. A clogged path keeps its packets queued until simulated time reaches the clog deadline, while other paths keep delivering:

try sim.control.network.clog(0, 1, 100 * ns_per_ms);

If a packet for 0 -> 1 is ready at t=10ms but the path is clogged until t=100ms, endpoint receive skips that path and may deliver packets from other paths addressed to that endpoint first. The packet core's internal next-delivery calculation accounts for active clogs.
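As a rough illustration of how a clog deadline interacts with a packet's ready time, the following sketch computes the earliest time a path can release its head packet. Path, nextDeliveryAt, and canDeliver are hypothetical names, and treating the path as unclogged exactly at the deadline is an assumption drawn from the wording "until simulated time reaches the clog deadline".

```zig
const std = @import("std");

// Hypothetical per-path state: the head packet's ready time and an optional clog deadline.
const Path = struct {
    head_deliver_at: ?u64 = null,
    clogged_until: ?u64 = null,

    // Earliest simulated time at which this path can release its head packet,
    // or null when the queue is empty.
    fn nextDeliveryAt(self: Path) ?u64 {
        const ready = self.head_deliver_at orelse return null;
        const until = self.clogged_until orelse return ready;
        return @max(ready, until);
    }

    // Assumption: the path delivers again once `now` reaches the deadline.
    fn canDeliver(self: Path, now: u64) bool {
        const at = self.nextDeliveryAt() orelse return false;
        return at <= now;
    }
};

test "a packet ready at 10ms waits for a clog that ends at 100ms" {
    const ns_per_ms = std.time.ns_per_ms;
    const path = Path{ .head_deliver_at = 10 * ns_per_ms, .clogged_until = 100 * ns_per_ms };
    try std.testing.expect(!path.canDeliver(10 * ns_per_ms));
    try std.testing.expect(path.canDeliver(100 * ns_per_ms));
}
```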

Clear one path clog explicitly with:

try sim.control.network.unclog(0, 1);

Clear all active clogs with:

try sim.control.network.unclogAll();

Clogs also expire when simulated time reaches until_ns. As a backstop, endpoint receive evolves that deterministic expiry state before selecting a packet. Scenario and scheduler code should still move simulated time through sim.control.tick() or sim.control.runFor(...) so probabilistic network faults evolve at the same boundary as the clock.

Partitions

Partitions are expressed as batches of directed link filters. The current helper disables both directions between two groups:

const left = [_]mar.NodeId{0};
const right = [_]mar.NodeId{ 1, 2 };
try sim.control.network.partition(&left, &right);

This disables 0 -> 1, 1 -> 0, 0 -> 2, and 2 -> 0, while leaving traffic inside the right side alone.
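To make the set of affected links concrete, here is a small model of a partition as a batch of directed link filters. Links and its partition method are hypothetical stand-ins for the real helper, and the three-node table is chosen only to match the example above.

```zig
const std = @import("std");

const node_count = 3;

// Hypothetical link table: enabled[from][to] is one directed link.
const Links = struct {
    enabled: [node_count][node_count]bool =
        [_][node_count]bool{[_]bool{true} ** node_count} ** node_count,

    // Disable both directions between every left/right pair.
    // Links inside one group are left untouched.
    fn partition(self: *Links, left: []const usize, right: []const usize) void {
        for (left) |l| {
            for (right) |r| {
                self.enabled[l][r] = false;
                self.enabled[r][l] = false;
            }
        }
    }
};

test "partitioning {0} from {1, 2} leaves 1 <-> 2 alone" {
    var links = Links{};
    links.partition(&[_]usize{0}, &[_]usize{ 1, 2 });
    try std.testing.expect(!links.enabled[0][1] and !links.enabled[1][0]);
    try std.testing.expect(!links.enabled[0][2] and !links.enabled[2][0]);
    try std.testing.expect(links.enabled[1][2] and links.enabled[2][1]);
}
```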

Restore the default network state with:

try sim.control.network.heal();

heal restores default network state by re-enabling links and marking nodes up, and it clears active clogs. Use healLinks when a scenario needs to clear link filters without changing node state or path clogs.

The manual helper is deliberately simple. Later network work can add asymmetric partitions, richer automatic partition schedules, and liveness modes.

Ordering

Packets are ordered by:

  1. deliver_at
  2. packet_id

That is the same basic tie-breaker discipline Marionette uses elsewhere: simulated time first, stable id second. Pointers, host thread scheduling, hash map iteration order, and wall-clock time must never decide delivery order.

Time

Network latency is measured in nanoseconds, but it must align with the world's tick size. Phase 0 simulated time advances in whole ticks, so UnstableNetwork rejects min_latency_ns and latency_jitter_ns values that are not whole multiples of the world's tick.

When using composition-root simulation, prefer:

try sim.control.tick();
try sim.control.runFor(10 * ns_per_ms);

over calling world.tick() or world.runFor(...) directly. Simulation control advances the world and then evolves network fault state. This mirrors VOPR's outer simulator tick and keeps future disk/network/crash subsystems from each needing separate caller-managed ticks.

The current latency model is uniform integer jitter over whole ticks:

latency = min_latency_ns + random(0..latency_jitter_ns)

where the random jitter is tick-aligned.
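A minimal sketch of the alignment check and the tick-aligned jitter follows. Everything here is illustrative: sampleLatency and error.UnalignedLatency are hypothetical names, the random roll is passed in as a plain integer rather than drawn from Marionette's seeded source, and treating the jitter range as inclusive of latency_jitter_ns is an assumption.

```zig
const std = @import("std");

// Hypothetical error name; the note says only that misaligned values are rejected.
const LatencyError = error{UnalignedLatency};

// Returns min_latency_ns plus a whole number of ticks of jitter.
// `roll` stands in for one draw from the seeded random source.
fn sampleLatency(
    min_latency_ns: u64,
    latency_jitter_ns: u64,
    tick_ns: u64,
    roll: u64,
) LatencyError!u64 {
    std.debug.assert(tick_ns > 0);
    if (min_latency_ns % tick_ns != 0 or latency_jitter_ns % tick_ns != 0) {
        return error.UnalignedLatency;
    }
    const jitter_ticks = latency_jitter_ns / tick_ns;
    // Assumption: the upper bound is inclusive. The plain modulo is slightly
    // biased for large ranges; a real implementation would likely avoid that.
    const extra_ticks = roll % (jitter_ticks + 1);
    return min_latency_ns + extra_ticks * tick_ns;
}

test "1ms minimum with 2ms jitter yields 1ms, 2ms, or 3ms" {
    const ms = std.time.ns_per_ms;
    try std.testing.expect((try sampleLatency(1 * ms, 2 * ms, ms, 0)) == 1 * ms);
    try std.testing.expect((try sampleLatency(1 * ms, 2 * ms, ms, 2)) == 3 * ms);
}
```

Sampling in whole ticks rather than in nanoseconds keeps every deliver_at on a tick boundary, which is what lets delivery line up with the world clock.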

Later versions may add distributions such as exponential latency. The first priority is deterministic replay and clear traces, not realism.

Drops

Every send consumes a packet id. If the drop decision fires, Marionette records network.drop and does not enqueue the payload.
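The key property here is that packet ids advance whether or not the packet is dropped. A minimal sketch, assuming a numerator/denominator rate like the drop_rate={}/{} shape in the trace: Rate, Sender, and SendResult are hypothetical names, and the way the roll is compared against the rate is an assumption, not BuggifyRate's actual rule.

```zig
const std = @import("std");

// Hypothetical rate type shaped like the `drop_rate={}/{}` trace field.
const Rate = struct { numerator: u32, denominator: u32 };

// Both outcomes carry the packet id that the send consumed.
const SendResult = union(enum) {
    queued: u64,
    dropped: u64,
};

const Sender = struct {
    next_id: u64 = 0,

    // `roll` stands in for one draw from the seeded random source.
    fn send(self: *Sender, drop_rate: Rate, roll: u32) SendResult {
        std.debug.assert(drop_rate.denominator > 0);
        // The id is consumed before the drop decision, so ids stay stable
        // across runs regardless of which packets are lost.
        const id = self.next_id;
        self.next_id += 1;
        // Assumption: drop when the roll falls below the numerator.
        if (roll % drop_rate.denominator < drop_rate.numerator) {
            return .{ .dropped = id };
        }
        return .{ .queued = id };
    }
};

test "a dropped send still consumes a packet id" {
    var sender = Sender{};
    const rate = Rate{ .numerator = 20, .denominator = 100 };
    const first = sender.send(rate, 5);
    try std.testing.expect(first == .dropped and first.dropped == 0);
    const second = sender.send(rate, 50);
    try std.testing.expect(second == .queued and second.queued == 1);
}
```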

The current drop model uses BuggifyRate:

try sim.control.network.setLossiness(.{ .drop_rate = .percent(20) });

This keeps the API consistent with BUGGIFY without turning packet drops into opaque user behavior. Marionette owns the random decision; user code owns the payload and protocol semantics.

Trace Events

Current network trace events:

  • network.send id={} from={} to={} deliver_at={} latency_ns={}
  • network.drop id={} from={} to={} drop_rate={}/{} roll={} reason=send_drop
  • network.drop id={} from={} to={} reason=source_down
  • network.drop id={} from={} to={} reason=destination_down
  • network.drop id={} from={} to={} reason=link_disabled
  • network.deliver id={} from={} to={} now_ns={}
  • network.node node={} up={}
  • network.link from={} to={} enabled={}
  • network.clog from={} to={} duration_ns={} until_ns={}
  • network.clog from={} to={} duration_ns={} until_ns={} automatic=true
  • network.unclog from={} to={} active={}
  • network.unclog_all clogged_count={}
  • network.partition left_count={} right_count={}
  • network.auto_partition node={} isolated_count=1 connected_count={}
  • network.auto_heal node={}
  • network.heal disabled_count={} down_count={} clogged_count={}
  • network.heal_links disabled_count={}
  • network.lossiness drop_rate={}/{}
  • network.latency min_latency_ns={} latency_jitter_ns={}
  • network.clog_faults path_clog_rate={}/{} path_clog_duration_ns={}
  • network.partition_dynamics partition_rate={}/{} unpartition_rate={}/{} partition_stability_min_ns={} unpartition_stability_min_ns={}

The payload is not dumped into the core network trace. User code should record domain-specific payload facts separately when useful, as the replicated register example does with register.message.

Fault Evolution

Packet loss is still a send-time decision. Probabilistic path clogs and automatic partitions are tick-evolved decisions: random rolls happen only when simulation control advances time with sim.control.tick() or sim.control.runFor(...). Lazy popReady expiration remains only for deterministic clog deadlines; random partition or clog probabilities do not fire from observation methods.

The runtime fault profile is separate from static topology. Prefer focused control calls for scenario readability:

try sim.control.network.setLossiness(.{ .drop_rate = .percent(1) });
try sim.control.network.setLatency(.{
    .min_latency_ns = 1 * ns_per_ms,
    .latency_jitter_ns = 2 * ns_per_ms,
});
try sim.control.network.setClogs(.{
    .path_clog_rate = .percent(10),
    .path_clog_duration_ns = 2 * ns_per_ms,
});
try sim.control.network.setPartitionDynamics(.{
    .partition_rate = .percent(1),
    .unpartition_rate = .percent(5),
    .partition_stability_min_ns = 20 * ns_per_ms,
    .unpartition_stability_min_ns = 20 * ns_per_ms,
});

SimNetworkOptions describes what exists; runtime fault controls describe how the simulator may perturb it during a run. The focused setters replace their own fault family and leave the other fault families unchanged. Automatic partitioning currently isolates one random service node from every other configured process and heals only after the unpartition stability floor has elapsed and the unpartition roll fires. Explicit partition, heal, setLink, and clog calls remain immediate scenario actions.
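The two automatic-partition rules in the paragraph above (choose from the service prefix; heal only after the stability floor has elapsed and the roll fires) can be sketched as pure functions. pickIsolated and mayHeal are hypothetical names, the rolls are passed in rather than drawn from the seeded source, and treating the floor as inclusive is an assumption.

```zig
const std = @import("std");

// Choose the node to isolate from the service prefix. A service_nodes value
// of zero means every configured process is eligible.
fn pickIsolated(nodes: usize, service_nodes: usize, roll: usize) usize {
    std.debug.assert(nodes > 0 and service_nodes <= nodes);
    const eligible = if (service_nodes == 0) nodes else service_nodes;
    return roll % eligible;
}

// Heal only when the stability floor has elapsed AND the unpartition roll fired.
// Assumption: the floor is inclusive (elapsed == floor is enough).
fn mayHeal(
    partitioned_since_ns: u64,
    now_ns: u64,
    unpartition_stability_min_ns: u64,
    roll_fires: bool,
) bool {
    std.debug.assert(now_ns >= partitioned_since_ns);
    const elapsed = now_ns - partitioned_since_ns;
    return elapsed >= unpartition_stability_min_ns and roll_fires;
}

test "a firing roll cannot heal before the stability floor" {
    const ms = std.time.ns_per_ms;
    try std.testing.expect(!mayHeal(0, 10 * ms, 20 * ms, true));
    try std.testing.expect(mayHeal(0, 20 * ms, 20 * ms, true));
    // With nodes = 4 and service_nodes = 3, client id 3 is never chosen.
    try std.testing.expect(pickIsolated(4, 3, 3) == 0);
}
```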

Current Limits

UnstableNetwork does not yet support:

  • Replay recording.
  • Packet duplication.
  • Broadcast.
  • Node spawning.
  • Multiple named buses or bus registry.
  • Command-aware or user-classified link filters.
  • Per-link drop predicates.
  • Exponential or profile-selected latency distributions.
  • Capacity-overflow policies other than returning EventQueueFull.
  • Event-by-event scheduler callbacks.
  • Human summary rendering.

These are deliberate omissions. The current primitive should prove the smallest useful packet core before growing.

Next Step

The app-facing/control split in network-api.md still matters. The remaining network work is liveness-oriented: replay recording, duplicate/broadcast semantics, richer latency distributions, and named bus composition. UnstableNetwork remains a simulator primitive; examples should not teach it as the final production network surface.