GitHub user numinnex added a comment to the discussion: Simulator "Kill Node" 
Operation - Design Discussion

Here is a short markdown write-up with some more details:

## Core idea

Keep `IggyShard` the same in production and in the simulator.

The difference should be only:

- `MessageBus` implementation
- storage implementation
- runtime driver

## Runtime split

Do not make `MessageBus` or `IggyShard` own the event loop.

Use this boundary:

```text
IggyShard / consensus / handlers
    -> MessageBus::send_to_replica()
    -> MessageBus::send_to_client()
    -> stage outbound message

runtime driver
    -> drain outbound messages
    -> deliver inbound messages
```
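A minimal sketch of that boundary, assuming a synchronous trait for brevity. `Target`, `Outbound`, `OutboxBus`, and `drain_outbound` are hypothetical names for illustration, and the real `MessageBus` signatures may differ:

```rust
use std::collections::VecDeque;

/// Hypothetical destination of a staged message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Target {
    Replica(u8),
    Client(u128),
}

/// Hypothetical envelope: destination plus an opaque payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Outbound {
    pub target: Target,
    pub payload: Vec<u8>,
}

/// Outbound-only bus: sending only stages a message, it never delivers it.
pub trait MessageBus {
    fn send_to_replica(&mut self, replica: u8, payload: Vec<u8>);
    fn send_to_client(&mut self, client: u128, payload: Vec<u8>);
    /// Called by the runtime driver, not by the shard.
    fn drain_outbound(&mut self) -> Vec<Outbound>;
}

/// Bus implementation that is nothing more than an outbox.
#[derive(Default)]
pub struct OutboxBus {
    outbox: VecDeque<Outbound>,
}

impl MessageBus for OutboxBus {
    fn send_to_replica(&mut self, replica: u8, payload: Vec<u8>) {
        self.outbox.push_back(Outbound {
            target: Target::Replica(replica),
            payload,
        });
    }

    fn send_to_client(&mut self, client: u128, payload: Vec<u8>) {
        self.outbox.push_back(Outbound {
            target: Target::Client(client),
            payload,
        });
    }

    fn drain_outbound(&mut self) -> Vec<Outbound> {
        // Hand everything staged so far to the driver, in send order.
        self.outbox.drain(..).collect()
    }
}
```

The shard only ever calls the `send_*` methods. Whether the drained messages end up on a socket or in a simulated network is decided by whoever calls `drain_outbound`.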

## Production

In the real server / cluster binary:

- keep the real `compio` listener tasks
- keep the per-accepted-connection read tasks
- keep transport tasks enabled only when config enables `tcp` / `quic` / `http` / `ws`

So production still looks like:

- listener accepts connection
- connection task reads and decodes frames
- decoded inbound message is injected into `IggyShard`
- outbound messages are drained from the bus and written by runtime-owned tasks

## Simulator

In the simulator:

- do not start the real listener/connection tasks
- do not bind sockets
- let `Simulator::step()` be the runtime driver

So on each step the simulator does the following:

- drain outbound messages from the bus
- submit them into the simulated network
- take ready packets from the simulated network
- inject them into `IggyShard`
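The loop above could be sketched roughly as follows. `Node` is a stand-in for `IggyShard` plus its outbox, and `Packet`, `Network`, `submit`, `take_ready`, and the fixed one-tick delay are all assumptions made for illustration, not the existing simulator API:

```rust
use std::collections::VecDeque;

/// Hypothetical packet moving between nodes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Packet {
    pub from: usize,
    pub to: usize,
    pub payload: Vec<u8>,
}

/// Stand-in for `IggyShard` plus its outbound-only bus.
#[derive(Default)]
pub struct Node {
    pub outbox: VecDeque<(usize, Vec<u8>)>, // (destination, payload)
    pub received: Vec<Packet>,
}

impl Node {
    /// Inbound messages are injected directly, not routed through the bus.
    pub fn on_message(&mut self, packet: Packet) {
        self.received.push(packet);
    }
}

/// Simulated network with a fixed one-tick delay (an assumption).
#[derive(Default)]
pub struct Network {
    in_flight: VecDeque<(u64, Packet)>, // (deliver_at_tick, packet)
}

impl Network {
    pub fn submit(&mut self, now: u64, packet: Packet) {
        self.in_flight.push_back((now + 1, packet));
    }

    pub fn take_ready(&mut self, now: u64) -> Vec<Packet> {
        let mut ready = Vec::new();
        // The delay is constant, so the queue is ordered by delivery tick.
        while self.in_flight.front().map_or(false, |entry| entry.0 <= now) {
            if let Some((_, packet)) = self.in_flight.pop_front() {
                ready.push(packet);
            }
        }
        ready
    }
}

pub struct Simulator {
    pub tick: u64,
    pub nodes: Vec<Node>,
    pub network: Network,
}

impl Simulator {
    pub fn new(node_count: usize) -> Self {
        Self {
            tick: 0,
            nodes: (0..node_count).map(|_| Node::default()).collect(),
            network: Network::default(),
        }
    }

    /// The runtime driver: no sockets, no listener tasks.
    pub fn step(&mut self) {
        // 1. Drain outbound messages and submit them into the network.
        for (from, node) in self.nodes.iter_mut().enumerate() {
            for (to, payload) in node.outbox.drain(..) {
                self.network.submit(self.tick, Packet { from, to, payload });
            }
        }
        // 2. Take ready packets and inject them into the target node.
        for packet in self.network.take_ready(self.tick) {
            if let Some(node) = self.nodes.get_mut(packet.to) {
                node.on_message(packet);
            }
        }
        self.tick += 1;
    }
}
```

With this shape, a message staged by node 0 is submitted on one `step()` and injected into node 1 on the next, so delivery timing is fully controlled by the simulator.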

## Important rule

Inbound should not go through `MessageBus`.

`MessageBus` should stay outbound-only.

- sockets / simulator network produce inbound messages
- inbound messages go directly into shard/replica handling
- bus is only for outbound staging

## Most important implementation points
1. `MemBus` should become outbox-only, not the global delivery queue.
2. The simulator should own node liveness: `Up`, `Paused`, `Down`.
3. `replica_crash()` should discard transient state but preserve durable state.
4. `replica_restart()` should rebuild a fresh runtime from durable state.
5. All inter-node traffic should go through the simulator `Network`.
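
Points 2 to 4 could look roughly like the sketch below. `SimNode`, `DurableState`, `Runtime`, `recover`, and `can_deliver` are hypothetical names, and treating `Paused` as "keeps its runtime but receives no traffic" is an assumption of this sketch:

```rust
/// Liveness is owned by the simulator, not by the node itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Liveness {
    Up,
    Paused,
    Down,
}

/// Hypothetical durable state: survives a crash.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct DurableState {
    pub log: Vec<Vec<u8>>,
}

/// Hypothetical transient state: discarded on crash, rebuilt on restart.
#[derive(Debug, Default)]
pub struct Runtime {
    pub outbox: Vec<Vec<u8>>,
    pub recovered_entries: usize,
}

impl Runtime {
    /// Build a fresh runtime from durable state only.
    pub fn recover(durable: &DurableState) -> Self {
        Self {
            outbox: Vec::new(),
            recovered_entries: durable.log.len(),
        }
    }
}

pub struct SimNode {
    pub liveness: Liveness,
    pub durable: DurableState,
    pub runtime: Option<Runtime>,
}

impl SimNode {
    pub fn new() -> Self {
        Self {
            liveness: Liveness::Up,
            durable: DurableState::default(),
            runtime: Some(Runtime::default()),
        }
    }
}

pub struct Simulator {
    pub nodes: Vec<SimNode>,
}

impl Simulator {
    /// Discard transient state, keep durable state untouched.
    pub fn replica_crash(&mut self, id: usize) {
        let node = &mut self.nodes[id];
        node.runtime = None;
        node.liveness = Liveness::Down;
    }

    /// Rebuild a fresh runtime from durable state; no-op unless `Down`.
    pub fn replica_restart(&mut self, id: usize) {
        let node = &mut self.nodes[id];
        if node.liveness == Liveness::Down {
            node.runtime = Some(Runtime::recover(&node.durable));
            node.liveness = Liveness::Up;
        }
    }

    /// Delivery gate checked before injecting a packet from the network.
    pub fn can_deliver(&self, id: usize) -> bool {
        self.nodes[id].liveness == Liveness::Up
    }
}
```

Keeping the runtime in an `Option` makes it hard to accidentally reuse transient state after a crash: a restarted node can only see what `recover` reads from `DurableState`.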

GitHub link: 
https://github.com/apache/iggy/discussions/3017#discussioncomment-16297425
