kevin-wu24 opened a new pull request, #21574:
URL: https://github.com/apache/kafka/pull/21574
This PR implements step 1 of the incremental refactor: wrapping the existing
`poll()` call in an EventExecutor-based self-rescheduling loop. The
`KafkaRaftClient` operates the same as before, but now the
`KafkaRaftClientDriver` holds an `EventExecutor` instead of extending
`ShutdownableThread`.
### What changed
- Adds `EventExecutor` as a constructor parameter to `KafkaRaftClient`, so
the client holds a direct reference to the executor alongside
`KafkaRaftClientDriver`. This positions the client to schedule its own events
(election timeouts, heartbeats, batch drains, etc.) directly in future PRs,
without introducing a circulardependency between the client and driver.
- Refactors `KafkaRaftClient` to use event-based scheduling via
`KafkaRaftClientDriver`, replacing the previous `ShutdownableThread` polling
loop. The driver submits a self-rescheduling poll event to an `EventExecutor`,
with identical runtime behavior to the old thread-based approach.
- Introduces the `EventExecutor` interface and `DefaultEventExecutor`
implementation in server-common, providing a single-threaded executor with
submit(), schedule(), and graceful shutdown() semantics. Adds
`MockEventExecutor` for deterministic testing with manual poll() and
MockTime-driven scheduled task expiration.
Test plan
- `KafkaRaftClientDriverTest` — verifies the self-rescheduling poll loop,
graceful shutdown, and fault handler invocation on uncaught exceptions
- `DefaultEventExecutorTest` — covers submit()/schedule() for runnables
and callables, exception propagation, cancellation, capacity enforcement, and
shutdown semantics
- `MockEventExecutorTest` — validates deterministic polling, scheduled
task expiration via MockTime, cancellation, shutdown rejection, and no-op poll()
### Proposed next steps of refactor
Prior to this PR, an invoke of `KafkaRaftClient#poll()` can be thought of as
appending 3 distinct "events":
1. A poll state event: this brings the local client state "up-to-date" and
potentially sends outgoing RPCs
2. A blocking inbound message handling event: the client waits for a inbound
message to handle and update more state
3. Polling listeners event: update raft listener states
After this PR, we will only have 1 event (`poll()`) on the event queue, so
the behavior of the system should be identical. However, subsequent PRs will
break up what currently goes on in poll into distinct events. We may also want
to write some benchmarks to make sure performance regressions aren't happening
due to this refactor.
I think the most natural iteration in the next PR will be breaking things up
into the events mentioned above. What this might look like is:
- the event queue initially starts empty and the client periodically
schedule a "poll state event" to kick off KRaft
- whenever an inbound message comes, the client can first submit a "poll
state event" if we consider the last poll state to be too out of date, then a
"handle inbound message event", and finally a "poll listeners" event
This way the behavior should also be the same as currently, except we could
remove the `RaftMessageQueue` at that point, and start using the
`EventExecutor` directly in unit tests. Then follow-on PRs can break the "poll
state event" down further.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]