[PR] KAFKA-19830 [1/N]: refactor KafkaRaftClient to use event-based scheduling [kafka]

via GitHub Wed, 25 Feb 2026 01:29:14 -0800


kevin-wu24 opened a new pull request, #21574:
URL: https://github.com/apache/kafka/pull/21574


   This PR implements step 1 of the incremental refactor: wrapping the existing 
`poll()` call in an EventExecutor-based self-rescheduling loop. The 
`KafkaRaftClient` operates the same as before, but now the 
`KafkaRaftClientDriver` holds an `EventExecutor` instead of extending 
`ShutdownableThread`.
   
   ###   What changed
   
    - Adds `EventExecutor` as a constructor parameter to `KafkaRaftClient`, so 
the client holds a direct reference to the executor alongside 
`KafkaRaftClientDriver`. This positions the client to schedule its own events 
(election timeouts, heartbeats, batch drains, etc.) directly in future PRs, 
without introducing a circulardependency between the client and driver.
     - Refactors `KafkaRaftClient` to use event-based scheduling via 
`KafkaRaftClientDriver`, replacing the previous `ShutdownableThread` polling 
loop. The driver submits a self-rescheduling poll event to an `EventExecutor`, 
with identical runtime behavior to the old thread-based approach.
     - Introduces the `EventExecutor` interface and `DefaultEventExecutor` 
implementation in server-common, providing a single-threaded executor with 
submit(), schedule(), and graceful shutdown() semantics. Adds 
`MockEventExecutor` for deterministic testing with manual poll() and 
MockTime-driven scheduled task expiration.
   
     Test plan
   
     - `KafkaRaftClientDriverTest` — verifies the self-rescheduling poll loop, 
graceful shutdown, and fault handler invocation on uncaught exceptions
     - `DefaultEventExecutorTest` — covers submit()/schedule() for runnables 
and callables, exception propagation, cancellation, capacity enforcement, and 
shutdown semantics
     - `MockEventExecutorTest` — validates deterministic polling, scheduled 
task expiration via MockTime, cancellation, shutdown rejection, and no-op poll()
   
   ### Proposed next steps of refactor
   
   Prior to this PR, an invoke of `KafkaRaftClient#poll()` can be thought of as 
appending 3 distinct "events":
   
   1. A poll state event: this brings the local client state "up-to-date" and 
potentially sends outgoing RPCs
   2. A blocking inbound message handling event: the client waits for a inbound 
message to handle and update more state
   3. Polling listeners event: update raft listener states
   
   After this PR, we will only have 1 event (`poll()`) on the event queue, so 
the behavior of the system should be identical. However, subsequent PRs will 
break up what currently goes on in poll into distinct events. We may also want 
to write some benchmarks to make sure performance regressions aren't happening 
due to this refactor.
   
   I think the most natural iteration in the next PR will be breaking things up 
into the events mentioned above. What this might look like is: 
   
   - the event queue initially starts empty and the client periodically 
schedule a "poll state event" to kick off KRaft 
   - whenever an inbound message comes, the client can first submit a "poll 
state event" if we consider the last poll state to be too out of date, then a 
"handle inbound message event", and finally a "poll listeners" event
   
   This way the behavior should also be the same as currently, except we could 
remove the `RaftMessageQueue` at that point, and start using the 
`EventExecutor` directly in unit tests. Then follow-on PRs can break the "poll 
state event" down further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] KAFKA-19830 [1/N]: refactor KafkaRaftClient to use event-based scheduling [kafka]

Reply via email to