On Thu, 26 Mar 2026 17:42:05 GMT, Alan Bateman <[email protected]> wrote:
>> Francesco Nigro has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Disable edge-triggered epoll for POLLER_PER_CARRIER mode
>>
>> Per-carrier sub-pollers have carrier affinity, which creates a
>> scheduling conflict with edge-triggered registrations: the sub-poller
>> competes with user VTs for the same carrier. By the time the sub-poller
>> runs, user VTs have often already consumed data via tryRead(), causing
>> the sub-poller to find a POLLED sentinel and waste a full park/unpark
>> cycle on the master (each costing an epoll_ctl). Under load this
>> causes a 2x throughput regression.
>>
>> VTHREAD_POLLERS mode is unaffected because its sub-pollers have no
>> carrier affinity and can run on any available carrier, processing
>> events before user VTs consume the data.
>
> This looks like a 1% improvement in ops/sec. I think we'll need to get a more
> real-world benchmark. Do you have something other than the micro.
>
> Do you agree with the proposal to put this in its own branch so that we can
> iterate on it?
@AlanBateman Here is a first round of results from the existing benchmark, plus an explanation of a JMH issue I found.
## Benchmark: Edge-triggered epoll vs EPOLLONESHOT for VT read sub-pollers
Machine: AMD Ryzen 9 7950X 16-Core, Linux 6.19.8

```
JVM args: -Djdk.pollerMode=2 -Djdk.virtualThreadScheduler.parallelism=P
          -Djdk.virtualThreadScheduler.maxPoolSize=2*P -Djdk.readPollers=P
          -Djmh.executor=VIRTUAL
JMH:      -f 3 -wi 3 -w 5s -i 3 -r 10s -t 100 -p readSize=1
```
### Throughput (ops/s, higher is better)
<details>
<summary>Raw JMH output</summary>

EPOLLONESHOT (baseline):

```
Benchmark                           (readSize)  (serverCount)   Mode  Cnt       Score       Error  Units
# parallelism=1, readPollers=1
SocketReadPollerBench.rpcRoundTrip           1              4  thrpt    9  123365.250 ±  1783.282  ops/s
# parallelism=2, readPollers=2
SocketReadPollerBench.rpcRoundTrip           1              4  thrpt    9  221708.710 ±  4651.978  ops/s
# parallelism=4, readPollers=4
SocketReadPollerBench.rpcRoundTrip           1              8  thrpt    9  436302.988 ± 11788.896  ops/s
```

EPOLLET (edge-triggered):

```
Benchmark                           (readSize)  (serverCount)   Mode  Cnt       Score       Error  Units
# parallelism=1, readPollers=1
SocketReadPollerBench.rpcRoundTrip           1              4  thrpt    9  134129.081 ±  1199.593  ops/s
# parallelism=2, readPollers=2
SocketReadPollerBench.rpcRoundTrip           1              4  thrpt    9  244213.173 ±  3760.140  ops/s
# parallelism=4, readPollers=4
SocketReadPollerBench.rpcRoundTrip           1              8  thrpt    9  467931.437 ± 17245.761  ops/s
```

</details>
Ratio (ET / baseline):

- parallelism=1: 1.087x (+8.7%), non-overlapping CI
- parallelism=2: 1.102x (+10.2%), non-overlapping CI
- parallelism=4: 1.072x (+7.2%), non-overlapping CI
### async-profiler CPU breakdown (parallelism=1, 30s, ~58K samples)
| Component | EPOLLONESHOT | EPOLLET | Delta |
|:---|---:|---:|:---|
| `epoll_ctl` path | 2,183 (3.8%) | 0 | **eliminated** |
| Poller loop (carrier) | 3,747 (6.5%) | 1,589 (2.7%) | **-57%** |
| Continuation mount/unmount | 29,399 | 29,417 | unchanged |
### Note: `maxPoolSize=2*P` workaround
JMH's VIRTUAL executor runs both the benchmark workers and the iteration
control logic (timing/warmdown signaling) on virtual threads. At
`parallelism>=2`, with 100 busy VTs in tight blocking I/O loops, the
iteration-control VT can be starved and never signal the end of an iteration
(`awaitWarmdownReady` hangs). Setting `maxPoolSize=2*parallelism` leaves
enough carrier headroom for the JMH control VTs to be scheduled. This is a
JMH/scheduler interaction issue, not a Loom bug.
-------------
PR Comment: https://git.openjdk.org/loom/pull/223#issuecomment-4141801767