On Thu, 26 Mar 2026 14:40:16 GMT, Alan Bateman <[email protected]> wrote:
>> Ported from 7ac9ca128885c5dd561e6fbd6bbeaddb86d6264c to the latest upstream
>> fibers branch. Adapted to the current API, which renamed
>> implRegister/implDeregister to implStartPoll/implStopPoll and added the
>> Mode/EventFD/Cleaner/PollerGroup architecture.
>
> Just an FYI that we've experimented with epoll edge-triggered mode in the
> past. The main concerns were that it's very fragile (it only works with
> specific usage patterns) and adds complexity by way of bookkeeping. Yes, it
> can reduce the need to re-arm a file descriptor, but overall it was never
> clear whether significant benefits could be proven in real-world cases to
> justify the complexity.
>
> I'm not opposed to trying again, but I think this requires creating a new
> branch and iterating there. Would you be okay with that?
>
> I think it would be useful to know what testing has been done so far. I did
> some quick testing and see failures/timeouts with HTTP3 tests, which seem to
> be UDP or selection ops in the context of a virtual thread. I think it would
> also be useful to see some benchmark data.

@AlanBateman I've added a JMH benchmark in https://github.com/openjdk/loom/pull/223/commits/28755d93663ce722b53cea38dabbb5701c2a6e1d.
IMO, since this is not a CPU-bound test, its results require some care to read. Running it produces this diff:

┌─────────────────────┬──────────┬────────────────┬─────────────────┐
│ Counter             │ Baseline │ Edge-triggered │ Ratio           │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ ops/s               │ 105,620  │ 107,168        │ 1.01x           │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ cycles/op           │ 11,724   │ 3,272          │ 3.6x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ instructions/op     │ 7,009    │ 2,031          │ 3.5x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ branches/op         │ 1,513    │ 438            │ 3.5x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ branch-misses/op    │ 115      │ 33             │ 3.5x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ L1-dcache-loads/op  │ 2,764    │ 816            │ 3.4x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ L1-dcache-misses/op │ 358      │ 101            │ 3.5x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ stalled-frontend/op │ 5,144    │ 1,442          │ 3.6x less       │
├─────────────────────┼──────────┼────────────────┼─────────────────┤
│ CPI                 │ 1.67     │ 1.61           │ slightly better │
└─────────────────────┴──────────┴────────────────┴─────────────────┘

In short, it is a huge CPU saving (for this specific case, with tiny reads), but it won't impact latencies, which are bound to the loopback RTT. The CPU saving is there, though, and it is pretty relevant.
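To make the "huge CPU saving" concrete: throughput is RTT-bound, so the interesting number is the implied CPU cost per second. This is just a back-of-the-envelope check using the same numbers as the table above (nothing here is newly measured):

```java
import java.util.Locale;

// Back-computes cycles spent per second from the benchmark table:
// ops/s barely moves, but the implied CPU time drops sharply.
public class EpollEtCpuCost {
    public static void main(String[] args) {
        long baselineCyclesPerSec = 105_620L * 11_724L; // ops/s * cycles/op
        long etCyclesPerSec = 107_168L * 3_272L;

        System.out.println("baseline cycles/s:       " + baselineCyclesPerSec);
        System.out.println("edge-triggered cycles/s: " + etCyclesPerSec);
        System.out.printf(Locale.ROOT, "CPU saving: ~%.1fx%n",
                (double) baselineCyclesPerSec / etCyclesPerSec);
    }
}
```

That is roughly 1.24e9 cycles/s spent by the baseline vs 3.5e8 with edge-triggered mode, for essentially the same throughput.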
As per https://github.com/openjdk/loom/pull/223/commits/7e36c5fe089db64cb7a4921aae9a9ef5f583fbb2 instead: I have pushed a fix for a race condition (you rightly pointed out that ET is complex to deal with, and I can agree with that :D) which disables ET for pollerMode=3, due to this behaviour:

- with pollerMode=2, lazy submit allows a subpoller to first enqueue an awakened virtual thread locally, without finding the CHM entry in POLLED state
- with pollerMode=3, there is no "local" submit (unless a custom scheduler implements it!), so it is likely another FJ worker that competes with the subpoller and finds the POLLED state: this wastes a full park/unpark cycle on the master poller (each costing an epoll_ctl on it!)

So, in short, the fix at https://github.com/openjdk/loom/pull/223/commits/1ac6dc35942e4a71a2a6dcac400cba6d992ee85b is good enough for pollerMode=2 (which rarely sees this race, unless stealing from the FJ worker's local queue), but pollerMode=3 with the built-in scheduler won't fare as well, bothering the master poller way too much.

-------------

PR Comment: https://git.openjdk.org/loom/pull/223#issuecomment-4136322872
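Addendum: the two bullets about the POLLED race can be illustrated with a deterministic toy model. Every name below (PollState, simulate, lazyLocalSubmit) is hypothetical and mine, not the actual Poller code; it only shows why the thread that resumes the virtual thread wakes the master poller when it still observes POLLED:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the POLLED race: a sketch under my own naming
// assumptions, not the real implementation.
public class PolledRaceSketch {

    enum PollState { POLLED, CONSUMED }

    // Returns how many times the master poller had to be woken
    // (each wakeup standing in for an epoll_ctl re-arm).
    static int simulate(boolean lazyLocalSubmit) {
        int masterWakeups = 0;
        AtomicReference<PollState> entry = new AtomicReference<>(PollState.POLLED);

        if (lazyLocalSubmit) {
            // pollerMode=2-like: the subpoller enqueues the awakened virtual
            // thread locally and consumes the entry itself, so whoever runs
            // the thread later never observes POLLED.
            entry.set(PollState.CONSUMED);
        }

        // The worker that actually resumes the virtual thread:
        if (entry.compareAndSet(PollState.POLLED, PollState.CONSUMED)) {
            // pollerMode=3-like: a competing FJ worker found the entry still
            // POLLED and must bother the master poller to re-arm the fd.
            masterWakeups++;
        }
        return masterWakeups;
    }

    public static void main(String[] args) {
        System.out.println("pollerMode=2-like: master wakeups = " + simulate(true));
        System.out.println("pollerMode=3-like: master wakeups = " + simulate(false));
    }
}
```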
