** Description changed:
[SRU Justification]
[Impact]
The current epoll implementation in the 5.15 kernel uses a reader-writer
lock (rwlock_t) to protect the ready event list. While this allows multiple
producers to add items concurrently, it exposes epoll to scheduling priority
inversion.
If a high-priority consumer (such as a real-time thread calling epoll_wait) is
blocked waiting for the exclusive write lock, it can be stalled indefinitely
by a low-priority producer holding the read lock, because rwlock_t does not
priority-boost readers, even with CONFIG_PREEMPT_RT=y. This results in
non-deterministic system stalls and latency spikes.
- The fix involves replacing rwlock_t with a standard spinlock_t one-to-one, and
- removing the now-redundant lockless helper functions (list_add_tail_lockless
- and chain_epi_lockless). This ensures that under real-time configurations,
- priority inheritance works correctly across the epoll subsystem, eliminating
- priority inversion.
+ The fix involves replacing rwlock_t with spinlock_t, and removing the
+ now-redundant lockless helper functions (list_add_tail_lockless and
+ chain_epi_lockless). This ensures that under real-time configurations,
+ priority inheritance works correctly across the epoll subsystem, eliminating
+ priority inversion.
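As a userspace analogy only (this is not the kernel change), the sketch
below shows what "a lock with priority inheritance" looks like in POSIX
threads. A pthread mutex can be configured with PTHREAD_PRIO_INHERIT, whereas
pthread_rwlock_t has no equivalent protocol attribute, which loosely mirrors
the spinlock_t versus rwlock_t situation on real-time kernels. The helper
name init_pi_mutex is hypothetical and chosen for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose PTHREAD_PRIO_INHERIT under strict -std= modes */
#endif
#include <pthread.h>

/*
 * Initialise *m as a mutex that uses the priority-inheritance protocol and
 * report the protocol that was configured through *protocol_out.
 * Returns 0 on success or a pthread error number on failure.
 */
int init_pi_mutex(pthread_mutex_t *m, int *protocol_out)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;

    /* A low-priority owner is boosted while a higher-priority thread waits. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutexattr_getprotocol(&attr, protocol_out);
    if (!err)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

A caller would pass an uninitialised pthread_mutex_t and then use it with
pthread_mutex_lock/pthread_mutex_unlock as usual.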
[Fix]
Backport upstream commit:
0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
[Test Plan]
Due to the nature of scheduling priority inversion, reproducing this bug
reliably on demand is highly impractical. Because this race condition relies
on erratic, non-deterministic scheduling micro-windows, a standard
deterministic reproduction script cannot be provided.
Therefore, validation relies on verifying that the replacement locking
mechanism functions correctly, introduces no regressions, and scales safely
under synthetic load.
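A minimal sketch of such a synthetic load, assuming eventfd-based producers:
several threads signal eventfds registered on one epoll instance while a
single consumer drains them with epoll_wait and checks that no event is lost.
It is a smoke test of the producer/consumer path, not a reproducer for the
priority inversion, and the names run_epoll_stress, producer_main and
producer_arg are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct producer_arg {
    int efd;    /* eventfd this producer signals */
    int count;  /* number of signals to send */
};

/* Producer: each write adds 1 to the eventfd counter and wakes epoll. */
static void *producer_main(void *p)
{
    struct producer_arg *arg = p;
    uint64_t one = 1;

    for (int i = 0; i < arg->count; i++)
        if (write(arg->efd, &one, sizeof(one)) != sizeof(one))
            break;
    return NULL;
}

/*
 * Run `producers` threads that each signal `events_per_producer` times.
 * Returns the number of signals the consumer observed, or -1 on failure
 * (setup error, or no progress for roughly five seconds).
 */
long run_epoll_stress(int producers, int events_per_producer)
{
    long expected = (long)producers * events_per_producer;
    long seen = 0;
    int started = 0, idle_rounds = 0, failed = 0;
    pthread_t *tids = calloc(producers, sizeof(*tids));
    struct producer_arg *args = calloc(producers, sizeof(*args));
    int epfd = epoll_create1(0);

    if (!tids || !args || epfd < 0)
        failed = 1;
    for (int i = 0; !failed && i < producers; i++)
        args[i].efd = -1;

    for (int i = 0; !failed && i < producers; i++) {
        struct epoll_event ev = { .events = EPOLLIN };

        args[i].count = events_per_producer;
        args[i].efd = eventfd(0, EFD_NONBLOCK);
        ev.data.fd = args[i].efd;
        if (args[i].efd < 0 ||
            epoll_ctl(epfd, EPOLL_CTL_ADD, args[i].efd, &ev) < 0 ||
            pthread_create(&tids[i], NULL, producer_main, &args[i]) != 0)
            failed = 1;
        else
            started++;
    }

    /* Consumer: a read returns the accumulated counter and resets it. */
    while (!failed && seen < expected) {
        struct epoll_event evs[64];
        int n = epoll_wait(epfd, evs, 64, 1000);

        if (n < 0) {
            if (errno != EINTR)
                failed = 1;
            continue;
        }
        if (n == 0) {
            if (++idle_rounds >= 5)
                failed = 1;
            continue;
        }
        idle_rounds = 0;
        for (int i = 0; i < n; i++) {
            uint64_t v;

            if (read(evs[i].data.fd, &v, sizeof(v)) == sizeof(v))
                seen += (long)v;
        }
    }

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; args && i < producers; i++)
        if (args[i].efd >= 0)
            close(args[i].efd);
    if (epfd >= 0)
        close(epfd);
    free(tids);
    free(args);
    return failed ? -1 : seen;
}
```

Calling run_epoll_stress with, for example, 8 producers and 10000 events each
on both the unpatched and the test kernel gives a quick check that every
signal is still delivered; raising the producer count increases contention on
the ready-list lock.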
There is a test kernel available in the following PPA:
https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
[Where Problems Could Occur]
There could be a performance degradation with highly specific, synthetic
workloads on the GA kernel. As seen in the upstream commit description [0],
in artificial benchmarks where hundreds of threads continuously spam epoll
events, throughput can drop due to serialization around the new spinlock.
However, testing with realistic workloads (via perf bench epoll wait) actually
demonstrates a performance improvement on x86 architectures.
The regression potential for real-world production environments is low, as
typical workloads do not exhibit continuous, uninterrupted event-spamming
behavior. Moreover, the fix is strictly isolated to fs/eventpoll.c and alters
no external kernel APIs.
[Other Info]
This bug was addressed upstream and has already been integrated into Noble and
subsequent releases.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd
--
https://bugs.launchpad.net/bugs/2154194
Title:
[Jammy] Priority inversion problem in epoll for rt kernel