** Description changed:
[SRU Justification]
[Impact]
The current epoll implementation in the 5.15 kernel uses a reader-writer
lock (rwlock_t) to protect the ready event list. While this allows
multiple producers to add items concurrently, it introduces a scheduling
priority inversion vulnerability.
If a high-priority consumer (such as a real-time thread calling epoll_wait) is
blocked waiting for the exclusive write lock, it can be stalled indefinitely by
a low-priority producer holding the read lock. This results in non-deterministic
system stalls and latency spikes.
+ [Fix]
+
+ Backport upstream commit:
+ 0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
+
The fix involves replacing rwlock_t with spinlock_t, and removing the
now-redundant lockless helper functions (list_add_tail_lockless and
chain_epi_lockless). This ensures that under real-time configurations,
priority inheritance works correctly across the epoll subsystem, eliminating
the priority inversion problem.
-
- [Fix]
-
- Backport upstream commit:
- 0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
[Test Plan]
Due to the nature of scheduling priority inversion, reproducing this bug
reliably on demand is highly impractical. Because this race condition relies
on erratic, non-deterministic scheduling micro-windows, a standard
deterministic reproduction script cannot be provided.
Therefore, validation relies on verifying that the replacement locking
mechanism functions correctly, introduces no regressions, and scales safely
under synthetic load.
There is a test kernel available in the following ppa:
https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
[Where Problems Could Occur]
There is a trade-off in raw throughput for highly specific, synthetic workloads.
As seen in the upstream commit description [0], in artificial benchmarks where
hundreds of threads continuously spam epoll events, throughput can drop by ~38%
due to serialization around the new spinlock.
However, testing with realistic workloads (via perf bench epoll wait) actually
demonstrates a performance improvement on x86 architectures.
The regression potential for real-world production environments is considered
low, as typical workloads do not exhibit continuous, uninterrupted
event-spamming behavior. Moreover, the fix is strictly isolated to
fs/eventpoll.c and alters no external kernel APIs.
[Other Info]
This bug was addressed upstream and has already been integrated into Noble and
subsequent releases.
[0] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd
** Description changed:
[SRU Justification]
[Impact]
The current epoll implementation in the 5.15 kernel uses a reader-writer
lock (rwlock_t) to protect the ready event list. While this allows
multiple producers to add items concurrently, it introduces a scheduling
priority inversion vulnerability.
If a high-priority consumer (such as a real-time thread calling epoll_wait) is
blocked waiting for the exclusive write lock, it can be stalled indefinitely by
a low-priority producer holding the read lock. This results in non-deterministic
system stalls and latency spikes.
[Fix]
Backport upstream commit:
0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
The fix involves replacing rwlock_t with spinlock_t, and removing the
now-redundant lockless helper functions (list_add_tail_lockless and
chain_epi_lockless). This ensures that under real-time configurations,
priority inheritance works correctly across the epoll subsystem, eliminating
the priority inversion problem.
[Test Plan]
Due to the nature of scheduling priority inversion, reproducing this bug
reliably on demand is highly impractical. Because this race condition relies
on erratic, non-deterministic scheduling micro-windows, a standard
deterministic reproduction script cannot be provided.
Therefore, validation relies on verifying that the replacement locking
mechanism functions correctly, introduces no regressions, and scales safely
under synthetic load.
There is a test kernel available in the following ppa:
https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
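The synthetic-load check described above could be approximated from userspace
with something like the following. This is only a minimal sketch, not part of
the original test plan: the function name run_epoll_smoke_test and its
parameters are hypothetical, and pipes are used as an assumed stand-in for
whatever event sources a real workload would register. It does not reproduce
the priority inversion; it only exercises the multi-producer wakeup path on
the ready list and checks that no events are lost.

```python
import os
import select
import threading


def run_epoll_smoke_test(num_producers=8, events_per_producer=1000):
    """Hypothetical helper: return (sent, received) byte counts."""
    ep = select.epoll()
    pipes = [os.pipe() for _ in range(num_producers)]
    for rfd, _ in pipes:
        os.set_blocking(rfd, False)
        ep.register(rfd, select.EPOLLIN)  # level-triggered readiness

    def producer(wfd):
        # Each one-byte write makes the read end ready, which queues
        # (or re-queues) it on the epoll ready list.
        for _ in range(events_per_producer):
            os.write(wfd, b"x")

    threads = [threading.Thread(target=producer, args=(wfd,))
               for _, wfd in pipes]
    for t in threads:
        t.start()

    expected = num_producers * events_per_producer
    received = 0
    while received < expected:
        ready = ep.poll(5)  # give up if nothing arrives within 5 seconds
        if not ready:
            break
        for fd, _mask in ready:
            try:
                received += len(os.read(fd, 4096))
            except BlockingIOError:
                pass  # nothing left to read on this descriptor

    for t in threads:
        t.join()
    ep.close()
    for rfd, wfd in pipes:
        os.close(rfd)
        os.close(wfd)
    return expected, received


if __name__ == "__main__":
    sent, got = run_epoll_smoke_test()
    print(f"sent={sent} received={got}")
```

Running it on both the current and the test kernel and confirming that the
two counts match on each gives a basic regression signal; latency under
real-time priorities would still need a dedicated tool.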
[Where Problems Could Occur]
- There is a trade-off in raw throughput for highly specific, synthetic workloads.
- As seen in the upstream commit description [0], in artificial benchmarks where
- hundreds of threads continuously spam epoll events, throughput can drop by ~38%
- due to serialization around the new spinlock.
+ There could be a performance degradation with highly specific, synthetic
+ workloads on the GA kernel. As seen in the upstream commit description [0],
+ in artificial benchmarks where hundreds of threads continuously spam epoll
+ events, throughput can drop due to serialization around the new spinlock.
However, testing with realistic workloads (via perf bench epoll wait) actually
demonstrates a performance improvement on x86 architectures.
- The regression potential for real-world production environments is considered
- low, as typical workloads do not exhibit continuous, uninterrupted
- event-spamming behavior. Moreover, the fix is strictly isolated to
- fs/eventpoll.c and alters no external kernel APIs.
+ The regression potential for real-world production environments is low, as
+ typical workloads do not exhibit continuous, uninterrupted event-spamming
+ behavior. Moreover, the fix is strictly isolated to fs/eventpoll.c and alters
+ no external kernel APIs.
[Other Info]
- This bug was addressed upstream and has already been integrated into Noble and
- subsequent releases.
+ Similar issues have been reported in [1] and [2]. This bug was addressed
+ upstream [0] and has already been integrated into Noble and subsequent
+ releases. Backporting this to Jammy ensures critical stability for LTS users
+ utilizing the real-time kernel.
- [0] -
-
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd
+ [0] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd
+ [1] - https://lore.kernel.org/linux-rt-users/[email protected]/
+ [2] - https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/
--
https://bugs.launchpad.net/bugs/2154194
Title:
[Jammy] Priority inversion problem in epoll for rt kernel