** Description changed:

- The current epoll implementation in the Jammy kernel utilizes a read-
- write semaphore (rwlock_t) to protect the ready event list. While this
- allows multiple producers to concurrently add items, it introduces a
- scheduling priority inversion vulnerability.
+ [SRU Justification]
+ 
+ [Impact]
+ 
+ The current epoll implementation in the 5.15 kernel utilizes a reader-writer
+ lock (rwlock_t) to protect the ready event list. While this allows
+ multiple producers to concurrently add items, it introduces a scheduling
+ priority inversion vulnerability. 
+ 
+ If a high-priority consumer (such as a real-time thread calling epoll_wait) is
+ blocked waiting for the exclusive write lock, it can be stalled indefinitely
+ by a low-priority producer holding the read lock. This results in
+ non-deterministic system stalls and latency spikes.
+ 
+ The fix involves replacing rwlock_t with a standard spinlock_t one-to-one, and
+ removing the now-redundant lockless helper functions (list_add_tail_lockless
+ and chain_epi_lockless). This ensures that under real-time configurations,
+ priority inheritance works correctly across the epoll subsystem, eliminating
+ priority inversion.
+ 
+ [Fix]
+ 
+ Backport upstream commit:
+ 0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
+ 
+ [Test Plan]
+ 
+ Due to the nature of scheduling priority inversion, reproducing this bug
+ reliably on demand is highly impractical. Because this race condition relies
+ on erratic, non-deterministic scheduling micro-windows, a standard
+ deterministic reproduction script cannot be provided.
+ 
+ Therefore, validation relies on verifying that the replacement locking
+ mechanism functions correctly, introduces no regressions, and scales safely
+ under synthetic load.
+ 
+ There is a test kernel available in the following ppa:
+ https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
+ 
+ [Where Problems Could Occur]
+ 
+ There is a trade-off in raw throughput for highly specific, synthetic
+ workloads. As seen in the upstream commit description [0], in artificial
+ benchmarks where hundreds of threads continuously spam epoll events,
+ throughput can drop by ~38%
+ due to serialization around the new spinlock. 
+ 
+ However, testing with realistic workloads (via perf bench epoll wait) actually
+ demonstrates a performance improvement on x86 architectures.
+ 
+ The regression potential for real-world production environments is considered
+ low, as typical workloads do not exhibit continuous, uninterrupted
+ event-spamming behavior. Moreover, the fix is strictly isolated to
+ fs/eventpoll.c and alters no external kernel APIs.
+ 
+ [Other Info]
+ 
+ This bug was addressed upstream and has already been integrated into newer
+ Ubuntu releases. Noble and subsequent releases already include this fix.

** Description changed:

  [SRU Justification]
  
  [Impact]
  
  The current epoll implementation in the 5.15 kernel utilizes a reader-writer
  lock (rwlock_t) to protect the ready event list. While this allows
  multiple producers to concurrently add items, it introduces a scheduling
- priority inversion vulnerability. 
+ priority inversion vulnerability.
  
  If a high-priority consumer (such as a real-time thread calling epoll_wait) is
  blocked waiting for the exclusive write lock, it can be stalled indefinitely
  by a low-priority producer holding the read lock. This results in
  non-deterministic system stalls and latency spikes.
  
  The fix involves replacing rwlock_t with a standard spinlock_t one-to-one, and
  removing the now-redundant lockless helper functions (list_add_tail_lockless
  and chain_epi_lockless). This ensures that under real-time configurations,
  priority inheritance works correctly across the epoll subsystem, eliminating
  priority inversion.
  
  [Fix]
  
  Backport upstream commit:
  0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
  
  [Test Plan]
  
  Due to the nature of scheduling priority inversion, reproducing this bug
  reliably on demand is highly impractical. Because this race condition relies
  on erratic, non-deterministic scheduling micro-windows, a standard
  deterministic reproduction script cannot be provided.
  
  Therefore, validation relies on verifying that the replacement locking
  mechanism functions correctly, introduces no regressions, and scales safely
  under synthetic load.
  
  There is a test kernel available in the following ppa:
  https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
  
  [Where Problems Could Occur]
  
  There is a trade-off in raw throughput for highly specific, synthetic
  workloads. As seen in the upstream commit description [0], in artificial
  benchmarks where hundreds of threads continuously spam epoll events,
  throughput can drop by ~38%
- due to serialization around the new spinlock. 
+ due to serialization around the new spinlock.
  
  However, testing with realistic workloads (via perf bench epoll wait) actually
  demonstrates a performance improvement on x86 architectures.
  
  The regression potential for real-world production environments is considered
  low, as typical workloads do not exhibit continuous, uninterrupted
  event-spamming behavior. Moreover, the fix is strictly isolated to
  fs/eventpoll.c and alters no external kernel APIs.
  
  [Other Info]
  
  This bug was addressed upstream and has already been integrated into newer
  Ubuntu releases. Noble and subsequent releases already include this fix.
+ 
+ [0] - https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd

** Description changed:

  [SRU Justification]
  
  [Impact]
  
  The current epoll implementation in the 5.15 kernel utilizes a reader-writer
  lock (rwlock_t) to protect the ready event list. While this allows
  multiple producers to concurrently add items, it introduces a scheduling
  priority inversion vulnerability.
  
  If a high-priority consumer (such as a real-time thread calling epoll_wait) is
  blocked waiting for the exclusive write lock, it can be stalled indefinitely
  by a low-priority producer holding the read lock. This results in
  non-deterministic system stalls and latency spikes.
  
  The fix involves replacing rwlock_t with a standard spinlock_t one-to-one, and
  removing the now-redundant lockless helper functions (list_add_tail_lockless
  and chain_epi_lockless). This ensures that under real-time configurations,
  priority inheritance works correctly across the epoll subsystem, eliminating
  priority inversion.
  
  [Fix]
  
  Backport upstream commit:
  0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
  
  [Test Plan]
  
  Due to the nature of scheduling priority inversion, reproducing this bug
  reliably on demand is highly impractical. Because this race condition relies
  on erratic, non-deterministic scheduling micro-windows, a standard
  deterministic reproduction script cannot be provided.
  
  Therefore, validation relies on verifying that the replacement locking
  mechanism functions correctly, introduces no regressions, and scales safely
  under synthetic load.
  
  There is a test kernel available in the following ppa:
  https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
  
  [Where Problems Could Occur]
  
  There is a trade-off in raw throughput for highly specific, synthetic
  workloads. As seen in the upstream commit description [0], in artificial
  benchmarks where hundreds of threads continuously spam epoll events,
  throughput can drop by ~38%
  due to serialization around the new spinlock.
  
  However, testing with realistic workloads (via perf bench epoll wait) actually
  demonstrates a performance improvement on x86 architectures.
  
  The regression potential for real-world production environments is considered
  low, as typical workloads do not exhibit continuous, uninterrupted
  event-spamming behavior. Moreover, the fix is strictly isolated to
  fs/eventpoll.c and alters no external kernel APIs.
  
  [Other Info]
  
- This bug was addressed upstream and has already been integrated into newer
- Ubuntu releases. Noble and subsequent releases already include this fix.
+ This bug was addressed upstream and has already been integrated into Noble and
+ subsequent releases.
  
  [0] - https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd
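
As a rough illustration of the functional check described in the test plan, the sketch below drives one epoll instance from several threads at once and verifies that no events are lost. It is an assumed smoke test, not part of the upstream commit or of any Ubuntu test suite, and the names producer_arg, producer and run_epoll_smoke are hypothetical. It does not attempt to reproduce the priority inversion itself; it only exercises concurrent additions to the ready list while a single consumer calls epoll_wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum { MAX_PRODUCERS = 64 };

struct producer_arg {
    int efd;        /* eventfd this producer signals */
    int nr_events;  /* number of signals to send */
};

/* Producer: each write adds 1 to the eventfd counter and wakes epoll. */
static void *producer(void *p)
{
    struct producer_arg *arg = p;
    uint64_t one = 1;

    for (int i = 0; i < arg->nr_events; i++)
        if (write(arg->efd, &one, sizeof(one)) != sizeof(one))
            break;
    return NULL;
}

/* Start nr_producers threads, consume their signals through epoll, and
 * return the number of signals seen (or -1 on setup failure). */
static long run_epoll_smoke(int nr_producers, int nr_events)
{
    pthread_t tids[MAX_PRODUCERS];
    struct producer_arg args[MAX_PRODUCERS];
    struct epoll_event ready[MAX_PRODUCERS];
    long expected = (long)nr_producers * nr_events;
    long total = 0;
    int started = 0;
    int epfd;

    if (nr_producers < 1 || nr_producers > MAX_PRODUCERS || nr_events < 1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (; started < nr_producers; started++) {
        int efd = eventfd(0, EFD_NONBLOCK);
        if (efd < 0)
            break;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        args[started].efd = efd;
        args[started].nr_events = nr_events;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) ||
            pthread_create(&tids[started], NULL, producer, &args[started])) {
            close(efd);
            break;
        }
    }

    /* Consumer: reading an eventfd returns its counter and resets it to 0. */
    while (started == nr_producers && total < expected) {
        int n = epoll_wait(epfd, ready, MAX_PRODUCERS, 5000);
        if (n <= 0)
            break;  /* error or 5 second timeout: report what was seen */
        for (int i = 0; i < n; i++) {
            uint64_t cnt;
            if (read(ready[i].data.fd, &cnt, sizeof(cnt)) == sizeof(cnt))
                total += (long)cnt;
        }
    }

    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        close(args[i].efd);
    }
    close(epfd);
    return started == nr_producers ? total : -1;
}
```

Calling run_epoll_smoke(4, 1000) on both the current and the test kernel should return 4000 in each case; a smaller value would indicate lost or delayed wakeups.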

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2154194

Title:
  [Jammy] Priority inversion problem in epoll for rt kernel
