[Bug 2154748] Re: [Ubuntu 26.04] Severe Performance Degradation on kernel 7.0.0-15

Massimiliano Pellizzer Wed, 17 Jun 2026 08:40:45 -0700

** Description changed:

+ [ Impact ]
+ 
+ s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. The reason is an
+ 18-year-old commit [1] that fixed a compile error for PREEMPT-enabled
+ kernels. Back then, only PREEMPT_NONE and PREEMPT_VOLUNTARY kernels were
+ considered important for s390; PREEMPT was only expected to "just work".
+ 
+ However, PREEMPT has recently become always enabled [2], which means
+ GENERIC_LOCKBREAK is always enabled as well. For some workloads this leads to
+ massive performance degradation; e.g. a simple kernel compile on machines
+ with many CPUs may take up to four times longer.
+ 
+ To fix this, remove GENERIC_LOCKBREAK from the s390 Kconfig, since
+ the compile error from 18 years ago no longer exists.
+ 
+ [1] commit b6b40c532a36 ("[S390] Define GENERIC_LOCKBREAK.")
+ [2] commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")
+ 
+ [ Fix ]
+ 
+ Backport commit:
+ 1f57f68c4dd1 ("s390: Remove GENERIC_LOCKBREAK Kconfig option")
+ 
+ [ Test Plan ]
+ 
+ Compile and boot tested.
+ Tested performance by compiling a kernel and monitoring execution
+ with perf.
+ 
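One way to confirm whether a given kernel build still carries the option is to look for CONFIG_GENERIC_LOCKBREAK in its config file. The following is a minimal Python sketch, not part of the original report; the helper name `lockbreak_enabled` and the `/boot/config-<release>` path are assumptions about a typical Ubuntu install.

```python
import platform
import re


def lockbreak_enabled(config_text: str) -> bool:
    """Return True if CONFIG_GENERIC_LOCKBREAK=y appears in the config text."""
    pattern = r"^CONFIG_GENERIC_LOCKBREAK=y$"
    return re.search(pattern, config_text, re.MULTILINE) is not None


if __name__ == "__main__":
    # Assumed location of the running kernel's config on Ubuntu.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as fh:
            print(path, "GENERIC_LOCKBREAK enabled:", lockbreak_enabled(fh.read()))
    except OSError as exc:
        print("could not read", path, exc)
```

On a kernel with the fix applied, the check would be expected to report that the option is not enabled.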
+ [ Regression Potential ]
+ 
+ The regression potential of the patch is low.
+ It affects only the s390x spinlock implementation.
+ 
+ ---
+ 
  == Comment: #2 - Mete Durlu <[email protected]> - 2026-06-01 08:59:07 ==
  ---Problem Description---
  
  Ubuntu 26.04 shows massive performance degradation.
  On large machines with more than 20 COREs (40 CPUs with SMT),
  CPU-bound workloads suffer greatly.
  Example: Linux kernel compilation takes more than 10x longer.
  Resource utilization shows up to 100% system time during the
  workload.
  
- 
  perf top output indicates excessive lock contention in the kernel.
  
  $ make -j$(nproc)
  
  $ perf top
-   52.41%  [kernel]                    [k] arch_spin_trylock_retry
-    8.76%  [kernel]                    [k] _raw_spin_lock_irqsave
-    2.03%  [kernel]                    [k] arch_spin_relax
-    1.09%  cc1                         [.] ht_lookup_with_hash(ht*, unsigned char
-    0.97%  [kernel]                    [k] diag49c
-    0.95%  [kernel]                    [k] lru_gen_add_folio
-    0.80%  [kernel]                    [k] post_alloc_hook.localalias
-    0.77%  [kernel]                    [k] lru_gen_del_folio.constprop.0
-    0.63%  cc1                         [.] htab_find_slot_with_hash
-    0.60%  [kernel]                    [k] folios_put_refs
-    0.49%  [kernel]                    [k] arch_vcpu_is_preempted
-    0.48%  cc1                         [.] ggc_internal_alloc_no_dtor(unsigned lo
-    0.44%  cc1                         [.] _cpp_lex_direct
+   52.41%  [kernel]                    [k] arch_spin_trylock_retry
+    8.76%  [kernel]                    [k] _raw_spin_lock_irqsave
+    2.03%  [kernel]                    [k] arch_spin_relax
+    1.09%  cc1                         [.] ht_lookup_with_hash(ht*, unsigned char
+    0.97%  [kernel]                    [k] diag49c
+    0.95%  [kernel]                    [k] lru_gen_add_folio
+    0.80%  [kernel]                    [k] post_alloc_hook.localalias
+    0.77%  [kernel]                    [k] lru_gen_del_folio.constprop.0
+    0.63%  cc1                         [.] htab_find_slot_with_hash
+    0.60%  [kernel]                    [k] folios_put_refs
+    0.49%  [kernel]                    [k] arch_vcpu_is_preempted
+    0.48%  cc1                         [.] ggc_internal_alloc_no_dtor(unsigned lo
+    0.44%  cc1                         [.] _cpp_lex_direct
  ...
- 
  
  The lock contention seems to be linked directly to the thread count
  of the workload:
  
  # on a system with 34 COREs (68 CPUs w SMT)
  
  $ make -j20
-   perf top shows no arch_spin_trylock_retry
+   perf top shows no arch_spin_trylock_retry
  
  $ make -j25
-   perf top shows ~2% arch_spin_trylock_retry
+   perf top shows ~2% arch_spin_trylock_retry
  
  $ make -j30
-   perf top shows ~5% arch_spin_trylock_retry
+   perf top shows ~5% arch_spin_trylock_retry
  
  $ make -j34 # thread count = core count
-   perf top shows ~15% arch_spin_trylock_retry
+   perf top shows ~15% arch_spin_trylock_retry
  
  $ make -j40 # thread count > core count
-   perf top shows >30% arch_spin_trylock_retry
- 
+   perf top shows >30% arch_spin_trylock_retry
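
To track how the share of arch_spin_trylock_retry changes across `make -jN` runs, the percentage can be pulled out of saved perf output. This is a minimal Python sketch under the assumption that the output has been captured as text in the same column layout as shown above; the names `PERF_LINE` and `symbol_share` are hypothetical.

```python
import re

# Matches lines such as "  52.41%  [kernel]  [k] arch_spin_trylock_retry".
PERF_LINE = re.compile(r"^\s*[+-]?\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")


def symbol_share(perf_text: str, symbol: str) -> float:
    """Sum the overhead percentages reported for one symbol name."""
    total = 0.0
    for line in perf_text.splitlines():
        match = PERF_LINE.match(line)
        if match and match.group(4) == symbol:
            total += float(match.group(1))
    return total
```

Running it once per thread count and comparing the returned values would give the same picture as the manual readings above.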
  
  There have also been hints of delayed workqueue execution in the dmesg output:
  ...
  [10600.136975] workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [10806.428576] workqueue: delayed_vfree_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [10819.822422] workqueue: delayed_vfree_work hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [10885.381900] workqueue: delayed_vfree_work hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
  [10915.209117] workqueue: pcpu_balance_workfn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [11059.719121] workqueue: pcpu_balance_workfn hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [20223.529295] workqueue: inode_switch_wbs_work_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [22584.374168] workqueue: mmput_async_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [22602.115559] workqueue: delayed_vfree_work hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
  [22817.328172] workqueue: vmstat_update hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [22840.202092] workqueue: delayed_vfree_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
  [26834.512017] workqueue: delayed_vfree_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
  [26883.480296] workqueue: vmstat_update hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
  ...
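
The warnings above can be summarized per work function to see which ones are hit hardest. This is a minimal Python sketch, assuming the dmesg output has been saved as text; the names `HOG` and `max_hog_counts` are hypothetical, and the pattern tolerates lines that were wrapped by a mail client.

```python
import re
from collections import defaultdict

# "\s+" also matches newlines, so entries wrapped across two lines still match.
HOG = re.compile(r"workqueue:\s+(\S+)\s+hogged CPU for\s+>(\d+)us\s+(\d+)\s+times")


def max_hog_counts(dmesg_text: str) -> dict[str, int]:
    """Return the highest reported hog count for each work function."""
    counts: dict[str, int] = defaultdict(int)
    for match in HOG.finditer(dmesg_text):
        work_fn = match.group(1)
        times = int(match.group(3))
        counts[work_fn] = max(counts[work_fn], times)
    return dict(counts)
```

Because the kernel prints a running count per work function, keeping the maximum gives the latest total for each one.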
  
  Systems with fewer COREs don't seem to be affected. The limit seems to be
  around 15 COREs (30 CPUs).
  
  ---uname output---
  Linux localhost 7.0.0-15-generic #15-Ubuntu SMP PREEMPT Wed Apr 22 15:04:00 UTC 2026 s390x GNU/Linux


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2154748

Title:
  [Ubuntu 26.04] Severe Performance Degradation on kernel 7.0.0-15

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2154748/+subscriptions


