** Description changed:
+ [ Impact ]
+
+ s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. The reason is an
+ 18-year-old commit [1] that fixed a compile error for PREEMPT-enabled
+ kernels. Back then, only PREEMPT_NONE and PREEMPT_VOLUNTARY kernels were
+ considered important for s390; PREEMPT only needed to "just work".
+
+ However, PREEMPT has recently become always enabled [2], which means
+ GENERIC_LOCKBREAK is always enabled as well. For some workloads this leads
+ to massive performance degradation; e.g. a simple kernel compile on
+ machines with many CPUs may take up to four times longer.
+
+ To fix this, remove GENERIC_LOCKBREAK from s390's Kconfig, since the
+ compile error from 18 years ago no longer exists.
+
+ [1] commit b6b40c532a36 ("[S390] Define GENERIC_LOCKBREAK.")
+ [2] commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")
+
+ [ Fix ]
+
+ Backport commit:
+ 1f57f68c4dd1 ("s390: Remove GENERIC_LOCKBREAK Kconfig option")
+
+ [ Test Plan ]
+
+ Compile and boot tested.
+ Tested performance by compiling a kernel and monitoring execution
+ with perf.
+
+ [ Regression Potential ]
+
+ The regression potential of the patch is low.
+ It affects only the s390x spinlock implementation.
+
+ ---
+
== Comment: #2 - Mete Durlu <[email protected]> - 2026-06-01 08:59:07 ==
---Problem Description---
Ubuntu 26.04 shows massive performance degradation.
On large machines with more than 20 COREs (40 CPUs with SMT),
CPU-bound workloads suffer greatly.
Example: Linux kernel compilation takes >10x longer.
Resource utilization shows up to 100% system time during the
workload.
-
perf top output indicates excessive lock contention in the kernel.
$ make -j$(nproc)
$ perf top
- 52.41% [kernel] [k] arch_spin_trylock_retry
- 8.76% [kernel] [k] _raw_spin_lock_irqsave
- 2.03% [kernel] [k] arch_spin_relax
- 1.09% cc1 [.] ht_lookup_with_hash(ht*, unsigned char
- 0.97% [kernel] [k] diag49c
- 0.95% [kernel] [k] lru_gen_add_folio
- 0.80% [kernel] [k] post_alloc_hook.localalias
- 0.77% [kernel] [k] lru_gen_del_folio.constprop.0
- 0.63% cc1 [.] htab_find_slot_with_hash
- 0.60% [kernel] [k] folios_put_refs
- 0.49% [kernel] [k] arch_vcpu_is_preempted
- 0.48% cc1 [.] ggc_internal_alloc_no_dtor(unsigned lo
- 0.44% cc1 [.] _cpp_lex_direct
+ 52.41% [kernel] [k] arch_spin_trylock_retry
+ 8.76% [kernel] [k] _raw_spin_lock_irqsave
+ 2.03% [kernel] [k] arch_spin_relax
+ 1.09% cc1 [.] ht_lookup_with_hash(ht*, unsigned char
+ 0.97% [kernel] [k] diag49c
+ 0.95% [kernel] [k] lru_gen_add_folio
+ 0.80% [kernel] [k] post_alloc_hook.localalias
+ 0.77% [kernel] [k] lru_gen_del_folio.constprop.0
+ 0.63% cc1 [.] htab_find_slot_with_hash
+ 0.60% [kernel] [k] folios_put_refs
+ 0.49% [kernel] [k] arch_vcpu_is_preempted
+ 0.48% cc1 [.] ggc_internal_alloc_no_dtor(unsigned lo
+ 0.44% cc1 [.] _cpp_lex_direct
...
-
The lock contention seems to be linked directly to the thread count
of the workload:
# on a system with 34 COREs (68 CPUs with SMT)
$ make -j20
- perf top shows no arch_spin_trylock_retry
+ perf top shows no arch_spin_trylock_retry
$ make -j25
- perf top shows ~2% arch_spin_trylock_retry
+ perf top shows ~2% arch_spin_trylock_retry
$ make -j30
- perf top shows ~5% arch_spin_trylock_retry
+ perf top shows ~5% arch_spin_trylock_retry
$ make -j34 # thread count = core count
- perf top shows ~15% arch_spin_trylock_retry
+ perf top shows ~15% arch_spin_trylock_retry
$ make -j40 # thread count > core count
- perf top shows >30% arch_spin_trylock_retry
-
+ perf top shows >30% arch_spin_trylock_retry
There have also been hints of delayed workqueue execution in the dmesg output:
...
[10600.136975] workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[10806.428576] workqueue: delayed_vfree_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[10819.822422] workqueue: delayed_vfree_work hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
[10885.381900] workqueue: delayed_vfree_work hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
[10915.209117] workqueue: pcpu_balance_workfn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[11059.719121] workqueue: pcpu_balance_workfn hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
[20223.529295] workqueue: inode_switch_wbs_work_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[22584.374168] workqueue: mmput_async_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[22602.115559] workqueue: delayed_vfree_work hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
[22817.328172] workqueue: vmstat_update hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
[22840.202092] workqueue: delayed_vfree_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
[26834.512017] workqueue: delayed_vfree_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
[26883.480296] workqueue: vmstat_update hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
...
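A long dmesg excerpt like the one above can be condensed to one line per work function. The following is a minimal sketch, not part of the original report: the function name hog_summary is hypothetical, and it assumes each warning sits on a single, unwrapped line in the format shown above.

```shell
#!/bin/sh
# hog_summary (hypothetical helper): read kernel log text on stdin and print
# "<work function> <highest reported hog count>", sorted by function name.
# Assumes unwrapped lines of the form:
#   [ts] workqueue: NAME hogged CPU for >Nus M times, consider switching ...
hog_summary() {
    # Reduce each matching line to "NAME M", dropping everything else.
    sed -n 's/.*workqueue: \([^ ]*\) hogged CPU for >[0-9]*us \([0-9]*\) times.*/\1 \2/p' |
    # Keep the largest count seen per work function.
    awk '{ if ($2 + 0 > max[$1] + 0) max[$1] = $2 + 0 }
         END { for (f in max) print f, max[f] }' |
    sort
}
```

It could be fed from the live kernel log, for example with `dmesg | hog_summary`.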
Systems with fewer COREs don't seem to be affected. The limit seems to be
around 15 COREs (30 CPUs).
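To compare the thread-count measurements above numerically, the share of a single symbol can be pulled out of saved profile output. This is a sketch under stated assumptions, not the reporter's method: the function name symbol_share is hypothetical, and it assumes a text file in which the first field of each line is the percentage and the last field is the symbol name, as in the perf top excerpt above.

```shell
#!/bin/sh
# symbol_share FILE SYMBOL (hypothetical helper): print the percentage from
# the first column of the first line whose last field is exactly SYMBOL.
# Prints 0 if the symbol does not appear in FILE.
symbol_share() {
    awk -v sym="$2" '
        $NF == sym { gsub(/%/, "", $1); print $1; found = 1; exit }
        END { if (!found) print 0 }
    ' "$1"
}
```

With one saved profile per make -jN run, a call such as `symbol_share profile-j40.txt arch_spin_trylock_retry` (file name chosen for illustration) gives the value to compare across thread counts.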
---uname output---
Linux localhost 7.0.0-15-generic #15-Ubuntu SMP PREEMPT Wed Apr 22 15:04:00 UTC 2026 s390x GNU/Linux
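Whether a given kernel build still has GENERIC_LOCKBREAK enabled can be checked from its config file. The sketch below is an illustration only: the function name lockbreak_enabled is hypothetical, and the assumption is that the config is available as a plain-text file (on Ubuntu typically /boot/config-$(uname -r)).

```shell
#!/bin/sh
# lockbreak_enabled FILE (hypothetical helper): succeed (exit status 0) if the
# kernel config FILE contains the exact line CONFIG_GENERIC_LOCKBREAK=y.
# A config where the option is unset or absent yields a non-zero status.
lockbreak_enabled() {
    grep -qx 'CONFIG_GENERIC_LOCKBREAK=y' "$1"
}
```

For example: `lockbreak_enabled "/boot/config-$(uname -r)" && echo "GENERIC_LOCKBREAK is enabled"`.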
--
https://bugs.launchpad.net/bugs/2154748
Title:
[Ubuntu 26.04] Severe Performance Degradation on kernel 7.0.0-15