The Linux kernel provides mechanisms such as 'isolcpus' and 'nohz_full'
to reduce interference for latency-sensitive workloads. However, both
sit behind a "Reboot Wall": they can be set only through boot
parameters, so any change requires a system restart to take effect.

In modern cloud-native environments, CPU resources often need to be
dynamically re-partitioned to accommodate container scaling without
the performance penalty and downtime of a full system reboot. Similarly,
high-frequency trading (HFT) platforms require the ability to fine-tune
CPU isolation at runtime to minimize jitter for critical execution threads
based on shifting market demands.

This patch series introduces Dynamic Housekeeping & Enhanced Isolation
(DHEI). DHEI allows administrators to reconfigure the kernel's
housekeeping boundaries at runtime via a new sysfs interface at
/sys/kernel/housekeeping/.
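
As a rough illustration of how a userspace tool might drive this
interface, the sketch below writes a CPU list to one node under a base
directory. It is a minimal sketch under stated assumptions, not part of
the series: hk_write_node() is a hypothetical helper, and the cpulist
syntax ("0-3") is assumed because this cover letter does not state the
accepted format.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a CPU list to <base_dir>/<node>.
 * Assumption: nodes accept cpulist syntax such as "0-3".
 * Returns 0 on success or a negative errno value.
 */
int hk_write_node(const char *base_dir, const char *node,
                  const char *cpulist)
{
    char path[256];
    FILE *f;
    int n, ret = 0;

    assert(base_dir && node && cpulist);

    n = snprintf(path, sizeof(path), "%s/%s", base_dir, node);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    f = fopen(path, "w");
    if (!f)
        return -errno;

    if (fputs(cpulist, f) == EOF)
        ret = -EIO;

    /* stdio buffers the data, so a rejected write may only show up here. */
    if (fclose(f) == EOF && ret == 0)
        ret = -errno;

    return ret;
}
```

A caller with the required privileges would pass
"/sys/kernel/housekeeping" as base_dir and one of the node names listed
under Key Features, for example hk_write_node("/sys/kernel/housekeeping",
"timer", "0-3").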

Key Features:
- Fine-grained control: Separate sysfs nodes for timer, rcu, tick,
  workqueue, kthread, managed_irq, domain, and misc.
- Dynamic NOHZ_FULL: Supports enabling/disabling full dynticks mode
  on-the-fly.
- SMT Awareness: Optional 'smt_aware_mode' for core-granular isolation.
- Safety Guards: Rejects masks that would isolate every CPU or leave no
  online housekeeping CPU, and requires CAP_SYS_ADMIN.
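
The safety guards and the SMT-aware mode can be pictured with a small
userspace model. This is a sketch under stated assumptions rather than
the code in the series: cpu_mask_t, hk_expand_to_cores() and
hk_validate_mask() are hypothetical names, masks are limited to 64 CPUs,
and returning -EINVAL for every rejection is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per CPU; this model covers at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/*
 * Grow a mask so that every CPU in it also brings along its SMT
 * siblings. siblings[cpu] is that CPU's sibling mask, itself included.
 */
cpu_mask_t hk_expand_to_cores(cpu_mask_t mask, const cpu_mask_t *siblings,
                              int nr_cpus)
{
    cpu_mask_t out = 0;

    assert(nr_cpus > 0 && nr_cpus <= 64);
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            out |= siblings[cpu];
    return out;
}

/* Check a proposed housekeeping mask; 0 if acceptable, -EINVAL if not. */
int hk_validate_mask(cpu_mask_t new_hk, cpu_mask_t online,
                     cpu_mask_t possible, bool smt_aware,
                     const cpu_mask_t *siblings, int nr_cpus)
{
    /* An empty housekeeping mask would isolate every CPU. */
    if (new_hk == 0)
        return -EINVAL;

    /* CPUs that cannot exist on this system are not valid. */
    if (new_hk & ~possible)
        return -EINVAL;

    /* At least one housekeeping CPU must currently be online. */
    if ((new_hk & online) == 0)
        return -EINVAL;

    /* In SMT-aware mode the mask must cover whole cores only. */
    if (smt_aware &&
        hk_expand_to_cores(new_hk, siblings, nr_cpus) != new_hk)
        return -EINVAL;

    return 0;
}
```

With sibling pairs (0,1), (2,3), (4,5) and (6,7), the mask 0x0f passes
in SMT-aware mode while 0x01 is rejected, because it would split core 0
between the housekeeping and isolated sets.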

Core Architecture:
1. Notifier-Driven Synchronization: HK_UPDATE_MASK blocking notifier chain.
2. Decoupled Memory Management: Runtime-safe cpumask allocation.
3. Subsystem Handlers: Dynamic migration for IRQ, RCU, Sched, etc.
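
The notifier-driven flow above can be modelled in userspace as a
priority-ordered chain of handlers that each receive the new mask. This
is only a simplified sketch: apart from the HK_UPDATE_MASK event name,
every identifier (hk_chain, hk_notifier, hk_subsys, the HK_NOTIFY_*
codes) is hypothetical, the numeric values are assumed, and the locking
that a blocking notifier chain provides in the kernel is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HK_UPDATE_MASK 1UL  /* event name from this series; value assumed */
#define HK_NOTIFY_OK   0    /* hypothetical return codes for this model */
#define HK_NOTIFY_BAD  1

struct hk_update {
    uint64_t new_mask;      /* proposed housekeeping CPUs, one bit each */
};

struct hk_notifier {
    int (*call)(struct hk_notifier *nb, unsigned long event, void *data);
    int priority;           /* higher priority runs first */
    struct hk_notifier *next;
};

struct hk_chain {
    struct hk_notifier *head;
};

/* Insert sorted by descending priority; equal priorities keep FIFO order. */
void hk_chain_register(struct hk_chain *c, struct hk_notifier *nb)
{
    struct hk_notifier **pp = &c->head;

    assert(nb->call != NULL);
    while (*pp && (*pp)->priority >= nb->priority)
        pp = &(*pp)->next;
    nb->next = *pp;
    *pp = nb;
}

/* Call each handler in order; stop at the first one that objects. */
int hk_chain_call(struct hk_chain *c, unsigned long event, void *data)
{
    int ret = HK_NOTIFY_OK;

    for (struct hk_notifier *nb = c->head; nb; nb = nb->next) {
        ret = nb->call(nb, event, data);
        if (ret == HK_NOTIFY_BAD)
            break;
    }
    return ret;
}

/* Stand-in for a subsystem (RCU, workqueue, ...) tracking its own mask. */
struct hk_subsys {
    struct hk_notifier nb;  /* must stay first: the handler casts back */
    uint64_t applied_mask;
    int calls;
};

int hk_subsys_apply(struct hk_notifier *nb, unsigned long event, void *data)
{
    struct hk_subsys *s = (struct hk_subsys *)nb;
    const struct hk_update *u = data;

    if (event != HK_UPDATE_MASK)
        return HK_NOTIFY_OK;
    if (u->new_mask == 0)
        return HK_NOTIFY_BAD;   /* refuse an empty housekeeping set */

    s->applied_mask = u->new_mask;
    s->calls++;
    return HK_NOTIFY_OK;
}
```

In this model a handler that returns HK_NOTIFY_BAD stops the walk, so
lower-priority subsystems never see a rejected update. Whether the
series rolls back handlers that already ran is not stated here and is
not modelled.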

The series is organized as follows:
- Patches 01-03: Core infrastructure (dynamic allocation, notifier,
  enum separation)
- Patches 04-09: Subsystem notifier handlers (genirq, RCU, scheduler,
  watchdog, workqueue, mm/compaction)
- Patch 10: tick/nohz dynamic full dynticks
- Patches 11-13: SMT-aware isolation, boot-time bridging, sysfs interface
- Patch 14: ABI documentation
- Patch 15: kselftest suite

Tested on x86_64 (8 vCPUs, SMT enabled) with all selftests passing.

As suggested by Joel Fernandes and Thomas Gleixner, this v1 provides
a stronger rationale for dynamic isolation and addresses all RFC
feedback on naming and notifier robustness.

To: Ingo Molnar <[email protected]>
To: Peter Zijlstra <[email protected]>
To: Juri Lelli <[email protected]>
To: Vincent Guittot <[email protected]>
To: Dietmar Eggemann <[email protected]>
To: Steven Rostedt <[email protected]>
To: Ben Segall <[email protected]>
To: Mel Gorman <[email protected]>
To: Valentin Schneider <[email protected]>
To: Thomas Gleixner <[email protected]>
To: Paul E. McKenney <[email protected]>
To: Frederic Weisbecker <[email protected]>
To: Neeraj Upadhyay <[email protected]>
To: Joel Fernandes <[email protected]>
To: Josh Triplett <[email protected]>
To: Boqun Feng <[email protected]>
To: Uladzislau Rezki <[email protected]>
To: Mathieu Desnoyers <[email protected]>
To: Lai Jiangshan <[email protected]>
To: Zqiang <[email protected]>
To: Tejun Heo <[email protected]>
To: Andrew Morton <[email protected]>
To: Vlastimil Babka <[email protected]>
To: Suren Baghdasaryan <[email protected]>
To: Michal Hocko <[email protected]>
To: Brendan Jackman <[email protected]>
To: Johannes Weiner <[email protected]>
To: Zi Yan <[email protected]>
To: Anna-Maria Behnsen <[email protected]>
To: Shuah Khan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Qiliang Yuan <[email protected]>

Changes since RFC:
- Dynamic RCU NOCB rewrite: Perform full runtime offload/deoffload via
  remove_cpu()/add_cpu() for online CPUs, with lazy initialization.
- Robust Timer Migration: Added logic to dynamically migrate
  tick_do_timer_cpu when a housekeeper is isolated.
- Enhanced Isolation Safety: Hardened sysfs interface with CAP_SYS_ADMIN
  checks, 0600 permissions, and strict cpumask validations including SMT
  subset checks.
- Lifecycle Cleanups: Replaced system_state boot checks with
  slab_is_available() and added hotplug shutdown guards for clean
  power-off.
- Testing & Docs: Added comprehensive kselftest suite for isolation
  scenarios and detailed ABI documentation.
- Link to RFC:
  https://lore.kernel.org/all/20260206-feature-dynamic_isolcpus_dhei-v1-0-00a711eb0...@gmail.com/

---
Qiliang Yuan (15):
      sched/isolation: Support dynamic allocation for housekeeping masks
      sched/isolation: Introduce housekeeping notifier infrastructure
      sched/isolation: Separate housekeeping types in enum hk_type
      genirq: Support dynamic migration for managed interrupts
      rcu: Support runtime NOCB initialization and dynamic offloading
      sched/core: Dynamically update scheduler domain housekeeping mask
      watchdog: Allow runtime toggle of lockup detector affinity
      workqueue: Support dynamic housekeeping mask updates
      mm/compaction: Support dynamic housekeeping mask updates for kcompactd
      tick/nohz: Transition to dynamic full dynticks state management
      sched/isolation: Implement SMT-aware isolation and safety guards
      sched/isolation: Bridge boot-time parameters with dynamic isolation
      sched/isolation: Implement sysfs interface for dynamic housekeeping
      Documentation: isolation: Document DHEI sysfs interfaces
      selftests: dhei: Add functional tests for dynamic housekeeping

 .../ABI/testing/sysfs-kernel-housekeeping          |  22 ++
 include/linux/sched/isolation.h                    |  40 +++-
 kernel/irq/manage.c                                |  49 +++++
 kernel/rcu/rcu.h                                   |   4 +
 kernel/rcu/tree.c                                  |  76 +++++++
 kernel/rcu/tree.h                                  |   2 +-
 kernel/rcu/tree_nocb.h                             |  27 ++-
 kernel/sched/core.c                                |  28 +++
 kernel/sched/isolation.c                           | 236 ++++++++++++++++++++-
 kernel/time/tick-sched.c                           | 130 +++++++++---
 kernel/watchdog.c                                  |  25 +++
 kernel/workqueue.c                                 |  42 ++++
 mm/compaction.c                                    |  27 +++
 tools/testing/selftests/Makefile                   |   1 +
 tools/testing/selftests/dhei/Makefile              |   4 +
 tools/testing/selftests/dhei/dhei_test.sh          | 160 ++++++++++++++
 16 files changed, 818 insertions(+), 55 deletions(-)
---
base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
change-id: 20260324-dhei-v12-final-891d1ba62bd3

Best regards,
-- 
Qiliang Yuan <[email protected]>

