Private bug reported:
PCIe/CXL hotplug allows devices to be added and removed dynamically without
a system reboot. On complex platforms, especially those with switches,
multi-level fabrics, and shared resources, error handling and hotplug
coordination can be managed through a System Firmware Intermediary (SFI)
model.

In this model, system firmware sits between hardware events and the OS and
coordinates hotplug and recovery flows. MPRAS (Multi-Protocol RAS) in-band
error reporting plays a key role by propagating error and hotplug-related
events across PCIe/CXL fabrics in a unified manner. Firmware receives these
events, performs initial containment, makes policy decisions, and then
communicates actionable events to the OS.
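
The flow described above (hardware event, firmware containment and policy,
then an actionable event for the OS) can be sketched as a small model. This
is only an illustration under stated assumptions: the event names, the
OsNotification record, and the policy choices in firmware_mediate() are all
hypothetical and do not come from any specification or existing kernel
interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HwEvent(Enum):
    """Raw events as firmware might observe them (hypothetical names)."""
    PRESENCE_ADDED = "presence_added"
    PRESENCE_REMOVED = "presence_removed"
    UNCORRECTABLE_ERROR = "uncorrectable_error"
    CORRECTABLE_ERROR = "correctable_error"


class OsAction(Enum):
    """Actionable events firmware forwards to the OS (hypothetical names)."""
    DEVICE_ADD = "device_add"
    DEVICE_REMOVE = "device_remove"
    DEVICE_RESET = "device_reset"


@dataclass(frozen=True)
class OsNotification:
    port: str        # identifier of the affected port, e.g. a PCI address
    action: OsAction
    contained: bool  # True if firmware isolated the port before notifying


def firmware_mediate(port: str, event: HwEvent) -> Optional[OsNotification]:
    """Model the firmware step: contain if needed, apply policy, notify the OS."""
    if event is HwEvent.CORRECTABLE_ERROR:
        # Assumed policy: firmware handles correctable errors itself.
        return None
    if event is HwEvent.UNCORRECTABLE_ERROR:
        # Assumed policy: contain the port first, then request an OS-side reset.
        return OsNotification(port, OsAction.DEVICE_RESET, contained=True)
    if event is HwEvent.PRESENCE_ADDED:
        return OsNotification(port, OsAction.DEVICE_ADD, contained=False)
    # The only remaining event is PRESENCE_REMOVED.
    return OsNotification(port, OsAction.DEVICE_REMOVE, contained=False)
```

In this sketch the OS never sees the raw hardware event, only the action
firmware decided on, which is the main difference from an OS-first flow.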
This approach is particularly useful where centralized control, platform
policy enforcement, or cross-fabric coordination is required. It complements
OS-first models by offloading certain responsibilities to firmware, while
the OS remains responsible for device management and driver handling.

In the Linux kernel, current hotplug (pciehp) and error handling (AER/DPC)
support assumes either an OS-first or a firmware-first model; coordinated
SFI + MPRAS flows are not fully standardized or widely supported. Adding
this support would enable robust hotplug handling in advanced PCIe
Gen5/Gen6 and CXL fabric environments.
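
For context on the existing OS-first versus firmware-first split: ownership
of native hotplug, AER, and DPC is negotiated per host bridge through the
ACPI _OSC control field. The sketch below decodes such a control mask. The
bit values follow the OSC_PCI_* definitions in the Linux kernel's
include/linux/acpi.h as I understand them and should be checked against the
PCI Firmware Specification; the short labels and the handling_mode() helper
are illustrative assumptions, not an existing interface.

```python
from typing import Dict, List

# _OSC control-field bits (values as in Linux include/linux/acpi.h; verify
# against the PCI Firmware Specification before relying on them).
OSC_CONTROL_BITS: Dict[int, str] = {
    0x01: "PCIeHotplug",     # PCIe native hotplug control
    0x02: "SHPCHotplug",     # SHPC native hotplug control
    0x04: "PME",             # PCIe native PME control
    0x08: "AER",             # PCIe AER control
    0x10: "PCIeCapability",  # PCIe capability structure control
    0x20: "LTR",             # LTR control
    0x80: "DPC",             # DPC control
}


def decode_osc_control(mask: int) -> List[str]:
    """Return the labels of all control bits set in the mask."""
    return [name for bit, name in OSC_CONTROL_BITS.items() if mask & bit]


def handling_mode(mask: int) -> Dict[str, str]:
    """Report who owns hotplug, AER, and DPC for a granted control mask."""
    granted = set(decode_osc_control(mask))
    return {
        feature: ("os-native" if feature in granted else "firmware")
        for feature in ("PCIeHotplug", "AER", "DPC")
    }
```

A mask of 0x09, for example, would mean the OS owns native hotplug and AER
while DPC stays with firmware. A hybrid SFI mode as requested here has no
representation in this two-way split today.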
Feature Request:
Requested details to be enabled on OS:
- Enable support for System Firmware Intermediary (SFI) based hotplug
  handling flows.
- Integrate MPRAS in-band event reporting with firmware-to-OS communication
  mechanisms.
- Support firmware-mediated hotplug event notification and coordination.
- Enable OS handling of hotplug actions triggered via firmware (device
  add/remove, reset).
- Provide interfaces (ACPI, mailbox, or vendor-specific) for firmware-to-OS
  event delivery.
- Enhance PCIe/CXL subsystems to support hybrid (firmware + OS coordinated)
  handling modes.
- Provide sysfs/debugfs visibility into firmware-mediated hotplug events and
  states.
- Support hotplug across PCIe switches and multi-level CXL fabrics.
- Enable error containment and recovery coordination between firmware and
  OS.
- Provide tools for validation, fault injection, and debugging of SFI/MPRAS
  flows.
- Document supported flows, configuration options, and platform
  dependencies.
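
To make the visibility and validation items above more concrete, here is a
minimal sketch of the kind of per-slot state tracking that a sysfs/debugfs
view or a test tool could be built around. Everything in it is hypothetical:
the state names, the action names, the transition table, and the output
format of dump() are assumptions for illustration, not a proposed kernel
ABI.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical slot states and the transition each firmware-delivered
# action causes. Any (state, action) pair not listed here is rejected.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("empty", "device_add"): "present",
    ("present", "device_remove"): "empty",
    ("present", "device_reset"): "recovering",
    ("recovering", "recovery_done"): "present",
    ("recovering", "device_remove"): "empty",
}


class SlotTracker:
    """Track firmware-mediated hotplug state per slot and keep a history."""

    def __init__(self) -> None:
        # Slots start out "empty" the first time they are referenced.
        self.state: Dict[str, str] = defaultdict(lambda: "empty")
        # Each entry is (slot, action, resulting state or "rejected").
        self.history: List[Tuple[str, str, str]] = []

    def apply(self, slot: str, action: str) -> str:
        """Apply an action to a slot and return the slot's state afterwards."""
        current = self.state[slot]
        new = TRANSITIONS.get((current, action))
        if new is None:
            # Unexpected action for this state: log it, leave state unchanged.
            self.history.append((slot, action, "rejected"))
            return current
        self.state[slot] = new
        self.history.append((slot, action, new))
        return new

    def dump(self) -> str:
        """Render one 'slot state' line per slot, sorted by slot name."""
        return "\n".join(f"{s} {st}" for s, st in sorted(self.state.items()))
```

A fault-injection test could feed out-of-order actions into apply() and
check that they show up as "rejected" in the history rather than silently
changing slot state.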
Business Justification:
- Enables robust hotplug handling in complex, multi-device and fabric-based
  systems.
- Improves coordination between firmware and OS for error handling and
  recovery.
- Enhances reliability and serviceability in PCIe Gen5/Gen6 and CXL
  environments.
- Supports centralized policy enforcement for device management.
- Reduces OS complexity for certain hotplug and error handling scenarios.
- Aligns with emerging industry trends for firmware-assisted fabric
  management.
References:
- PCI-SIG PCIe Specification (AER, DPC, Hotplug)
- Linux Kernel PCIe AER, DPC, and Hotplug (pciehp) Documentation
- ACPI Specification (_OSC for OS-first control)
- Industry Whitepapers on PCIe Hotplug and Error Handling
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Information type changed from Public to Private
https://bugs.launchpad.net/bugs/2146710
Title:
Request for Hotplug Support – System Firmware Intermediary via MPRAS
Handling Mode