On Tue, Mar 24, 2026 at 05:33:06PM +0800, Kai-Heng Feng wrote:
> On 2026-03-20 09:52, Bjorn Helgaas wrote:
> > On Thu, Mar 19, 2026 at 07:13:09PM +0800, Kai-Heng Feng wrote:
> > > Add support for decoding NVIDIA-specific CPER sections delivered via
> > > the APEI GHES vendor record notifier chain. NVIDIA hardware generates
> > > vendor-specific CPER sections containing error signatures and diagnostic
> > > register dumps. This implementation registers a notifier_block with the
> > > GHES vendor record notifier and decodes these sections, printing error
> > > details via dev_info().
> > >
> > > The driver binds to ACPI device NVDA2012, present on NVIDIA server
> > > platforms. The NVIDIA CPER section contains a fixed header with error
> > > metadata (signature, error type, severity, socket) followed by
> > > variable-length register address-value pairs for hardware diagnostics.
> > >
> > > This work is based on libcper [0].
> > >
> > > Example output:
> > > nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544
> > > nvidia-ghes NVDA2012:00: signature: CMET-INFO
> > > nvidia-ghes NVDA2012:00: error_type: 0
> > > nvidia-ghes NVDA2012:00: error_instance: 0
> > > nvidia-ghes NVDA2012:00: severity: 3
> > > nvidia-ghes NVDA2012:00: socket: 0
> > > nvidia-ghes NVDA2012:00: number_regs: 32
> > > nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000
> > > nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000
> > > value=0x0000000100000000
> >
> > Is there a convenient way to connect NVDA2012:00 with the actual
> > device? I assume this is typically a PCIe device? How would we
> > relate this with PCIe errors?
>
> The CPER report is from ARM RAS firmware and is not necessarily
> related to a PCIe device.
Right, I know CPER is more general than just PCI/PCIe. But in this case, I think NVDA2012 probably *is* a PCIe device. How would we figure out which one? If we have to run acpidump manually, figure out which NVDA2012 instance is NVDA2012:00, and look for an _ADR or something, that doesn't seem very convenient on systems with multiple NVDA2012 devices.
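For reference, the section layout described in the quoted commit message can be sketched as a small userspace C decoder. The field order and widths are an ASSUMPTION taken from libcper's NVIDIA section definition, not from the patch text: a 16-byte signature, u16 error_type and error_instance, u8 severity, socket and number_regs, one reserved byte, a u64 instance_base, then 16-byte address/value pairs, all little-endian. Under that assumption the example output is self-consistent: 32 + 32 * 16 = 544, the reported error_data_length. The names nv_cper_hdr, nv_cper_parse, nv_cper_reg and nv_cper_print are hypothetical and are not the driver's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * ASSUMED layout (from libcper's NVIDIA section definition, not from the
 * patch): all multi-byte fields little-endian, no padding.
 *   0  char signature[16]     20  u8 severity     23  u8 reserved
 *  16  u16 error_type         21  u8 socket       24  u64 instance_base
 *  18  u16 error_instance     22  u8 number_regs  32  register pairs
 */
#define NV_SIG_LEN 16
#define NV_HDR_LEN 32 /* fixed header size */
#define NV_REG_LEN 16 /* u64 address + u64 value */

/* Hypothetical decoded view of the fixed header. */
struct nv_cper_hdr {
	char signature[NV_SIG_LEN + 1]; /* NUL-terminated copy */
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t severity;
	uint8_t socket;
	uint8_t number_regs;
	uint64_t instance_base;
};

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t get_le(const uint8_t *p, int n)
{
	uint64_t v = 0;

	for (int i = n - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode the fixed header and check that len also covers the
 * number_regs register pairs it announces. Returns 0 or -1.
 */
int nv_cper_parse(const uint8_t *buf, size_t len, struct nv_cper_hdr *h)
{
	if (len < NV_HDR_LEN)
		return -1;
	memcpy(h->signature, buf, NV_SIG_LEN);
	h->signature[NV_SIG_LEN] = '\0';
	h->error_type = (uint16_t)get_le(buf + 16, 2);
	h->error_instance = (uint16_t)get_le(buf + 18, 2);
	h->severity = buf[20];
	h->socket = buf[21];
	h->number_regs = buf[22];
	/* buf[23] is reserved */
	h->instance_base = get_le(buf + 24, 8);
	if (len < NV_HDR_LEN + (size_t)h->number_regs * NV_REG_LEN)
		return -1;
	return 0;
}

/* Fetch register pair i; buf must already have passed nv_cper_parse(). */
void nv_cper_reg(const uint8_t *buf, const struct nv_cper_hdr *h,
		 unsigned int i, uint64_t *addr, uint64_t *val)
{
	const uint8_t *r = buf + NV_HDR_LEN + (size_t)i * NV_REG_LEN;

	assert(i < h->number_regs);
	*addr = get_le(r, 8);
	*val = get_le(r + 8, 8);
}

/* Print in roughly the same shape as the example dev_info() output. */
void nv_cper_print(const uint8_t *buf, const struct nv_cper_hdr *h)
{
	uint64_t addr, val;

	printf("signature: %s\nerror_type: %u\nerror_instance: %u\n",
	       h->signature, h->error_type, h->error_instance);
	printf("severity: %u\nsocket: %u\nnumber_regs: %u\n",
	       h->severity, h->socket, h->number_regs);
	printf("instance_base: 0x%016" PRIx64 "\n", h->instance_base);
	for (unsigned int i = 0; i < h->number_regs; i++) {
		nv_cper_reg(buf, h, i, &addr, &val);
		printf("register[%u]: address=0x%016" PRIx64
		       " value=0x%016" PRIx64 "\n", i, addr, val);
	}
}
```

Under this assumed layout, nothing in the section itself carries a PCI segment/bus/device/function; socket and instance_base are the only location hints it provides.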

