On Wed, Mar 11, 2026 at 11:00:02AM +0800, fengchengwen wrote:
> On 3/11/2026 6:09 AM, Bjorn Helgaas wrote:
> > On Tue, Mar 10, 2026 at 10:58:49AM -0500, Jeremy Linton wrote:
> >> On 3/9/26 10:20 PM, Chengwen Feng wrote:
> >>> pcie_tph_get_cpu_st() is broken on ARM64:
> >>> 1. pcie_tph_get_cpu_st() passes cpu_uid to the PCI ACPI DSM method.
> >>> cpu_uid should be the ACPI Processor UID [1].
> >>> 2. In BNXT, pcie_tph_get_cpu_st() is passed a cpu_uid obtained via
> >>> cpumask_first(irq->cpu_mask) - the logical CPU ID of a CPU core,
> >>> generated and managed by kernel (e.g., [0,255] for a system with 256
> >>> logical CPU cores).
> >>> 3. On ARM64 platforms, ACPI assigns Processor UID to cores listed in the
> >>> MADT table, and this UID may not match the kernel's logical CPU ID.
> >>> When this occurs, the mismatch results in the wrong CPU steer-tag.
> >>> 4. On AMD x86 the logical CPU ID is identical to the ACPI Processor UID
> >>> so the mismatch is not seen.
> >
> >>> int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type
> >>> mem_type,
> >>> - unsigned int cpu_uid, u16 *tag)
> >>> + unsigned int cpu, u16 *tag)
> >>> {
> >>> #ifdef CONFIG_ACPI
> >>> + u32 cpu_uid = acpi_get_cpu_acpi_id(cpu);
> >
> > From AI review (gemini/gemini-3.1-pro-preview):
> >
> > Does this code need to validate that `cpu` is within bounds before
> > using it? Before this change, the `cpu_uid` parameter was passed
> > opaquely to the ACPI firmware via `tph_invoke_dsm()`, which would
> > gracefully handle invalid values.
> >
> > Now, `cpu` is treated as a logical CPU index and passed to
> > `acpi_get_cpu_acpi_id(cpu)`. On architectures like arm64 and riscv,
> > `acpi_get_cpu_acpi_id()` uses `cpu` directly as an array index
> > (`&cpu_madt_gicc[cpu]` and `&cpu_madt_rintc[cpu]`). On x86, it uses
> > `per_cpu(x86_cpu_to_acpiid, cpu)`.
> >
> > If a caller passes an out-of-bounds `cpu` index (for example, if an
> > IRQ affinity mask is empty and `cpumask_first()` returns
> > `nr_cpu_ids`, or if userspace passes an arbitrary ID via
> > `mlx5_st_alloc_index()`), this will result in an out-of-bounds
> > memory read.
> >
> > Consider adding a bounds check:
> >
> > if (cpu >= nr_cpu_ids)
> > return -EINVAL;
> >
> > I agree that this is an issue, and I think implementations of
> > acpi_get_cpu_acpi_id() should validate their inputs.
> >
> > I don't know if there's a value that can never be a valid ACPI CPU UID
> > and could be used as an error value from acpi_get_cpu_acpi_id(). I do
> > see a few mentions of a ~0 value meaning "all processors" (ACPI r6.6,
> > sec 5.2.12.13).
>
> I only have the ACPI Specification Version 6.5, so I will use v6.5
> as an example.
https://uefi.org/sites/default/files/resources/ACPI_Spec_6.6.pdf
> int acpi_get_cpu_uid(unsigned int cpu, u32 *uid) {
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
> *uid = xxx;
> return 0;
> }
This looks good to me.
> Another issue: This commit also provides an implementation for the
> x86 platform. However, further code analysis revealed a potential
> problem in the implementation:
>
> The acpi_get_cpu_acpi_id() retrieves uid from x86_cpu_to_acpiid in
> SMP, and x86_cpu_to_acpiid is set through the call chain:
> acpi_parse_lapic() -> topology_register_apic() ->
> topo_register_apic() -> topo_set_cpuids() -> x86_cpu_to_acpiid. It
> appears to retrieve the "ACPI Processor UID" from ACPI Section
> 5.2.12.2, but the problem is that this field is only one byte in
> length, which may cause issues in huge-core systems.
>
> Therefore, I suggest re-implementing the acpi_get_cpu_uid function
> for the x86 platform. Either I provide a default implementation
> (shown below), or x86 guys contribute to the implementation:
>
> s64 acpi_get_cpu_uid(unsigned int cpu) {
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
> return cpu;
> }
I don't think this is your problem to solve, so don't worry about it.
If you implement acpi_get_cpu_uid() to return the same values as
cpu_acpi_id(), it's up to the x86 folks to deal with any one-byte ID
issues.
Bjorn