Patches submitted to kernel team list:
https://lists.ubuntu.com/archives/kernel-team/2025-May/159553.html

----

SRU Justification:

[Impact]

The PCI ACS capability parameter is used to enable and configure access
control between PCIe devices. In particular, this parameter can enable
and/or restrict peer-to-peer traffic between so-configured PCIe devices.

For example, this parameter is necessary for GPUDirect RDMA
applications, where peer-to-peer communication between a GPU and an
RDMA-capable device is required. This parameter allows an administrator
to configure the system for the specific level of isolation between PCIe
devices required to enable this feature for their use case.

[Fix]

For Oracular, this consists of a clean cherry pick from mainline of
commit 9cf8a952d57b ("PCI/ACS: Fix 'pci=config_acs=' parameter") to fix
the functionality of the config_acs parameter introduced by commit
47c8846a49ba ("PCI: Extend ACS configurability").

For Noble, this consists of clean cherry picks of commits 47c8846a49ba
("PCI: Extend ACS configurability") and 9cf8a952d57b ("PCI/ACS: Fix
'pci=config_acs=' parameter").

[Test Plan]

The Noble and Oracular patchsets were tested on a DGX GH200 system by
booting with the kernel parameter test cases described in the commit
message of 9cf8a952d57b ("PCI/ACS: Fix 'pci=config_acs=' parameter").

With the fix commit applied, multiple PCIe devices could be configured
through the pci=config_acs parameter, and pci=disable_acs_redir worked
as expected.

[Where problems could occur]

This affects the pci=config_acs and pci=disable_acs_redir kernel boot
parameters. Regressions would show up as either of these two parameters
malfunctioning, or as PCIe devices being left with an incorrect ACS
configuration.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2100340

Title:
  Backport pci=config_acs parameter with fix commit

Status in linux package in Ubuntu:
  Invalid
Status in linux-nvidia package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  In Progress
Status in linux-nvidia source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  In Progress
Status in linux source package in Plucky:
  Fix Committed

Bug description:
  Upstream Linux kernel commit 47c8846a49ba ("PCI: Extend ACS
  configurability") introduced bugs that fail to configure ACS ctrl to
  the value specified by the kernel parameter. There are two bugs:

  
  1) When ACS is configured for multiple PCI devices using the 'config_acs'
     kernel parameter, the kernel reports the error "PCI: Can't parse ACS
     command line parameter". This is due to a bug that overwrites the ACS
     mask with the value 0 instead of preserving it.

     For example, using 'config_acs' to configure ACS ctrl for multiple BDFs
     fails:

        Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
        PCI: Can't parse ACS command line parameter
        pci 0020:02:00.0: ACS mask  = 0x007f
        pci 0020:02:00.0: ACS flags = 0x007b
        pci 0020:02:00.0: Configured ACS to 0x007b

     After this fix:

        Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
        pci 0020:02:00.0: ACS mask  = 0x007f
        pci 0020:02:00.0: ACS flags = 0x007b
        pci 0020:02:00.0: ACS control = 0x005f
        pci 0020:02:00.0: ACS fw_ctrl = 0x0053
        pci 0020:02:00.0: Configured ACS to 0x007b
        pci 0039:00:00.0: ACS mask  = 0x0070
        pci 0039:00:00.0: ACS flags = 0x0050
        pci 0039:00:00.0: ACS control = 0x001d
        pci 0039:00:00.0: ACS fw_ctrl = 0x0000
        pci 0039:00:00.0: Configured ACS to 0x0050

  2) The bit manipulation logic is meant to copy a bit from the firmware
     settings when the corresponding mask bit is 0, but it does not do so
     correctly.

     For example, 'disable_acs_redir' fails to clear all three ACS P2P redir
     bits due to the wrong bit fiddling:

        Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
        pci 0020:02:00.0: ACS mask  = 0x002c
        pci 0020:02:00.0: ACS flags = 0xffd3
        pci 0020:02:00.0: Configured ACS to 0xfffb
        pci 0030:02:00.0: ACS mask  = 0x002c
        pci 0030:02:00.0: ACS flags = 0xffd3
        pci 0030:02:00.0: Configured ACS to 0xffdf
        pci 0039:00:00.0: ACS mask  = 0x002c
        pci 0039:00:00.0: ACS flags = 0xffd3
        pci 0039:00:00.0: Configured ACS to 0xffd3

     After this fix:

        Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
        pci 0020:02:00.0: ACS mask  = 0x002c
        pci 0020:02:00.0: ACS flags = 0xffd3
        pci 0020:02:00.0: ACS control = 0x007f
        pci 0020:02:00.0: ACS fw_ctrl = 0x007b
        pci 0020:02:00.0: Configured ACS to 0x0053
        pci 0030:02:00.0: ACS mask  = 0x002c
        pci 0030:02:00.0: ACS flags = 0xffd3
        pci 0030:02:00.0: ACS control = 0x005f
        pci 0030:02:00.0: ACS fw_ctrl = 0x005f
        pci 0030:02:00.0: Configured ACS to 0x0053
        pci 0039:00:00.0: ACS mask  = 0x002c
        pci 0039:00:00.0: ACS flags = 0xffd3
        pci 0039:00:00.0: ACS control = 0x001d
        pci 0039:00:00.0: ACS fw_ctrl = 0x0000
        pci 0039:00:00.0: Configured ACS to 0x0000
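
For readers checking the 'config_acs' logs in 1), the following is a
minimal Python sketch of how a flag string such as 101xxxx appears to
map to the logged "ACS mask" and "ACS flags" values. It is an
illustration inferred from the log output above, not the kernel's C
code; the function names parse_acs_flags and parse_config_acs are
hypothetical.

```python
from typing import List, Tuple


def parse_acs_flags(bits: str) -> Tuple[int, int]:
    """Return (mask, flags) for a config_acs bit string such as '101xxxx'.

    Assumption (inferred from the logs): the rightmost character is bit 0,
    '0' and '1' select a bit (set it in the mask), and 'x' leaves it alone.
    """
    mask = 0
    flags = 0
    for shift, ch in enumerate(reversed(bits)):
        if ch == "1":
            mask |= 1 << shift
            flags |= 1 << shift
        elif ch == "0":
            mask |= 1 << shift
        elif ch in ("x", "X"):
            continue  # bit is not touched by this parameter
        else:
            raise ValueError(f"invalid ACS flag character: {ch!r}")
    return mask, flags


def parse_config_acs(param: str) -> List[Tuple[str, int, int]]:
    """Split 'flags@BDF;flags@BDF' into (BDF, mask, flags) entries.

    Each entry keeps its own mask, which is the property bug 1) broke
    for every device after the first.
    """
    entries = []
    for item in param.split(";"):
        bits, sep, bdf = item.partition("@")
        if not sep:
            raise ValueError(f"ACS flags missing in {item!r}")
        mask, flags = parse_acs_flags(bits)
        entries.append((bdf, mask, flags))
    return entries


for bdf, mask, flags in parse_config_acs(
    "1111011@0020:02:00.0;101xxxx@0039:00:00.0"
):
    print(f"pci {bdf}: ACS mask  = {mask:#06x}")
    print(f"pci {bdf}: ACS flags = {flags:#06x}")
```

Run against the command line from the example, this prints mask/flags
pairs of 0x007f/0x007b and 0x0070/0x0050, matching the "After this fix"
log for both devices.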

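The bit manipulation in 2) can be checked by hand against the logs. The
Python sketch below models one combination rule that reproduces the
"before" values and one that reproduces the "after" values. Both
formulas are reconstructions chosen to match the logged numbers, not
quotations from the patch; the function names are hypothetical, and the
"before" case assumes the same ACS control and fw_ctrl values that the
"after" log prints, since the "before" log does not show them.

```python
# ACS control bits relevant to disable_acs_redir (values as in the PCIe
# ACS control register layout).
ACS_RR = 0x04  # P2P Request Redirect
ACS_CR = 0x08  # P2P Completion Redirect
ACS_EC = 0x20  # P2P Egress Control

REDIR_MASK = ACS_RR | ACS_CR | ACS_EC   # 0x002c, as logged
REDIR_FLAGS = ~REDIR_MASK & 0xFFFF      # 0xffd3, as logged


def configure_before_fix(ctrl: int, fw_ctrl: int, mask: int, flags: int) -> int:
    """Rule consistent with the 'before' logs: flags are OR-ed in unmasked."""
    ctrl = (ctrl & ~mask) | (fw_ctrl & mask)
    return (ctrl | flags) & 0xFFFF


def configure_after_fix(fw_ctrl: int, mask: int, flags: int) -> int:
    """Rule consistent with the 'after' logs.

    Mask bit 0: keep the firmware setting. Mask bit 1: take the bit from flags.
    """
    return ((fw_ctrl & ~mask) | (flags & mask)) & 0xFFFF


# (BDF, ACS control, ACS fw_ctrl) taken from the 'After this fix' log.
devices = [
    ("0020:02:00.0", 0x007F, 0x007B),
    ("0030:02:00.0", 0x005F, 0x005F),
    ("0039:00:00.0", 0x001D, 0x0000),
]

for bdf, ctrl, fw_ctrl in devices:
    before = configure_before_fix(ctrl, fw_ctrl, REDIR_MASK, REDIR_FLAGS)
    after = configure_after_fix(fw_ctrl, REDIR_MASK, REDIR_FLAGS)
    print(f"pci {bdf}: before fix {before:#06x}, after fix {after:#06x}")
```

Under these assumptions the "before" column comes out as 0xfffb, 0xffdf
and 0xffd3, and the "after" column as 0x0053, 0x0053 and 0x0000, which
are the "Configured ACS to" values in the two logs. The same "after"
rule also yields 0x007b and 0x0050 for the 'config_acs' example in 1).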

