On Fri, Apr 21, 2023 at 06:06:49PM +0200, Robin Voetter wrote:
>
>
> On 4/21/23 10:22, Michael S. Tsirkin wrote:
> > On Thu, Apr 20, 2023 at 05:38:39PM +0200, [email protected] wrote:
> >> From: Robin Voetter <[email protected]>
> >>
> >> The ROCm driver for Linux uses PCIe atomics to schedule work and
> >> generally communicate between the host and the device. This does not
> >> currently work in QEMU with regular vfio-pci passthrough, because the
> >> pcie-root-port does not advertise the PCIe atomic completer
> >> capabilities. When initializing the GPU from the Linux driver, it
> >> queries whether the PCIe connection from the CPU to GPU supports the
> >> required capabilities[1] in the pci_enable_atomic_ops_to_root
> >> function[2]. Currently the only part where this fails is checking the
> >> atomic completer capabilities (32 and 64 bits) on the root port[3]. In
> >> this case, the driver determines that PCIe atomics are not supported at
> >> all, and this causes ROCm programs to misbehave. (While AMD advertises
> >> that there is some support for ROCm without PCIe atomics, I have never
> >> actually gotten that working...)
> >>
> >> This patch allows ROCm to properly function by introducing an
> >> additional experimental property to the pcie-root-port,
> >> x-atomic-completion.
> >
> > so what exactly makes it experimental? from this description
> > it looks like it actually has to be enabled for things to work?
>
> I was not sure which would be appropriate, but I'm fine with making it a
> non-experimental option.

So I guess the real thing to do is to query this from vfio, right?
Unfortunately we don't have access to vfio when we are creating the
root port. I think the thing to do would be to check at the time
vfio is attached: if the atomic property is set on the root port but
the host does not support it, fail attaching vfio. Right?

--
MST
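
To make the attach-time check above concrete, here is a rough sketch in
plain C. It is not QEMU code: the helper name atomic_attach_allowed() is
hypothetical, and how the host's Device Capabilities 2 value would be
obtained through vfio is left open and only assumed here. The bit values
are the AtomicOp completer bits as defined in Linux's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* AtomicOp completer bits in PCIe Device Capabilities 2
 * (same values as in Linux include/uapi/linux/pci_regs.h). */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP32   0x00000080u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64   0x00000100u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128  0x00000200u

#define ATOMIC_COMP_MASK (PCI_EXP_DEVCAP2_ATOMIC_COMP32 | \
                          PCI_EXP_DEVCAP2_ATOMIC_COMP64 | \
                          PCI_EXP_DEVCAP2_ATOMIC_COMP128)

/*
 * Hypothetical helper, not an existing QEMU function.
 *
 * port_devcap2: DEVCAP2 as advertised by the emulated root port.
 * host_devcap2: DEVCAP2 of the host side (assumed to be known at
 *               the time the vfio device is attached).
 * missing:      optional out parameter; receives the completer bits
 *               that are advertised but not backed by the host.
 *
 * Returns true if attaching may proceed, false if it should fail.
 */
static bool atomic_attach_allowed(uint32_t port_devcap2,
                                  uint32_t host_devcap2,
                                  uint32_t *missing)
{
    uint32_t advertised = port_devcap2 & ATOMIC_COMP_MASK;
    uint32_t supported = host_devcap2 & ATOMIC_COMP_MASK;
    uint32_t unsupported = advertised & ~supported;

    if (missing != NULL) {
        *missing = unsupported;
    }
    return unsupported == 0;
}
```

With this shape, a root port that advertises nothing always passes, and
the caller can report exactly which completer widths are missing when
the attach is refused.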
