[Public]
I'm not able to see how this would affect gfx/mem DPM alone. Unless Alex has
other ideas, would you be able to enable drm debug messages and share the log?
Enabling verbose debug messages is done through the drm.debug
parameter, each category being enabled by a bit:
drm.debug=0x1 will enable CORE messages
drm.debug=0x2 will enable DRIVER messages
drm.debug=0x3 will enable CORE and DRIVER messages
...
drm.debug=0x1ff will enable all messages
An interesting feature is that it's possible to enable verbose logging
at run-time by echoing the debug value in its sysfs node:
# echo 0xf > /sys/module/drm/parameters/debug
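As a rough sketch of how the bits combine, a small helper could build the value from category names. The CORE and DRIVER values are the ones listed above; the KMS (0x4) and PRIME (0x8) values and the function name `drm_debug_mask` are assumptions for illustration, so please check them against the DRM documentation for your kernel.

```shell
#!/bin/sh
# Compose a drm.debug bitmask from category names and print it in hex.
# core/driver bits are taken from the list above; kms/prime are assumed.
drm_debug_mask() {
    mask=0
    for category in "$@"; do
        case "$category" in
            core)   bit=1 ;;   # 0x1
            driver) bit=2 ;;   # 0x2
            kms)    bit=4 ;;   # 0x4 (assumed)
            prime)  bit=8 ;;   # 0x8 (assumed)
            *) echo "unknown category: $category" >&2; return 1 ;;
        esac
        # OR the category bit into the accumulated mask
        mask=$((mask | bit))
    done
    printf '0x%x\n' "$mask"
}
```

For example, `drm_debug_mask core driver` prints 0x3, and as root the result could then be written to /sys/module/drm/parameters/debug as in the echo line above.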
Thanks,
Lijo
-----Original Message-----
From: James Turner <[email protected]>
Sent: Sunday, January 23, 2022 2:41 AM
To: Lazar, Lijo <[email protected]>
Cc: Alex Deucher <[email protected]>; Thorsten Leemhuis
<[email protected]>; Deucher, Alexander <[email protected]>;
[email protected]; [email protected]; Greg KH
<[email protected]>; Pan, Xinhui <[email protected]>; LKML
<[email protected]>; [email protected]; Alex Williamson
<[email protected]>; Koenig, Christian <[email protected]>
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU
PCI-passed-through to Windows VM
Hi Lijo,
> Could you provide the pp_dpm_* values in sysfs with and without the
> patch? Also, could you try forcing PCIE to gen3 (through pp_dpm_pcie)
> if it's not in gen3 when the issue happens?
AFAICT, I can't access those values while the AMD GPU PCI devices are bound to
`vfio-pci`. However, I can at least access the link speed and width elsewhere
in sysfs. So, I gathered what information I could for two different cases:
- With the PCI devices bound to `vfio-pci`. With this configuration, I
can start the VM, but the `pp_dpm_*` values are not available since
the devices are bound to `vfio-pci` instead of `amdgpu`.
- Without the PCI devices bound to `vfio-pci` (i.e. after removing the
`vfio-pci.ids=...` kernel command line argument). With this
configuration, I can access the `pp_dpm_*` values, since the PCI
devices are bound to `amdgpu`. However, I cannot use the VM. If I try
to start the VM, the display (both the external monitors attached to
the AMD GPU and the built-in laptop display attached to the Intel
iGPU) completely freezes.
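
Which of the two cases applies can also be checked from the device's `driver` symlink in sysfs. A minimal sketch, where the function name `bound_driver` and its optional sysfs-root argument are hypothetical (the second argument only exists so the helper can be pointed at a test directory):

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root (hypothetical parameter, defaults to /sys)
bound_driver() {
    dev="$1"
    sysfs_root="${2:-/sys}"
    link="$sysfs_root/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

On this machine I would expect `bound_driver 0000:01:00.0` to print `vfio-pci` in the first case and `amdgpu` in the second.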
The output shown below was identical for both the good commit:
f1688bd69ec4 ("drm/amd/amdgpu:save psp ring wptr to avoid attack") and the
commit which introduced the issue:
f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")
Note that the PCI link speed increased to 8.0 GT/s when the GPU was under heavy
load for both versions, but the clock speeds of the GPU were different under
load. (For the good commit, it was 1295 MHz; for the bad commit, it was 501
MHz.)
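
For anyone who wants to watch the link state change under load, the two sysfs attributes used below can be read with a small helper. This is only a sketch; the function name `link_state` is made up for illustration.

```shell
#!/bin/sh
# Print "<speed>, x<width>" for a PCI device from its sysfs attributes.
# $1: device directory, e.g. /sys/bus/pci/devices/0000:01:00.0
link_state() {
    dev_dir="$1"
    speed=$(cat "$dev_dir/current_link_speed") || return 1
    width=$(cat "$dev_dir/current_link_width") || return 1
    printf '%s, x%s\n' "$speed" "$width"
}
```

Running it in a loop, e.g. `while sleep 1; do link_state /sys/bus/pci/devices/0000:01:00.0; done`, should show the speed moving between 2.5 GT/s and 8.0 GT/s as the load changes.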
# With the PCI devices bound to `vfio-pci`
## Before starting the VM
% ls /sys/module/amdgpu/drivers/pci:amdgpu
module bind new_id remove_id uevent unbind
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe
## While running the VM, before placing the AMD GPU under heavy load
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe
## While running the VM, with the AMD GPU under heavy load
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe
## While running the VM, after stopping the heavy load on the AMD GPU
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe
## After stopping the VM
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe
# Without the PCI devices bound to `vfio-pci`
% ls /sys/module/amdgpu/drivers/pci:amdgpu
0000:01:00.0 module bind new_id remove_id uevent unbind
% for f in /sys/module/amdgpu/drivers/pci:amdgpu/*/pp_dpm_*; do echo "$f"; cat "$f"; echo; done
/sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_mclk
0: 300Mhz
1: 625Mhz
2: 1500Mhz *
/sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_pcie
0: 2.5GT/s, x8
1: 8.0GT/s, x16 *
/sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_sclk
0: 214Mhz
1: 501Mhz
2: 850Mhz
3: 1034Mhz
4: 1144Mhz
5: 1228Mhz
6: 1275Mhz
7: 1295Mhz *
% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
/sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe
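
In the `pp_dpm_*` output above, the currently selected level is the line marked with `*`. If it helps when comparing runs, that line can be pulled out with something like the following sketch (the function name `active_dpm_level` is hypothetical):

```shell
#!/bin/sh
# Print the entry marked with "*" in a pp_dpm_* file, without the marker.
# $1: path to a pp_dpm_* file, e.g. .../0000:01:00.0/pp_dpm_sclk
active_dpm_level() {
    awk '/\*[[:space:]]*$/ {
        # Strip the trailing "*" marker and surrounding whitespace
        sub(/[[:space:]]*\*[[:space:]]*$/, "")
        print
    }' "$1"
}
```

Against the `pp_dpm_sclk` contents shown above, this would print `7: 1295Mhz`.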
James