On Mon, Jul 17, 2023 at 03:04:57PM +0300, Ilpo Järvinen wrote:
> Don't assume that only the driver would be accessing LNKCTL. ASPM
> policy changes can trigger write to LNKCTL outside of driver's control.
> And in the case of upstream bridge, the driver does not even own the
> device it's changing the registers for.
>
> Use RMW capability accessors which do proper locking to avoid losing
> concurrent updates to the register value.
>
> Fixes: a2e73f56fa62 ("drm/amdgpu: Add support for CIK parts")
> Fixes: 62a37553414a ("drm/amdgpu: add si implementation v10")
> Suggested-by: Lukas Wunner <[email protected]>
> Signed-off-by: Ilpo Järvinen <[email protected]>
> Cc: [email protected]
Do we have any reports of problems that are fixed by this patch (or by
others in the series)? If not, I'm not sure it really fits the usual
stable kernel criteria:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst?id=v6.4
> ---
> drivers/gpu/drm/amd/amdgpu/cik.c | 36 +++++++++-----------------------
> drivers/gpu/drm/amd/amdgpu/si.c | 36 +++++++++-----------------------
> 2 files changed, 20 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c
> b/drivers/gpu/drm/amd/amdgpu/cik.c
> index 5641cf05d856..e63abdf52b6c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik.c
> @@ -1574,17 +1574,8 @@ static void cik_pcie_gen3_enable(struct amdgpu_device
> *adev)
> u16 bridge_cfg2, gpu_cfg2;
> u32 max_lw, current_lw, tmp;
>
> - pcie_capability_read_word(root, PCI_EXP_LNKCTL,
> - &bridge_cfg);
> - pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL,
> - &gpu_cfg);
> -
> - tmp16 = bridge_cfg | PCI_EXP_LNKCTL_HAWD;
> - pcie_capability_write_word(root, PCI_EXP_LNKCTL, tmp16);
> -
> - tmp16 = gpu_cfg | PCI_EXP_LNKCTL_HAWD;
> - pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL,
> - tmp16);
> + pcie_capability_set_word(root, PCI_EXP_LNKCTL,
> PCI_EXP_LNKCTL_HAWD);
> + pcie_capability_set_word(adev->pdev, PCI_EXP_LNKCTL,
> PCI_EXP_LNKCTL_HAWD);
>
> tmp = RREG32_PCIE(ixPCIE_LC_STATUS1);
> max_lw = (tmp &
> PCIE_LC_STATUS1__LC_DETECTED_LINK_WIDTH_MASK) >>
> @@ -1637,21 +1628,14 @@ static void cik_pcie_gen3_enable(struct amdgpu_device
> *adev)
> msleep(100);
>
> /* linkctl */
> - pcie_capability_read_word(root, PCI_EXP_LNKCTL,
> - &tmp16);
> - tmp16 &= ~PCI_EXP_LNKCTL_HAWD;
> - tmp16 |= (bridge_cfg & PCI_EXP_LNKCTL_HAWD);
> - pcie_capability_write_word(root, PCI_EXP_LNKCTL,
> - tmp16);
> -
> - pcie_capability_read_word(adev->pdev,
> - PCI_EXP_LNKCTL,
> - &tmp16);
> - tmp16 &= ~PCI_EXP_LNKCTL_HAWD;
> - tmp16 |= (gpu_cfg & PCI_EXP_LNKCTL_HAWD);
> - pcie_capability_write_word(adev->pdev,
> - PCI_EXP_LNKCTL,
> - tmp16);
> + pcie_capability_clear_and_set_word(root,
> PCI_EXP_LNKCTL,
> +
> PCI_EXP_LNKCTL_HAWD,
> + bridge_cfg &
> +
> PCI_EXP_LNKCTL_HAWD);
> + pcie_capability_clear_and_set_word(adev->pdev,
> PCI_EXP_LNKCTL,
> +
> PCI_EXP_LNKCTL_HAWD,
> + gpu_cfg &
> +
> PCI_EXP_LNKCTL_HAWD);
Wow, there's a lot of pointless-looking work going on here:
set root PCI_EXP_LNKCTL_HAWD
set GPU PCI_EXP_LNKCTL_HAWD
for (i = 0; i < 10; i++) {
read root PCI_EXP_LNKCTL
read GPU PCI_EXP_LNKCTL
clear root PCI_EXP_LNKCTL_HAWD
if (root PCI_EXP_LNKCTL_HAWD was set)
set root PCI_EXP_LNKCTL_HAWD
clear GPU PCI_EXP_LNKCTL_HAWD
if (GPU PCI_EXP_LNKCTL_HAWD was set)
set GPU PCI_EXP_LNKCTL_HAWD
}
If it really *is* pointless, it would be nice to clean it up, but that
wouldn't be material for this patch, so what you have looks good.
> /* linkctl2 */
> pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
The PCI_EXP_LNKCTL2 stuff also includes RMW updates. I don't see any
uses of PCI_EXP_LNKCTL2 outside this driver that look relevant, so I
guess we don't care about making the PCI_EXP_LNKCTL2 updates atomic?
Bjorn