On Tue, Jan 21, 2025 at 12:50 AM Deucher, Alexander
<[email protected]> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: amd-gfx <[email protected]> On Behalf Of Pavel Nikulin
> > Sent: Sunday, January 19, 2025 2:29 PM
> > To: Alex Deucher <[email protected]>
> > Cc: [email protected]
> > Subject: Re: drm/amdgpu: AMDGPU unusable since 6.12.1 and it looks like no one
> > cares.
> >
> > On Sun, Jan 19, 2025 at 5:53 PM Pavel Nikulin <[email protected]> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 6:08 PM Alex Deucher <[email protected]> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 7:27 AM Pavel Nikulin <[email protected]> wrote:
> > > > >
> > > > > > I think it persists as of 6.12.9 and today's firmware version from git.
> > > > >
> > > > > Hardware Asus um560.6
> > > > >
> > > > > > It only happens when the AC adaptor is disconnected, and the
> > > > > > screen refresh frequency is set to 120 Hz. It does not happen on
> > > > > > any other refresh frequency, or when the charger is connected.
> > > > >
> > > > > > It might be happening in Windows, but at a much lower rate, like
> > > > > > once a month. The Windows version might be applying some
> > > > > > mitigations.
> > > > >
> > > > > > Trying to catch what may be a prelude to the hang never worked. It's
> > > > > > just an instant hang, without a panic or anything. I cannot debug it
> > > > > > without JTAGing the CPU, for which I have no equipment, nor am I
> > > > > > sure if there are even JTAG headers exposed on the laptop motherboard.
> > > >
> > > > Please file a bug report and attach your dmesg output.
> > > > https://gitlab.freedesktop.org/drm/amd/-/issues
> > > >
> > > > Alex
> > >
> > > Unfortunately, my dmesg would look the same as anyone else's.
> > > However, I have made the following observations:
> > >
> > > Disabling PSR with the debug mask makes it stable.
> > >
> > > If I set the refresh frequency to 60 Hz, the LPDDR memory clocks wiggle
> > > around 600 MHz, and keep going back and forth (spread spectrum
> > > working).
> > >
> > > If I switch to any other frequency, they stay stable at 937 MHz (spread
> > > spectrum stops working), and hangs happen.
> > >
> > > If I disconnect the antennas from the MT7925 wifi module, the issues are
> > > gone (as is the wifi connectivity).
> > >
> > > If I rfkill the MT7925, both wifi and bluetooth, it may still hang.
> > >
> > > If I nevertheless try to connect by putting the open laptop right next
> > > to the access point, the laptop will hang.
> > >
> > > But if I only try to do the same with a 2.4 GHz bluetooth mouse, it will
> > > continue to work. If I connect to 2.4 GHz wifi, it will still hang
> > > after a few minutes.
> > >
> > > If I use the RTL8156BG-based Type-C USB dongle and disconnect the
> > > power, it works stably. If I keep the connection going on the Type-C
> > > dongle, but switch on wifi and set it as the default route, everything
> > > works stably, regardless of whether I connect to 5 GHz or 2.4 GHz wifi.
> > >
> > > Putting grounding tape around the DP cables and around the wifi
> > > module did not do anything conclusive.
> > >
> > > If I manually set the GPU performance level to high, it marginally
> > > reduces the hang rate.
> > >
> > > DP 2.0 and 2.1 work at 600 MHz, 1.4 at 300 MHz, and 1.2 at 150 MHz,
> > > depending on link speed, which I can't measure.
> > >
> > > So, here is what I think may have happened during the transition from
> > > 6.11 to 6.12:
> > >
> > > - Something PCIe related (ASPM, other PCIe frequency/power settings)
> > > - Something PSR related (PSR raises the memory clock rate, disables spread
> > > spectrum)
> > > - Something power related (undervoltage happens when the Type-C port, or
> > > power, is not plugged in)
> > > - Something RF related (less likely, since it keeps working with the
> > > Type-C ethernet dongle plugged in but not active)
> > >
> > > My guess is that it's an interplay between the PCIe and PSR settings.
> > > Less likely, a hardware problem.
> > >
> > > I do remember that someone with a similar bug bisected the breakage to
> > > a PCIe-related commit.
> > >
> > > Do you want me to still put all of the above into a bug ticket on gitlab?
> >
> > What stabilises the system:
> >
> > The following kernel command line parameters:
> > pcie_aspm=off
> > amdgpu_debugmask=0x200
> > amdgpu_debugmask=0x10
>
> There were a bunch of PSR related fixes that went into 6.13 (and cc'ed 
> stable, so should eventually make their way to 6.12) last week.  Can you try 
> an updated 6.13 kernel without those debug options?
>
> Alex
>

I am running git commit 9bffa1ad25b8b3b95d8f463e5c24dabe3c87d54d. Does
anyone here have a recent Ryzen-based ASUS laptop and hardware debug
gear?
