On Tue, Jan 21, 2025 at 12:50 AM Deucher, Alexander <[email protected]> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: amd-gfx <[email protected]> On Behalf Of Pavel Nikulin
> > Sent: Sunday, January 19, 2025 2:29 PM
> > To: Alex Deucher <[email protected]>
> > Cc: [email protected]
> > Subject: Re: drm/amdgpu: AMDGPU unusable since 6.12.1 and it looks like no one cares.
> >
> > On Sun, Jan 19, 2025 at 5:53 PM Pavel Nikulin <[email protected]> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 6:08 PM Alex Deucher <[email protected]> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 7:27 AM Pavel Nikulin <[email protected]> wrote:
> > > > >
> > > > > I think it persists as of 6.12.9 and today's firmware version from git.
> > > > >
> > > > > Hardware: Asus um560.6
> > > > >
> > > > > It only happens when the AC adaptor is disconnected and the screen refresh rate is set to 120 Hz. It does not happen at any other refresh rate, or when the charger is connected.
> > > > >
> > > > > It might be happening in Windows, but at a much lower rate, like once a month. The Windows version might be applying some mitigations.
> > > > >
> > > > > Trying to catch what may be a prelude to the hang never worked. It's just an instant hang, without a panic or anything. I cannot debug it without JTAGing the CPU, for which I have no equipment, nor am I sure there are even JTAG headers exposed on the laptop motherboard.
> > > >
> > > > Please file a bug report and attach your dmesg output.
> > > > https://gitlab.freedesktop.org/drm/amd/-/issues
> > > >
> > > > Alex
> > >
> > > Unfortunately, what I would have would be the same dmesg as anyone else; however, I have made the following observations:
> > >
> > > Disabling PSR with the debug mask makes it stable.
> > >
> > > If I set the refresh rate to 60 Hz, the LPDDR memory clocks wiggle around 600 MHz and keep going back and forth (spread spectrum working).
> > >
> > > If I switch to any other refresh rate, they stay fixed at 937 MHz (spread spectrum stops working), and the hangs happen.
> > >
> > > If I disconnect the antennas from the MT7925 Wi-Fi module, the issues are gone (as is the Wi-Fi connectivity).
> > >
> > > If I rfkill the MT7925, both Wi-Fi and Bluetooth, it may still hang.
> > >
> > > If I nevertheless try to connect by putting the open laptop right next to the access point, the laptop will hang.
> > >
> > > But if I only try the same with a 2.4 GHz Bluetooth mouse, it continues to work. If I connect to 2.4 GHz Wi-Fi, it still hangs after a few minutes.
> > >
> > > If I use the RTL8156BG-based USB Type-C dongle and disconnect the power, it is stable. If I keep the connection going on the Type-C dongle but switch on Wi-Fi and set it as the default route, everything is stable, regardless of whether I connect to 5 GHz or 2.4 GHz Wi-Fi.
> > >
> > > Putting grounding tape around the DP cables and around the Wi-Fi module did not change anything conclusively.
> > >
> > > If I manually set the GPU performance level to high, it marginally reduces how often it hangs.
> > >
> > > DP 2.0 and 2.1 run at 600 MHz, 1.4 at 300 MHz, and 1.2 at 150 MHz, depending on link speed, which I can't measure.
> > >
> > > So, here is what I think may have happened during the transition from 6.11 to 6.12:
> > >
> > > - Something PCIe related (ASPM, other PCIe frequency/power settings)
> > > - Something PSR related (PSR raises the memory clock rate and disables spread spectrum)
> > > - Something power related (undervoltage happens when the Type-C port or power is not plugged in)
> > > - Something RF related (made less likely by it continuing to work with the Type-C Ethernet dongle plugged in but not active)
> > >
> > > My guess is that it's an interplay between the PCIe and PSR settings. Less likely, a hardware problem.
> > >
> > > I do remember someone with a similar bug bisecting the breakage to a PCIe-related commit.
> > >
> > > Do you still want me to put all of the above into a bug ticket on GitLab?
> >
> > What is stabilising the system:
> >
> > The following kernel command line parameters:
> > pcie_aspm=off
> > amdgpu_debugmask=0x200
> > amdgpu_debugmask=0x10
>
> There were a bunch of PSR-related fixes that went into 6.13 last week (and cc'ed stable, so they should eventually make their way to 6.12). Can you try an updated 6.13 kernel without those debug options?
>
> Alex
>
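For reference, a minimal sketch of how the two debug-mask values listed above would combine, under assumptions not stated in the thread: that the intended parameter is amdgpu.dcdebugmask (the spelling amdgpu_debugmask is treated as shorthand), that a later occurrence of the same integer module parameter on the command line overrides an earlier one, and that 0x10 and 0x200 correspond to the kernel's DC_DISABLE_PSR and DC_DISABLE_PSR_SU bits. The helper functions are hypothetical and only parse a command-line string; they do not touch the driver.

```python
# Hypothetical helper: work out which amdgpu display-debug bits a kernel
# command line ends up setting.
# Assumptions: the parameter is amdgpu.dcdebugmask, a later occurrence
# overrides an earlier one, and these bit names match the DC_DEBUG_MASK enum.

DC_DEBUG_BITS = {
    0x10: "DC_DISABLE_PSR",
    0x200: "DC_DISABLE_PSR_SU",
}


def effective_dcdebugmask(cmdline: str) -> int:
    """Return the last amdgpu.dcdebugmask value on a command line (0 if absent)."""
    value = 0
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "amdgpu.dcdebugmask":
            value = int(raw, 0)  # base 0 accepts both "0x10" and "16"
    return value


def decode(mask: int) -> list[str]:
    """Names of the known bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(DC_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    separate = "pcie_aspm=off amdgpu.dcdebugmask=0x200 amdgpu.dcdebugmask=0x10"
    combined = "pcie_aspm=off amdgpu.dcdebugmask=0x210"
    for line in (separate, combined):
        mask = effective_dcdebugmask(line)
        print(hex(mask), decode(mask))
```

Under those assumptions, passing the two masks as separate parameters would leave only 0x10 in effect, while a single amdgpu.dcdebugmask=0x210 would set both bits.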
I am running git 9bffa1ad25b8b3b95d8f463e5c24dabe3c87d54d. Does anyone here have a recent Ryzen-based ASUS laptop and hardware debug gear?
