On Tue, Dec 21, 2021 at 1:47 PM Deucher, Alexander <[email protected]> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Deucher, Alexander
> > Sent: Tuesday, December 21, 2021 12:01 PM
> > To: Linus Torvalds <[email protected]>; Imre Deak <[email protected]>; [email protected]
> > Cc: Daniel Vetter <[email protected]>; Kai-Heng Feng <[email protected]>
> > Subject: RE: Expecting to revert commit 55285e21f045 "fbdev/efifb: Release PCI device ..."
> >
> > [Public]
> >
> > > -----Original Message-----
> > > From: Linus Torvalds <[email protected]>
> > > Sent: Monday, December 20, 2021 5:05 PM
> > > To: Imre Deak <[email protected]>
> > > Cc: Daniel Vetter <[email protected]>; Deucher, Alexander <[email protected]>; Kai-Heng Feng <[email protected]>
> > > Subject: Re: Expecting to revert commit 55285e21f045 "fbdev/efifb: Release PCI device ..."
> > >
> > > On Mon, Dec 20, 2021 at 1:33 PM Imre Deak <[email protected]> wrote:
> > > >
> > > > amdgpu.runpm=0
> > >
> > > Hmmm.
> > >
> > > This does seem to "work", but not very well.
> > >
> > > With this, what seems to happen is odd: I lock the screen, wait, it
> > > goes "No signal, shutting down", but then doesn't actually shut down
> > > but stays black (with the backlight on). After _another_ five seconds
> > > or so, the monitor goes "No signal, shutting down" _again_, and at
> > > that point it actually does it.
> > >
> > > So it solves my immediate problem - in that yes, the backlight finally
> > > does turn off in the end - but it does seem to be still broken.
> > >
> > > I'm very surprised if no AMD drm developers can see this exact same
> > > thing. This is a very simple setup. The only possibly slightly less
> > > common thing is that I have two monitors, but while that is not
> > > necessarily the _most_ common setup in an absolute sense, I'd expect
> > > it to be very common among DRM developers..
> > >
> > > I guess I can just change the revert to just a
> > >
> > >     -int amdgpu_runtime_pm = -1;
> > >     +int amdgpu_runtime_pm = 0;
> > >
> > > instead. The auto-detect is apparently broken. Maybe it should only
> > > kick in for LVDS screens on actual laptops?
> > >
> > > Note: on my machine, I get that
> > >
> > >     amdgpu 0000:49:00.0: amdgpu: Using BACO for runtime pm
> > >
> > > so maybe the other possible runtime pm models (ARPX and BOCO) are ok,
> > > and it's only that BACO case that is broken.
> > >
> > > I have no idea what any of those three things are - I'm just looking
> > > at the uses of that amdgpu_runtime_pm variable.
> > >
> > > amdgpu people: if you don't want that amdgpu_runtime_pm turned off by
> > > default, tell me something else to try.
> >
> > For a little background, runtime PM support was added about 10 years
> > ago, originally to support laptops with multiple GPUs (integrated and
> > discrete). It's not specific to the display hardware. When the GPU is
> > idle, it can be powered down completely. In the case of these laptops,
> > it's D3 cold (managed by ACPI; we call this BOCO in AMD parlance - Bus
> > Off, Chip Off), which powers off the dGPU completely (i.e., it
> > disappears from the bus). A few years ago we extended this to support
> > desktop dGPUs as well, which support their own version of runtime D3
> > (called BACO in AMD parlance - Bus Active, Chip Off). The driver can
> > put the chip into a low power state where everything except the bus
> > interface is powered down (to avoid the device disappearing from the
> > bus).
> > So this has worked for almost 2 years now on BACO-capable parts and
> > for a decade or more on BOCO systems. Unfortunately, changing the
> > default runpm parameter setting would cause a flood of bug reports
> > about runtime power management breaking and systems suddenly using
> > more power.
> >
> > Imre's commit (55285e21f045) fixes another commit (a6c0fd3d5a8b).
> > Runtime pm was working on amdgpu prior to that commit. Is it possible
> > there is still some race between when amdgpu takes over from efifb?
> > Does it work properly when all pm_runtime calls in efifb are removed,
> > or if efifb is not enabled? Runtime pm for Polaris boards has been
> > enabled by default since 4fdda2e66de0b, which predates both of those
> > patches.
>
> Thinking about this more, I wonder if there was some change in some
> userspace component which was hidden by the changes in 55285e21f045 and
> a6c0fd3d5a8b. E.g., some desktop component started polling for display
> changes or GPU temperature or something like that, and when
> a6c0fd3d5a8b was in place the GPU never entered runtime suspend. Then
> when 55285e21f045 was applied, it unmasked the new behavior in the
> userspace component.
>
> What should happen is that when all of the displays blank, assuming the
> GPU is otherwise idle, the GPU will runtime suspend after a few
> seconds. When you move the mouse or hit the keyboard, that should
> trigger the GPU to runtime resume, and then the displays will be
> re-enabled.
>
> In the behavior you are seeing, when the displays come back on after
> they blank, are you seeing the device resume from runtime suspend? On
> resume from suspend (runtime or system) we issue a hotplug notification
> to userspace in case any displays changed during suspend when the GPU
> was powered down (and hence could not detect a hotplug event). Perhaps
> that event is triggering userspace to reprobe and re-enable the
> displays shortly after resume from runtime suspend due to some other
> event that caused the device to runtime resume.
>
> Does something like this help by any chance?
> https://bugzilla.kernel.org/attachment.cgi?id=300103&action=diff&collapsed=&headers=1&format=raw
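To make the runtime suspend/resume flow described above concrete, here is a
minimal sketch of the generic pattern using the stock pm_runtime and DRM
helper APIs. This is an illustration only, not amdgpu's actual probe/resume
code, and the 5-second autosuspend delay is just an assumed example value:

#include <linux/pci.h>
#include <linux/pm_runtime.h>
#include <drm/drm_probe_helper.h>

/* Illustration only -- not amdgpu's real probe/resume paths. */

static void example_enable_runtime_pm(struct pci_dev *pdev)
{
        /* Runtime-suspend once the device has been idle for ~5 seconds. */
        pm_runtime_set_autosuspend_delay(&pdev->dev, 5000);
        pm_runtime_use_autosuspend(&pdev->dev);
        pm_runtime_set_active(&pdev->dev);
        pm_runtime_allow(&pdev->dev);
        pm_runtime_mark_last_busy(&pdev->dev);
        pm_runtime_put_autosuspend(&pdev->dev);
}

static int example_runtime_resume(struct device *dev)
{
        struct drm_device *ddev = dev_get_drvdata(dev);

        /* ... power the chip back up and restore state here ... */

        /*
         * Displays may have been plugged or unplugged while the GPU was
         * powered down, so notify userspace to reprobe the connectors.
         * This is the hotplug notification mentioned above.
         */
        drm_kms_helper_hotplug_event(ddev);
        return 0;
}

Moving the mouse or hitting a key makes userspace touch the device again,
which takes a runtime PM reference and triggers the resume callback; the
hotplug event at the end is what can prompt a compositor to reprobe and
re-enable the displays.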
I've been working with the reporters of the following bug reports.
Probably best to follow the progress there:

https://bugzilla.kernel.org/show_bug.cgi?id=215203
https://gitlab.freedesktop.org/drm/amd/-/issues/1840

Alex
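For readers following along with those reports, one quick way to check
whether the GPU is actually reaching runtime suspend while the displays are
blanked is to read the device's runtime PM state from sysfs. A minimal
userspace sketch is below; the 0000:49:00.0 address comes from the dmesg line
quoted earlier and will differ on other systems:

#include <stdio.h>

int main(void)
{
        /* "active" while the GPU is up, "suspended" once runtime PM kicks in. */
        const char *path =
                "/sys/bus/pci/devices/0000:49:00.0/power/runtime_status";
        char status[32] = "";
        FILE *f = fopen(path, "r");

        if (!f) {
                perror(path);
                return 1;
        }
        if (fgets(status, sizeof(status), f))
                printf("runtime_status: %s", status);
        fclose(f);
        return 0;
}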
