On Tue, Mar 10, 2026 at 11:26:44AM +0200, Imre Deak wrote:
> On Tue, Mar 10, 2026 at 10:40:14AM +0200, Ville Syrjälä wrote:
> > On Mon, Mar 09, 2026 at 06:48:03PM +0200, Imre Deak wrote:
> > > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
> > > initialized, and dmc is thus NULL.
> > >
> > > That would be the case when the call path is
> > > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
> > > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
> > > intel_power_domains_init_hw() is called *before* intel_dmc_init().
> > >
> > > However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
> > > conditionally, depending on the current and target DC states. At probe,
> > > the target is disabled, but if DC6 is enabled, the function is called,
> > > and an oops follows. Apparently it's quite unlikely that DC6 is enabled
> > > at probe, as we haven't seen this failure mode before.
> > >
> > > It is also strange to have DC6 enabled at boot, since that would require
> > > the DMC firmware (loaded by BIOS); the BIOS loading the DMC firmware and
> > > the driver stopping / reprogramming the firmware is a poorly specified
> > > sequence and as such unlikely to be intentional BIOS behaviour. It's more
> > > likely that BIOS is leaving an unintentionally enabled DC6 HW state
> > > behind (without actually loading the required DMC firmware for this).
> >
> > Wasn't the original case some kdump kernel thing?
>
> According to Jani the original issue was a KASAN run in QEMU, see [1].
> Not sure if that also resulted in kexec/kdump.
>
> However, the case reported later by Tao is indeed related to kexec/kdump.
>
> > I think that has a few issues:
> > - loading full GPU drivers for a kdump kernel after the real kernel
> >   has crashed seems a bit risky. Who knows what state the hardware
> >   is in after the crash...
> > - we should probably try to unload DMC at kexec time (to the extent
> >   that DMC can actually be unloaded)
>
> AFAICS that involves calling pci_driver::shutdown, which (for both xe
> and i915) ends up calling intel_power_domains_disable(), which disables
> DC states at least (hence the kexec'ed kernel should still not see DC6
> being enabled). The DMC FW event handlers are not disabled in this case
> though (which I presume is what you mean by unloading DMC), as opposed
> to system/runtime suspend, where all the DMC events are also disabled.
>
> I agree that the kexec->shutdown, driver remove etc. handlers should be
> synced with the suspend handlers, at least wrt. the above DMC unloading.
> However, I consider that a separate issue from the one fixed in this
> patch, which is that the HW DC state (which is unreliable) is incorrectly
> used to track the DC6 allowed counter (the correct way being to use the
> SW DC state instead). So are you still okay to go ahead with this patch
> for now, and to follow up with syncing the above shutdown/driver remove
> handlers with the suspend ones?

Yeah, this seems fine.

As we discussed, eventually we may want to:
- sanitize DMC(*) early enough during driver load to make sure it
  isn't running while we're initializing anything important
- sanitize it similarly during shutdown/unload/etc. to make sure it
  doesn't screw up anything for the next driver/whatever
- sprinkle in some more asserts to make sure the DC state matches in
  software and hardware, if we don't already have enough of these

(*) disable DC states and all event handlers
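
To illustrate the SW-state tracking and the kind of SW/HW consistency
check meant above, here is a rough userspace model. It is only a sketch
and not the driver code: every model_* name, the struct layout and the
bit value are made up for illustration, and the call ordering is an
assumption based on the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DC_STATE_EN_UPTO_DC6; the bit value is illustrative. */
#define MODEL_DC_STATE_EN_UPTO_DC6 (1u << 1)

/* Hypothetical model of the state involved, not the real driver structs. */
struct model_display {
	uint32_t hw_dc_state;    /* what a DC_STATE_EN register read would return */
	uint32_t sw_dc_state;    /* what the driver last programmed */
	uint32_t dc5_count;      /* free-running HW counter */
	uint32_t dc6_start;      /* DC5 count captured when DC6 got allowed */
	uint32_t dc6_allowed;    /* accumulated DC6 allowed count */
	bool dmc_initialized;    /* false before the DMC init step has run */
	unsigned int bad_calls;  /* counter updates attempted without DMC */
};

/* Start or stop the counter; stopping needs the value captured at start. */
static void model_update_dc6_allowed_count(struct model_display *d, bool start)
{
	if (!d->dmc_initialized) {
		/* In the real driver this is where the NULL dmc would be hit. */
		d->bad_calls++;
		return;
	}

	if (start)
		d->dc6_start = d->dc5_count;
	else
		d->dc6_allowed += d->dc5_count - d->dc6_start;
}

/* Decide on start/stop from the SW state, never from the HW readback. */
static void model_set_dc_state(struct model_display *d, uint32_t state)
{
	bool enable_dc6 = state & MODEL_DC_STATE_EN_UPTO_DC6;
	bool dc6_was_enabled = d->sw_dc_state & MODEL_DC_STATE_EN_UPTO_DC6;

	if (!dc6_was_enabled && enable_dc6)
		model_update_dc6_allowed_count(d, true);

	d->hw_dc_state = state;

	if (!enable_dc6 && dc6_was_enabled)
		model_update_dc6_allowed_count(d, false);

	d->sw_dc_state = state;
}

/* The kind of check an extra assert could perform. */
static bool model_dc_state_matches(const struct model_display *d)
{
	return d->hw_dc_state == d->sw_dc_state;
}
```

With a stale DC6 bit left in hw_dc_state at "probe" and DMC not yet
initialized, model_set_dc_state(&d, 0) never reaches the counter update
in this model, since the SW state says DC6 was never enabled.
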
>
> [1] https://lore.kernel.org/all/[email protected]
>
> > > The tracking of the DC6 allowed counter only works if starting /
> > > stopping the counter depends on the _SW_ DC6 state vs. the current _HW_
> > > DC6 state (since stopping the counter requires the DC5 counter captured
> > > when the counter was started). Thus, using the HW DC6 state is incorrect
> > > and it also leads to the above oops. Fix both issues by using the SW DC6
> > > state for the tracking.
> > >
> > > This is v2 of the fix originally sent by Jani, updated based on the
> > > first Link: discussion below.
> > >
> > > Link: https://lore.kernel.org/all/[email protected]
> > > Link: https://lore.kernel.org/all/[email protected]
> > > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter")
> > > Cc: Mohammed Thasleem <[email protected]>
> > > Cc: Jani Nikula <[email protected]>
> > > Cc: Tao Liu <[email protected]>
> > > Cc: <[email protected]> # v6.16+
> > > Tested-by: Tao Liu <[email protected]>
> > > Reviewed-by: Jani Nikula <[email protected]>
> > > Signed-off-by: Imre Deak <[email protected]>
> > > ---
> > > drivers/gpu/drm/i915/display/intel_display_power_well.c | 2 +-
> > > drivers/gpu/drm/i915/display/intel_dmc.c | 3 +--
> > > 2 files changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_display_power_well.c b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > > index 1e03187dbd38a..f855f0f886946 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > > @@ -852,7 +852,7 @@ void gen9_set_dc_state(struct intel_display *display, u32 state)
> > > power_domains->dc_state, val & mask);
> > >
> > > enable_dc6 = state & DC_STATE_EN_UPTO_DC6;
> > > - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
> > > + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > > if (!dc6_was_enabled && enable_dc6)
> > > intel_dmc_update_dc6_allowed_count(display, true);
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c
> > > index c3b411259a0c5..90ba932d940ac 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
> > > @@ -1598,8 +1598,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
> > > return false;
> > >
> > > mutex_lock(&power_domains->lock);
> > > - dc6_enabled = intel_de_read(display, DC_STATE_EN) &
> > > - DC_STATE_EN_UPTO_DC6;
> > > + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > > if (dc6_enabled)
> > > intel_dmc_update_dc6_allowed_count(display, false);
> > >
> > > --
> > > 2.49.1
> >
> > --
> > Ville Syrjälä
> > Intel
--
Ville Syrjälä
Intel