On Fri, Jul 11, 2025 at 12:34 AM John Stultz <jstu...@google.com> wrote:
>
> On Thu, Jul 10, 2025 at 2:59 PM John Stultz <jstu...@google.com> wrote:
> > On Thu, Jul 10, 2025 at 12:52 PM Ben Hutchings <b...@decadent.org.uk> wrote:
> > > There seems to be a longstanding issue with the combination of user-
> > > space watchdog timers (using CLOCK_MONOTONIC) and suspend-to-idle. This
> > > was reported at <https://bugzilla.kernel.org/show_bug.cgi?id=200595> and
> > > more recently at <https://bugs.debian.org/1107785>.
> > >
> > > During suspend-to-idle the system may be woken by interrupts and the
> > > CLOCK_MONOTONIC clock may tick while that happens, but no user-space
> > > tasks are allowed to run. So when the system finally exits suspend, a
> > > watchdog timer based on CLOCK_MONOTONIC may expire immediately without
> > > the task being supervised ever having an opportunity to pet the
> > > watchdog.
> > >
> > > This seems like a hard problem to solve!
> >
> > So I don't know much about suspend-to-idle, but I'm surprised it's not
> > suspending timekeeping! That definitely seems problematic.
>
> Hrm. The docs here seem to call out that timekeeping is supposed to be
> suspended in s2idle:
> https://docs.kernel.org/admin-guide/pm/sleep-states.html#suspend-to-idle
>
> Looking at enter_s2idle_proper():
> https://elixir.bootlin.com/linux/v6.16-rc5/source/drivers/cpuidle/cpuidle.c#L154
>
> We call tick_freeze():
> https://elixir.bootlin.com/linux/v6.16-rc5/source/kernel/time/tick-common.c#L524
>
> Which calls timekeeping_suspend() when the last cpu's tick has been frozen.
>
> So it seems like the problem might be somehow all the cpus maybe
> aren't entering s2idle, causing time to keep running?

Well, there is a suspend-to-idle path in which timekeeping is not
suspended. It is the one in which cpuidle_enter_s2idle() returns 0 (or
less), causing cpuidle_idle_call() to fall back to call_cpuidle() after
selecting the deepest available idle state.

This happens when the cpuidle driver in use doesn't implement
->enter_s2idle() callbacks for any of its states. The most
straightforward remedy is to implement those callbacks in the given
cpuidle driver (they must guarantee that interrupts will not be enabled,
however).

There are also cases in which suspending timekeeping is delayed for
various reasons. For instance, on some systems, if the temperature is
too high, the platform will refuse to enter its deepest power state (ask
the platform designers why they thought that this would be a good idea),
so the kernel waits for the temperature to drop before it attempts to go
for proper suspend-to-idle.

Moreover, if there are wakeup events while suspended that do not cause
the system to resume (you may regard them as "spurious"), timekeeping is
resumed and suspended again every time this happens.

So in general time may keep running at least somewhat in the
suspend-to-idle flow, but this also happens during any system
suspend-resume flow: timekeeping is only suspended after all devices
have been suspended, and it takes time to suspend them all (and
analogously for resume).
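
To illustrate the fallback path described at the start of this reply,
here is a rough user-space model of the decision, not the actual kernel
code. The model_* names and structures are hypothetical stand-ins; the
real search also takes disabled states into account and the real
cpuidle_idle_call() does considerably more. It only shows why a driver
with no ->enter_s2idle() callbacks never reaches the tick_freeze() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_state {
	const char *name;
	bool has_enter_s2idle;	/* models a non-NULL ->enter_s2idle() */
};

struct model_driver {
	const struct model_state *states;
	int state_count;
};

/*
 * Models cpuidle_enter_s2idle(): return the index of the deepest state
 * that provides ->enter_s2idle(), or 0 if no state above index 0 does.
 */
static int model_enter_s2idle(const struct model_driver *drv)
{
	int i, ret = 0;

	assert(drv != NULL);
	for (i = 1; i < drv->state_count; i++)
		if (drv->states[i].has_enter_s2idle)
			ret = i;

	return ret;
}

/*
 * Models the check in cpuidle_idle_call(): only a result greater than 0
 * means the tick-freezing path was taken on this CPU. Otherwise the CPU
 * falls back to a regular idle state entry and timekeeping keeps running.
 */
static bool model_timekeeping_can_suspend(const struct model_driver *drv)
{
	return model_enter_s2idle(drv) > 0;
}
```

With a three-state table in which no state sets has_enter_s2idle, the
model returns 0 and the fallback is taken. Setting the flag on the
deepest state makes it return that state's index instead.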