On Sat, 18 May 2019, Konstantin Khlebnikov wrote:
> On 18.05.2019 18:17, Thomas Gleixner wrote:
> > On Wed, 15 May 2019, Konstantin Khlebnikov wrote:
> >
> > > Timekeeping watchdog verifies doubtful clocksources using more reliable
> > > candidates. For x86 it likely verifies 'tsc' using 'hpet'. But 'hpet'
> > > is far from perfect too. It's better to have second opinion if possible.
> > >
> > > We're seeing sudden jumps of hpet counter to 0xffffffff:
> >
> > On which kind of hardware? A particular type of CPU or random ones?
>
> In general this is very rare event.
>
> This exact pattern have been seen ten times or so on several servers with
> Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
> (this custom built platform with chipset Intel C610)
>
> and haven't seen for previous generation
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> (this is another custom built platform)

Same chipset? Note the HPET is part of the chipset, not of the CPU.

> Link was in patch: https://lore.kernel.org/patchwork/patch/667413/

Hmm. Not really helpful either.

> > > This patch uses second reliable clocksource as backup for validation.
> > > For x86 this is usually 'acpi_pm'. If watchdog and backup are not consent
> > > then other clocksources will not be marked as unstable at this iteration.
> >
> > The mess you add to the watchdog code is unholy and that's broken as there
> > is no guarantee for acpi_pm (or any other secondary watchdog) being
> > available.
>
> ACPI power management timer is a pretty standard x86 hardware.

Used to be.

> But my patch should work for any platform with any second reliable
> clocksource.

Which is close to zero if PM timer is not exposed.

> If there is no second clocksource my patch does nothing:
> watchdog_backup stays NULL and backup_consent always true.

That still does not justify the extra complexity for a few custom built
systems.

Thanks,

	tglx
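
For readers following the thread, the consent logic described in the quoted
patch text could look roughly like the sketch below. This is not the code
from the patch: only the names watchdog_backup and backup_consent come from
the discussion above. struct clock_sample, within_threshold(),
should_mark_unstable() and the THRESHOLD_NS value are hypothetical and
chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a clocksource: just the interval it measured. */
struct clock_sample {
	const char *name;
	int64_t interval_ns;	/* elapsed time according to this clock */
};

/* Assumed tolerance for this sketch; not taken from the patch. */
#define THRESHOLD_NS 62500000LL

/* True if the two measured intervals differ by at most THRESHOLD_NS. */
static bool within_threshold(int64_t a, int64_t b)
{
	int64_t delta = a > b ? a - b : b - a;

	return delta <= THRESHOLD_NS;
}

/*
 * Decide whether cs should be marked unstable in this iteration.
 *
 * watchdog_backup may be NULL. In that case backup_consent stays true and
 * the check degenerates to the plain comparison against the watchdog.
 */
static bool should_mark_unstable(const struct clock_sample *cs,
				 const struct clock_sample *watchdog,
				 const struct clock_sample *watchdog_backup)
{
	bool backup_consent = true;

	assert(cs && watchdog);

	/* The backup only gets a vote if a second reliable clock exists. */
	if (watchdog_backup)
		backup_consent = within_threshold(watchdog->interval_ns,
						  watchdog_backup->interval_ns);

	/* Blame cs only when the watchdog itself is corroborated. */
	return backup_consent &&
	       !within_threshold(cs->interval_ns, watchdog->interval_ns);
}
```

With a watchdog whose counter jumped and a backup that disagrees with it,
should_mark_unstable() returns false, so the clocksource under test is
spared for that iteration. With watchdog_backup == NULL the same jump gets
the clocksource marked unstable, which is the case the objection about
platforms without an exposed PM timer refers to.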

