Hi Janusz,

On 2025-07-08 at 15:04:15 +0200, Janusz Krzysztofik wrote:
> In case of soft lockups, it might be helpful from root cause analysis
> perspective to see if the test was still able to complete despite
> triggering the soft lockup warning, or if that soft lockup seems not
> recoverable without killing the test. For that to be possible, igt_runner
> should not kill the test too promptly if a soft lockup related kernel
> taint is detected.
>
> On kernel taints, igt_runner now decreases per test and inactivity
> timeouts by a factor of 10. Let it check if the taint is caused by a
> soft lockup and decrease the timeouts only by the factor of 2 in those
> cases.
>
> v2: Define symbols for taint bits and use them (Kamil)
>
> Signed-off-by: Janusz Krzysztofik <[email protected]>

LGTM
Reviewed-by: Kamil Konieczny <[email protected]>

> ---
>  lib/igt_taints.c  |  8 ++++----
>  lib/igt_taints.h  |  6 ++++++
>  runner/executor.c | 14 ++++++++++----
>  3 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/lib/igt_taints.c b/lib/igt_taints.c
> index 6b36d11cba..1d238fd2af 100644
> --- a/lib/igt_taints.c
> +++ b/lib/igt_taints.c
> @@ -13,10 +13,10 @@ static const struct {
>  	int bad;
>  	const char *explanation;
>  } abort_taints[] = {
> -	{ 4, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
> -	{ 5, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
> -	{ 7, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
> -	{ 9, 1, "TAINT_WARN: WARN_ON has happened." },
> +	{ TAINT_MACHINE_CHECK, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
> +	{ TAINT_BAD_PAGE, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
> +	{ TAINT_DIE, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
> +	{ TAINT_WARN, 1, "TAINT_WARN: WARN_ON has happened." },
>  	{ -1 }
>  };
>
> diff --git a/lib/igt_taints.h b/lib/igt_taints.h
> index be4195c5aa..50c4cf16f8 100644
> --- a/lib/igt_taints.h
> +++ b/lib/igt_taints.h
> @@ -6,6 +6,12 @@
>  #ifndef __IGT_TAINTS_H__
>  #define __IGT_TAINTS_H__
>
> +#define TAINT_MACHINE_CHECK	4
> +#define TAINT_BAD_PAGE		5
> +#define TAINT_DIE		7
> +#define TAINT_WARN		9
> +#define TAINT_SOFT_LOCKUP	14
> +
>  unsigned long igt_kernel_tainted(unsigned long *taints);
>  const char *igt_explain_taints(unsigned long *taints);
>
> diff --git a/runner/executor.c b/runner/executor.c
> index 13180a0a46..847abe481a 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -871,10 +871,15 @@ static const char *need_to_timeout(struct settings *settings,
>  	if (settings->abort_mask & ABORT_TAINT &&
>  	    is_tainted(taints)) {
>  		/* list of timeouts that may postpone immediate kill on taint */
> -		if (settings->per_test_timeout || settings->inactivity_timeout)
> -			decrease = 10;
> -		else
> +		if (settings->per_test_timeout || settings->inactivity_timeout) {
> +			if (is_tainted(taints) == (1 << TAINT_WARN) &&
> +			    taints & (1 << TAINT_SOFT_LOCKUP))
> +				decrease = 2;
> +			else
> +				decrease = 10;
> +		} else {
>  			return "Killing the test because the kernel is tainted.\n";
> +		}
>  	}
>
>  	if (settings->per_test_timeout != 0 &&
> @@ -1526,8 +1531,9 @@ static int monitor_output(pid_t child,
>  			sigfd = -1; /* we are dying, no signal handling for now */
>  		}
>
> +		igt_kernel_tainted(&taints);
>  		timeout_reason = need_to_timeout(settings, killed,
> -						 igt_kernel_tainted(&taints),
> +						 taints,
>  						 igt_time_elapsed(&time_last_activity, &time_now),
>  						 igt_time_elapsed(&time_last_subtest, &time_now),
>  						 igt_time_elapsed(&time_killed, &time_now),
> --
> 2.50.0
>
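
For anyone following the thread, the decrease-factor selection from the
runner/executor.c hunk can be sketched standalone as below. This is only an
illustration, not the runner code: bad_taints_of(), timeout_decrease() and
timeout_decrease_selfcheck() are made-up names, and the mask assumes that
is_tainted() keeps only the bits listed in abort_taints[]. The return value 1
for "no bad taint" is also my own convention for the sketch.

```c
#include <assert.h>

/* Bit numbers as introduced in lib/igt_taints.h by the quoted patch. */
#define TAINT_MACHINE_CHECK	4
#define TAINT_BAD_PAGE		5
#define TAINT_DIE		7
#define TAINT_WARN		9
#define TAINT_SOFT_LOCKUP	14

/*
 * Hypothetical stand-in for the runner's is_tainted(): assumed to keep
 * only the bits listed in abort_taints[].
 */
static unsigned long bad_taints_of(unsigned long taints)
{
	const unsigned long bad = (1UL << TAINT_MACHINE_CHECK) |
				  (1UL << TAINT_BAD_PAGE) |
				  (1UL << TAINT_DIE) |
				  (1UL << TAINT_WARN);

	return taints & bad;
}

/*
 * Hypothetical helper: divisor applied to the per test and inactivity
 * timeouts. 1 means "no bad taint, leave the timeouts alone".
 */
static int timeout_decrease(unsigned long taints)
{
	unsigned long bad = bad_taints_of(taints);

	if (!bad)
		return 1;

	/* WARN is the only bad taint and the soft lockup bit is also set. */
	if (bad == (1UL << TAINT_WARN) &&
	    (taints & (1UL << TAINT_SOFT_LOCKUP)))
		return 2;

	return 10;
}

/* Quick self-check of the cases described in the commit message. */
void timeout_decrease_selfcheck(void)
{
	const unsigned long warn = 1UL << TAINT_WARN;
	const unsigned long lockup = 1UL << TAINT_SOFT_LOCKUP;
	const unsigned long die = 1UL << TAINT_DIE;

	assert(timeout_decrease(0) == 1);
	assert(timeout_decrease(warn | lockup) == 2);
	assert(timeout_decrease(warn) == 10);
	assert(timeout_decrease(warn | die | lockup) == 10);
}
```

Note that the soft lockup bit is not in abort_taints[], so under the
assumption above it is only visible in the raw taints value. That seems to be
why the last hunk passes taints rather than the return value of
igt_kernel_tainted() to need_to_timeout().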
