When a soft lockup occurs, root cause analysis benefits from knowing whether the test was still able to complete despite triggering the soft lockup warning, or whether the lockup appears unrecoverable without killing the test. For that to be possible, igt_runner should not kill the test too promptly when a kernel taint related to a soft lockup is detected.
On kernel taints, igt_runner currently decreases the per-test and inactivity timeouts by a factor of 10. Let it check whether the taint is caused by a soft lockup and, in that case, decrease the timeouts only by a factor of 2.

v2: Define symbols for taint bits and use them (Kamil)

Signed-off-by: Janusz Krzysztofik <[email protected]>
---
 lib/igt_taints.c  |  8 ++++----
 lib/igt_taints.h  |  6 ++++++
 runner/executor.c | 14 ++++++++++----
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/lib/igt_taints.c b/lib/igt_taints.c
index 6b36d11cba..1d238fd2af 100644
--- a/lib/igt_taints.c
+++ b/lib/igt_taints.c
@@ -13,10 +13,10 @@ static const struct {
 	int bad;
 	const char *explanation;
 } abort_taints[] = {
-	{ 4, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
-	{ 5, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
-	{ 7, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
-	{ 9, 1, "TAINT_WARN: WARN_ON has happened." },
+	{ TAINT_MACHINE_CHECK, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
+	{ TAINT_BAD_PAGE, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
+	{ TAINT_DIE, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
+	{ TAINT_WARN, 1, "TAINT_WARN: WARN_ON has happened." },
 	{ -1 }
 };
 
diff --git a/lib/igt_taints.h b/lib/igt_taints.h
index be4195c5aa..50c4cf16f8 100644
--- a/lib/igt_taints.h
+++ b/lib/igt_taints.h
@@ -6,6 +6,12 @@
 #ifndef __IGT_TAINTS_H__
 #define __IGT_TAINTS_H__
 
+#define TAINT_MACHINE_CHECK 4
+#define TAINT_BAD_PAGE 5
+#define TAINT_DIE 7
+#define TAINT_WARN 9
+#define TAINT_SOFT_LOCKUP 14
+
 unsigned long igt_kernel_tainted(unsigned long *taints);
 const char *igt_explain_taints(unsigned long *taints);
 
diff --git a/runner/executor.c b/runner/executor.c
index 13180a0a46..847abe481a 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -871,10 +871,15 @@ static const char *need_to_timeout(struct settings *settings,
 	if (settings->abort_mask & ABORT_TAINT &&
 	    is_tainted(taints)) {
 		/* list of timeouts that may postpone immediate kill on taint */
-		if (settings->per_test_timeout || settings->inactivity_timeout)
-			decrease = 10;
-		else
+		if (settings->per_test_timeout || settings->inactivity_timeout) {
+			if (is_tainted(taints) == (1 << TAINT_WARN) &&
+			    taints & (1 << TAINT_SOFT_LOCKUP))
+				decrease = 2;
+			else
+				decrease = 10;
+		} else {
 			return "Killing the test because the kernel is tainted.\n";
+		}
 	}
 
 	if (settings->per_test_timeout != 0 &&
@@ -1526,8 +1531,9 @@ static int monitor_output(pid_t child,
 			sigfd = -1; /* we are dying, no signal handling for now */
 		}
 
+		igt_kernel_tainted(&taints);
 		timeout_reason = need_to_timeout(settings, killed,
-						 igt_kernel_tainted(&taints),
+						 taints,
 						 igt_time_elapsed(&time_last_activity, &time_now),
 						 igt_time_elapsed(&time_last_subtest, &time_now),
 						 igt_time_elapsed(&time_killed, &time_now),
-- 
2.50.0
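
For illustration only, the timeout scaling decision described in the commit message can be sketched as a standalone function. This is not code from the patch: bad_taints_only() is a hypothetical stand-in for the runner's is_tainted() and is assumed to keep only the bits listed in abort_taints[]; timeout_decrease() is a hypothetical name; and the sketch assumes that a per-test or inactivity timeout is configured, since without one the runner kills the test immediately.

```c
/* Taint bit numbers as defined in lib/igt_taints.h by this patch. */
#define TAINT_MACHINE_CHECK 4
#define TAINT_BAD_PAGE 5
#define TAINT_DIE 7
#define TAINT_WARN 9
#define TAINT_SOFT_LOCKUP 14

/*
 * Assumption: stand-in for the runner's is_tainted(). It keeps only the
 * taint bits that abort_taints[] marks as bad.
 */
static unsigned long bad_taints_only(unsigned long taints)
{
	const unsigned long bad = (1UL << TAINT_MACHINE_CHECK) |
				  (1UL << TAINT_BAD_PAGE) |
				  (1UL << TAINT_DIE) |
				  (1UL << TAINT_WARN);

	return taints & bad;
}

/*
 * Hypothetical helper: returns the factor by which the timeouts are
 * decreased. 1 means the timeouts are left unchanged.
 */
int timeout_decrease(unsigned long taints)
{
	unsigned long bad = bad_taints_only(taints);

	if (!bad)
		return 1;

	/* TAINT_WARN is the only bad taint and a soft lockup was reported. */
	if (bad == (1UL << TAINT_WARN) &&
	    (taints & (1UL << TAINT_SOFT_LOCKUP)))
		return 2;

	return 10;
}
```

Under these assumptions, a test that has only triggered a soft lockup warning keeps half of its configured timeouts, while any other bad taint, or a soft lockup combined with, for example, TAINT_DIE, still leaves it one tenth.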
