Hi,
On Wed, Apr 13, 2022 at 05:21:03PM +0000, [email protected] wrote:
> A new failure has been detected on builder elfutils-centos-x86_64 while
> building elfutils.
>
> Full details are available at:
> https://builder.wildebeest.org/buildbot/#builders/1/builds/932
>
> Build state: failed test (failure)
> Revision: 399b55a75830f1854c8da9f29282810e82f270b6
> Worker: centos-x86_64
> Build Reason: (unknown)
> Blamelist: Mark Wielaard <[email protected]>
>
> Steps:
> [...]
> - 8: make check ( failure )
> Logs:
> - stdio:
> https://builder.wildebeest.org/buildbot/#builders/1/builds/932/steps/8/logs/stdio
> - test-suite.log:
> https://builder.wildebeest.org/buildbot/#builders/1/builds/932/steps/8/logs/test-suite_log
Hmmm, this seems a little random. The change was just adding some
(unused) constants to dwarf.h. The log says:
command timed out: 1200 seconds without output running ['make', 'check', '-j4'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1925.591587
It looks like run-debuginfod-federation-sqlite.sh is missing from the
results, so that is probably the test that hung. Looking at the buildbot
worker, the end of run-debuginfod-federation-sqlite.sh.log is:
+ mvalue=2
+ '[' -z 2 ']'
+ echo 'metric thread_work_total{role="groom"}: 2'
metric thread_work_total{role="groom"}: 2
+ '[' 2 -eq 2 ']'
+ break
+ '[' 18 -eq 0 ']'
+ curl -s http://127.0.0.1:9112/buildid/beefbeefbeefd00dd00d/debuginfo
+ curl -s http://127.0.0.1:9112/metrics
+ grep 'error_count.*sqlite'
error_count{sqlite3="database disk image is malformed"} 6
error_count{sqlite3="file is encrypted or is not a database"} 1
+ kill -INT 28184 28371
+ wait 28184 28371
That seems to correspond to this part of run-debuginfod-federation-sqlite.sh:
########################################################################
# Corrupt the sqlite database and get debuginfod to trip across its errors
curl -s http://127.0.0.1:$PORT1/metrics | grep 'sqlite3.*reset'
dd if=/dev/zero of=$DB bs=1 count=1
# trigger some random activity that's Sure to get sqlite3 upset
kill -USR1 $PID1
wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
wait_ready $PORT1 'thread_busy{role="scan"}' 0
kill -USR2 $PID1
wait_ready $PORT1 'thread_work_total{role="groom"}' 2
curl -s http://127.0.0.1:$PORT1/buildid/beefbeefbeefd00dd00d/debuginfo > /dev/null || true
curl -s http://127.0.0.1:$PORT1/metrics | grep 'error_count.*sqlite'
# Run the tests again without the servers running. The target file should
# be found in the cache.
kill -INT $PID1 $PID2
wait $PID1 $PID2
So maybe corrupting the sqlite database prevents a proper shutdown of
the debuginfod process, leaving the final wait hanging until the
buildbot timeout kills the run?
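If that is what happens, one way to keep the test from hanging until the
buildbot timeout would be to bound the final wait. The following is only a
rough sketch: wait_or_kill is a hypothetical helper, not something that
exists in the elfutils test suite, and the timeout value is up to the caller.
It polls the given PIDs once per second and sends SIGKILL to any that are
still alive after the timeout.

```shell
# Hypothetical helper (assumption: not part of the elfutils test suite).
# Usage: wait_or_kill TIMEOUT_SECONDS PID...
# Returns 0 if every PID exited on its own within the timeout,
# 1 if at least one had to be killed with SIGKILL.
wait_or_kill() {
  local timeout=$1
  shift
  local pid
  local rc=0
  local waited=0
  for pid in "$@"; do
    # kill -0 sends no signal; it only checks that the process still exists.
    # The timeout budget is shared across all PIDs.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
      sleep 1
      waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      echo "pid $pid still running after ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      rc=1
    fi
    # Reap the child so no zombie is left behind; ignore its exit status.
    wait "$pid" 2>/dev/null || true
  done
  return $rc
}
```

In the test script this could stand in for the bare wait, for example
"kill -INT $PID1 $PID2" followed by "wait_or_kill 30 $PID1 $PID2". A
non-zero return would then show up as a test failure instead of a 1200
second stall, though it would not explain why debuginfod failed to exit.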
Cheers,
Mark