On Wed, Mar 18, 2026 at 03:52:02PM -0400, Tom Lane wrote:
> which makes me wonder whether the problematic session is the second or
> third bgworker.  I am not seeing entries indicating that those
> stopped, as there is for the first bgworker.
Looking at the logs produced at [1], the worker launched as number 1
cannot be the one interacting: it connects to the database postgres
under PID 1616001, and is reported as exited by the postmaster.  The
only sessions that could interact are:
1) The bgworker launched as number 2, connected to the database testdb.
2) The session checking pg_stat_activity, launched by launch_bgworker().
This second session was connected to the database we want to rename, so
it could count as an extra session; running its query while connected
to the database postgres would reduce the friction and rule it out.

The log timestamps show that it takes 5 seconds for this host to get
out of the ALTER DATABASE .. RENAME TO, which implies that we are
looping inside CountOtherDBBackends() for the full 5 seconds.  So it
really looks like the second bgworker is the one we are waiting for
here.

Now, we are sure of the following things at the point where the RENAME
TO is launched:
- The worker is visible in pg_stat_activity.
- The worker is already in worker_spi_main(), per its "LOG: initialized
  with" entry.
- The worker is connected to the database.
- The worker can receive signals.

How could this worker fail to receive the termination requests?  The
only explanation I can think of is that the postmaster does not get the
time to process the PMSIGNAL_BACKGROUND_WORKER_CHANGE requests.

The next step is to gather more data, I guess; the attached patch would
help with that.  If it turns out that we are able to send the requests
but the postmaster does not get the time to process them, I don't
really see what we can do except:
- Drop the portions of the tests for DROP DATABASE, SET TABLESPACE and
  RENAME DATABASE: all these scenarios involve commands that work on
  the same database the worker is connected to, and if the postmaster
  does not get the time to process the termination requests, I don't
  really see what we could do.
  This could also point to a timing issue in the feature itself, of
  course.
- Revert the feature, stop playing with the buildfarm this close to the
  end of the release cycle, and rework the feature for v20.

For now I am planning to apply the attached to get more information
from widowbird, which should take a few days at worst.  That should
make it clear whether we have a timing issue with the requests sent to
the postmaster.  Running the queries of worker_spi_launch() and
pg_stat_activity while connected to the database postgres may also
improve things, but I don't really buy it, even if I may be wrong.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=widowbird&dt=2026-03-17%2015%3A35%3A03
--
Michael
From adb474866f12a66ac48c702d9ab25ce82277abca Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Thu, 19 Mar 2026 09:51:16 +0900
Subject: [PATCH] Add more debugging information for termination tests of
 worker_spi

---
 src/backend/postmaster/bgworker.c                     |  6 ++++++
 src/test/modules/worker_spi/t/002_worker_terminate.pl | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 0104a86b9ecd..fd678ef2596d 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -1413,6 +1413,9 @@ TerminateBackgroundWorkersForDatabase(Oid databaseId)
 {
 	bool		signal_postmaster = false;
 
+	elog(DEBUG1, "attempting worker termination for database %u",
+		 databaseId);
+
 	LWLockAcquire(BackgroundWorkerLock, LW_EXCLUSIVE);
 
 	/*
@@ -1432,6 +1435,9 @@ TerminateBackgroundWorkersForDatabase(Oid databaseId)
 		{
 			slot->terminate = true;
 			signal_postmaster = true;
+
+			elog(DEBUG1, "termination requested for worker (PID %d) on database %u",
+				 slot->pid, databaseId);
 		}
 	}
 }
diff --git a/src/test/modules/worker_spi/t/002_worker_terminate.pl b/src/test/modules/worker_spi/t/002_worker_terminate.pl
index 6db80ffec88c..b0e6a5376d4c 100644
--- a/src/test/modules/worker_spi/t/002_worker_terminate.pl
+++ b/src/test/modules/worker_spi/t/002_worker_terminate.pl
@@ -24,7 +24,7 @@ sub launch_bgworker
 
 	# Launch a background worker on the given database.
 	my $pid = $node->safe_psql(
-		$database, qq(
+		'postgres', qq(
 		SELECT worker_spi_launch($testcase, '$database'::regdatabase, 0, '{}', $interruptible);
 	));
@@ -32,7 +32,7 @@ sub launch_bgworker
 	$node->wait_for_log(
 		qr/LOG: .*worker_spi dynamic worker $testcase initialized with .*\..*/,
 		$offset);
-	my $result = $node->safe_psql($database,
+	my $result = $node->safe_psql('postgres',
 		"SELECT count(*) > 0 FROM pg_stat_activity WHERE pid = $pid;");
 	is($result, 't', "dynamic bgworker $testcase launched");
@@ -52,6 +52,11 @@ sub run_bgworker_interruptible_test
 		qr/terminating background worker \"worker_spi dynamic\" due to administrator command/,
 		$offset);
 
+	# Postmaster entry reporting the worker as exiting.
+	$node->wait_for_log(
+		qr/LOG: .*background worker \"worker_spi dynamic\" \(PID $pid\) exited with exit code/,
+		$offset);
+
 	my $result = $node->safe_psql('postgres',
 		"SELECT count(*) = 0 FROM pg_stat_activity WHERE pid = $pid;");
 	is($result, 't', "dynamic bgworker stopped for $testname");
@@ -63,6 +68,7 @@ $node->append_conf(
 	"postgresql.conf", qq(
 autovacuum = off
 debug_parallel_query = off
+log_min_messages = debug1
 ));
 $node->start;
-- 
2.53.0
