** Description changed:

Hi

The Debian packages for PostgreSQL (and thus the Ubuntu packages, because of the shared use of pg_wrapper) are subject to a potentially critical data-loss bug caused by an unsafe procedure for restarting PostgreSQL. This issue has been recognised and patched in Debian:

  http://anonscm.debian.org/loggerhead/pkg-postgresql/postgresql-common/trunk/revision/1181
  http://archives.postgresql.org/pgsql-general/2012-07/msg00501.php

but it should be urgently included in Ubuntu and backported. I quote Tom Lane (key PostgreSQL developer):

  [The] forced unlink on the postmaster.pid file [...] (a) is entirely
  unnecessary, and (b) defeats the safety interlock against starting a
  new postmaster before all the old backends have flushed out.

It is VITAL that pg_wrapper NEVER unlink the postmaster.pid file. The postmaster will do that itself if it finds the PID to be stale, but only after performing some checks to make sure that no backends are still running and that no other postmaster is running against the database.
See:

  http://archives.postgresql.org/pgsql-general/2012-07/msg00475.php

Context here:

  http://archives.postgresql.org/pgsql-general/2012-07/msg00350.php
  http://dba.stackexchange.com/questions/20959/recover-postgresql-database-from-wal-errors-on-startup/20961

SRU INFORMATION:

 * Impact: Severe data loss in rare corner cases.

 * Regression potential: Very low. The change has been in Debian, Quantal, and my very popular PostgreSQL backports repository for quite some time. pg_ctlcluster has a function start_check_pid_file() which, on startup, cleans up a stale PID file if one still exists after pg_ctlcluster stop --force has resorted to kill -9 on the postmaster, so a leftover PID file does not prevent a subsequent startup. The test suite (t/030_errors.t) explicitly covers scenarios with missing, broken, and stale PID files and ensures that they are handled properly.

 * Test case: I do not know a realistic and reliable test case that causes the data loss, but the analysis of the bug in the mailing-list thread above is very clear. I suggest regression-testing the change only, i.e. running the postgresql-common test suite and manually checking that starting a cluster still works while a stale PID file is present:

    sudo pg_createcluster 9.1 test --start
    # bash brace expansion: copies postmaster.pid to postmaster.pid.save
    sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{,.save}
    sudo pg_ctlcluster 9.1 test stop
    # now cause a stale pid file by restoring the copy of the
    # (no longer running) postmaster's PID file
    sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{.save,}

    # this should succeed and say "Removed stale pid file."
    sudo pg_ctlcluster 9.1 test start

    # this should say that 9.1/test is online
    pg_lsclusters
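The stale-PID handling described above rests on one rule: a PID file may only be treated as stale once the process it records is known to be gone, and deleting it is best left to the postmaster. As a rough illustration, here is a minimal shell sketch of a hypothetical, read-only helper, pid_file_is_stale (not part of postgresql-common), that reads the first line of a postmaster.pid file and reports whether that PID still names a running process. It assumes Linux, since it looks for /proc/<pid>. It is not the actual logic of start_check_pid_file() or of the postmaster, which, as quoted above, also makes sure no old backends are still running. It deliberately never deletes anything.

```shell
#!/bin/sh
# Hypothetical, read-only helper (not part of postgresql-common).
# Usage: pid_file_is_stale FILE
#   exit status 0: FILE exists and its PID is not a live process (or is unreadable)
#   exit status 1: FILE is missing, or its PID is still running
# It never unlinks FILE; removing a stale PID file is left to the postmaster.
pid_file_is_stale() {
    pidfile=$1
    # No PID file at all: nothing stale to report.
    [ -f "$pidfile" ] || return 1
    # The first line of postmaster.pid holds the PID.
    pid=$(head -n 1 "$pidfile")
    # Assumption: PostgreSQL may record a standalone backend as a negative PID,
    # so strip an optional leading '-'.
    pid=${pid#-}
    case $pid in
        ''|*[!0-9]*)
            # Empty or non-numeric first line: broken file, report it as stale.
            return 0 ;;
    esac
    # Linux-specific: /proc/<pid> exists while the process is alive and is
    # normally visible regardless of its owner (unlike "kill -0" as non-root),
    # unless /proc is mounted with hidepid.
    if [ -d "/proc/$pid" ]; then
        return 1
    fi
    return 0
}
```

Run it as root or as the postgres user: PostgreSQL requires the data directory to be readable only by its owner, so other users cannot read postmaster.pid.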
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1042556

Title:
  Critical data loss bug in postgresql-common initscript

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/postgresql-common/+bug/1042556/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs