pg_basebackup + incremental base backups
Our stream replication slave server got out of sync so we need to base backup again. In case of big databases, can we do incremental backups with pg_basebackup? Is there any alternative? Here was a proposal: https://wiki.postgresql.org/wiki/Incremental_backup
Re: pg_basebackup + incremental base backups
In case of big databases, can we do incremental backups with pg_basebackup? pg_basebackup doesn't support incremental backups, though there's been discussion of adding some kind of support for it, check -hackers if you're curious. Is there any alternative? There's a few different alternatives for PG backup software, some of which support incremental backups and restores. I'm personally involved with pgbackrest- https://pgbackrest.org though there's other options out there also (wal-g, barman, and more). Thanks, Stephen Hi Stephen, We tried pgbackrest as a solution to rebuild the standby cluster, by making a differential backup of the primary cluster. But it seems that pgbackrest is only able to rebuild the standby cluster by restoring an existing backup from a repository cluster, and not directly from the primary cluster. We are afraid that creating a backup on the repository cluster is an additional step that could be avoided. Is there some way to rebuild the standby cluster by doing a differential backup of the primary cluster directly?
Re: pg_basebackup + incremental base backups
On 21-May-20 08:43, Stephen Frost wrote: * Christopher Pereira (krip...@imatronix.cl) wrote: [...] Is there some way to rebuild the standby cluster by doing a differential backup of the primary cluster directly? We've contemplated adding support for something like this to pgbackrest, since all the pieces are there, but there hasn't been a lot of demand for it and it kind of goes against the idea of having a proper backup solution, really.. It'd also create quite a bit of load on the primary to checksum all the files to do the comparison against what's on the replica that you're trying to update, so not something you'd probably want to do a lot more than necessary. We have backups of the whole server and only need a efficient way to rebuild the hot-standby cluster when pg_rewind is not able to do so. I agree with your concerns about the increased load on the primary server, but this rebuilding process would only be done in case of emergency or during low load hours. pg_basebackup works fine but does not support differential/incremental backups which is a blocker. Do you know any alternative software that is able to rebuild the standby PG data dir using rsync or similar while the primary is still online? It seems a simple pg_start_backup + rsync + pg_stop_backup (maybe combined with a LVM snapshot) would do, but we would prefer to use some existing tool. We just tried barman, but it also seems to require a restore from the backup before being able to start the standby server (?), and we are afraid this would require double storage, IO and time for rebuilding the standby cluster. Thanks.
Re: pg_basebackup + incremental base backups
We've contemplated adding support for something like this to pgbackrest, since all the pieces are there, but there hasn't been a lot of demand for it and it kind of goes against the idea of having a proper backup solution, really.. It'd also create quite a bit of load on the primary to checksum all the files to do the comparison against what's on the replica that you're trying to update, so not something you'd probably want to do a lot more than necessary. Ok, we want to use pgbackrest to *rebuild a standby that has fallen behind* (where pg_rewind won't work). After reading the docs, we believe we should use this setup: a) Primary host: primary cluster b) Repository host: needed for rebuilding the standby (and having PITR as bonus). c) Standby host: standby cluster Some questions: 1) The standby will use streaming replication and will be in sync until someday something funny happens and both standby and repository get out of sync with the primary. Now, to rebuild the standby first we will have to create a new backup transferring the data from *primary -> repository*, right? Wouldn't this also have a load impact on the primary cluster? 2) In the user guide section 17.3 is explained how to create a "pg-standby host" to replicate the data *from the repository host*. And in section 17.4 is explained how to setup Streaming Replication to replicate the data *from the primary host*. Do 17.3 and 17.4 work together so that the data is *replicated from the repository* and then *streamed from the primary*? 3) Before being able to rebuild the standby cluster, would we first need to update the backup on the repository (backup from primary -> repository) in order for streaming replication to work (from primary -> standby)? 4) Once the backup on the repository is ready, what are the chances that streaming replication from primary to standby won't work because they got out of sync again? 5) Could we just work with 2 hosts (primary and standby) instead of 3? FAQ section 8 says the repository shouldn't be on the same host as the standby and having it on the primary doesn't make much sense because if the primary host is down we won't have access to the backup. It would be ideal to have the repository on the standby host and taking good care of the configurations. What exactly should be cared of for this setup to be safe? I'm afraid I'm not understanding very well the pgbackrest design or how to use it efficiently to rebuild a standby cluster that got out of sync.
Re: pg_basebackup + incremental base backups
On 24-May-20 15:48, Stephen Frost wrote: That really shouldn't be possible. I'm very curious as to exactly what happened that resulted in your primary/replica being 'out of sync', as you say. Hi Stephen, Actually this was more a hypothetical question to find a solution in case some day one of our standby clusters goes out of sync and we have to rebuild it having a very big database. With proper WAL archiving this shouldn't happen but we wanted to be prepared for this scenario just in case. We did some tests measuring IO and traffic and are very happy with the results. We will definitely be adding pgBackRest to our toolchain. Regarding my initial question, I still believe that the world deserves a simple direct pg_basebackup replacement even when putting an additional "repo host" in the middle is a better idea in the long term. As you said, all the pieces are there and it would be quite easy to write a new "pg_basebackup_delta" script that could be executed on the standby host to: 1) setup a pgBackRest repo on the primary host (via SSH) 2) create a backup on the primary host (via SSH) 3) do a delta restore on the standby Even when the repository on the primary host is only created temporarily (and require double storage, resources, etc), it may still be worth considering the traffic that can be saved by doing a delta restore on a standby host in a different region, right? Thanks and congratulations for the good work.
Re: pg_basebackup + delta base backups
On 26-May-20 10:20, Stephen Frost wrote: [...] "out of sync" is a bit of an odd concept, but having a replica fall behind a long way is certainly something that can happen and may require a rebuild from a backup (or from a new sync off of the primary in some other way, as you suggest below). In a situation where there's async replication happening and you promote a replica to take over, that's definitely a case where you might also have to rebuild the former primary. Hi Stepehen, Yes, a common case with async streaming is when primary (A) goes down and replica is promoted as a new master (B). Then A comes back and has some data that was not streamed to B so pg_rewind is useless. I wonder if there is some option to just discard this branched data from A in order to start as a new replica. I noticed that pg_rewind is useless even when both DBs are identical (according to pg_dumpall | md5sum). [...] As you said, all the pieces are there and it would be quite easy to write a new "pg_basebackup_delta" script that could be executed on the standby host to: 1) setup a pgBackRest repo on the primary host (via SSH) 2) create a backup on the primary host (via SSH) 3) do a delta restore on the standby Even when the repository on the primary host is only created temporarily (and require double storage, resources, etc), it may still be worth considering the traffic that can be saved by doing a delta restore on a standby host in a different region, right? So... There's actually a way to do this with pgbackrest, but it doesn't support the delta capability. If I understood correctly the method you described, you were basically doing a "backup" between A (primary) and B (repo) and in such a way the repo is then compatible with the pg_data structure, but without delta support (ie. transfering the whole database)? Delta support is critical for VLDBs, so I see two alternatives to replace pg_basebackup with pgbackrest to rebuild a replica: 1) Create a temporary repo on the primary 2) Create a temporary repo on the replica All configurations would be undone after the replica has been rebuilt and both alternatives would be using delta over the wire. In your opinion, which alternative is better considering network traffic? Thanks, Christopher
Prevent pg_rewind destroying the data
Hi, When pg_rewind is interrupted due to network errors, the cluster gets corrupted: Running pg_rewind for a second time returns "pg_rewind: fatal: target server must be shut down cleanly". Trying to fix the cluster with "/usr/pgsql-12/bin/postmaster' --single -F -D '/var/lib/pgsql/12/mydb' -c archive_mode=on -c archive_command=false" throws: LOG: could not read from log segment 003B003E, offset 0: read 0 of 8192 LOG: invalid primary checkpoint record PANIC: could not locate a valid checkpoint record When a cluster failsover because of a network problem, chances are high that another network problem may occur while we run pg_rewind. It would be nice if pg_rewind wouldn't destroy the data and leave the cluster in a state where retrying pg_rewind can succeed. As a workaround we are thinking in taking a LVM snapshot or do a "cp --reflink" before running pg_rewind and restore if there is a failure, but it would be nice if pg_rewind were "non destructive". Is this possible? Am I missing something? We are using PG 12.