Re: Logical replication disabled, recovery not possible because of 1 large transaction with schema changes?
Hello Muhammad,

The problem with my scenario is that the changes are written as a single transaction, with a BEGIN and COMMIT. In that transaction, there are first inserts, then a schema change, and then inserts on the new schema.

Doing as you said does not work. The subscriber will first complain it cannot do the last inserts. If I fix the schema, it will complain it cannot do the first inserts.

I thought I would be able to drop the table from the publication and then do the subscription again, but that fails, as the subscription is disabled and cannot be enabled again, even if I remove the table from the publication.

On Thu, Oct 17, 2024 at 5:49 AM Muhammad Usman Khan wrote:
> Hi,
> When you execute schema-altering operations on the publisher, these changes are not automatically replicated to the subscriber, which causes the following error:
>
>   logical replication target relation "public.dummy_table" is missing replicated columns: "contact_email", "status", "phone_number", "username"
>
> Before making schema changes, temporarily disable the subscription to prevent replication errors:
>
>   ALTER SUBSCRIPTION your_subscription_name DISABLE;
>
> Manually apply the same schema modifications to the subscriber database to ensure alignment.
>
> Once the schema changes are applied to both databases, re-enable the subscription:
>
>   ALTER SUBSCRIPTION your_subscription_name ENABLE;
>
> On Thu, 17 Oct 2024 at 02:59, Koen De Groote wrote:
>> If this question is more suitable for another mailing list, please let me know.
>>
>> I've set up the following table on both publisher and subscriber; both are pg16:
>>
>>   CREATE TABLE dummy_table (
>>       id SERIAL PRIMARY KEY,
>>       name VARCHAR(100) NOT NULL,
>>       email VARCHAR(100) UNIQUE NOT NULL,
>>       age INT,
>>       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
>>   );
>>
>> Added to publication, refreshed subscription.
>>
>> Add some data on the publisher side:
>>
>>   INSERT INTO dummy_table (name, email, age)
>>   VALUES
>>       ('John Doe', 'john@example.com', 25),
>>       ('Jane Smith', 'jane.sm...@example.com', 30),
>>       ('Michael Johnson', 'michae...@example.com', 45),
>>       ('Emily Davis', 'emil...@example.com', 27),
>>       ('Robert Brown', 'robert.br...@example.com', 40);
>>
>> The data can be seen on the subscriber. So far, so good.
>>
>> I then execute the following patch on the publisher:
>> https://gist.github.com/KoenDG/d1c06d8c740c64e4e5884d0c64b81f11
>>
>> It is a single transaction that does the following:
>>
>> 1/ Insert data, 1000 items
>> 2/ Drop a column
>> 3/ Alter a column name
>> 4/ Add 2 columns, nullable
>> 5/ Add a column and give it a unique constraint
>> 6/ Update values for a column with NULL values, added in step 4
>> 7/ Set the column updated in step 6 to be NOT NULL
>> 8/ Create a unique index with the columns from steps 3 and 6
>> 9/ Add a column with a default value
>> 10/ Insert data for this schema, another 1000 items
>>
>> The subscription is disabled; this is to be expected, since there are new column names and the schema needs to be updated on the subscriber side.
>>
>> However, it seems I'm stuck.
>>
>> I can't enable the subscription. This is to be expected: it will try to resume and run into the same issues.
>>
>> OK, I update the schema and enable again. It runs into an error for the inserts of step 1. These set values for columns dropped in step 2.
>>
>> I revert to the old schema and enable again. It runs into an error again, this time for values that don't exist yet at step 1.
>>
>> I tried dropping the table at the subscriber side and recreating the correct schema, but this runs into the same error.
>>
>> I remove the table from the publication and retry. Same error. Even with the table no longer in the publication, and the table on the subscriber side dropped and re-created, I'm still getting the exact same error of:
>>
>>   logical replication target relation "public.dummy_table" is missing replicated columns: "contact_email", "status", "phone_number", "username"
>>
>> The only solution I've found is to drop the table from the publication, and then drop the entire subscription and set it back up again, with the correct schema.
>>
>> Am I making a mistake? Or does putting all these commands in a single transaction ruin my chances?
>>
>> Clarification much appreciated.
>>
>> Regards,
>> Koen De Groote
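A minimal sketch of that full reset, with placeholder publication, subscription, and connection values rather than the real ones from the thread:

  -- On the publisher: take the table out of the publication while the schemas disagree.
  ALTER PUBLICATION my_publication DROP TABLE dummy_table;

  -- On the subscriber: drop the broken subscription; this also removes its
  -- replication slot on the publisher, so the publisher must be reachable.
  DROP SUBSCRIPTION my_subscription;

  -- After recreating dummy_table with the new schema on the subscriber and
  -- re-adding it to the publication on the publisher, subscribe again and
  -- let the initial copy resynchronize the data.
  CREATE SUBSCRIPTION my_subscription
      CONNECTION 'host=publisher-host dbname=mydb user=replication_user'
      PUBLICATION my_publication
      WITH (copy_data = true);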
Re: Logical replication disabled, recovery not possible because of 1 large transaction with schema changes?
> On 17 Oct 2024, at 11:07, Koen De Groote wrote:
>
> Hello Muhammad,
>
> The problem with my scenario is the changes are written as a single transaction, with a BEGIN and COMMIT. In that transaction, there are first inserts, then a schema change, and then inserts on the new schema.

I guess until logical replication of DDL is available you’re out of luck.

The best you can do is to have a separate table for recording and replaying schema changes. Create triggers that perform the actual DDL operations based on DML in this table. Publish this table on the publisher in the same publication as the tables affected by the DDL.

On the subscriber side it is the same - just make sure the trigger is marked as ENABLE REPLICA TRIGGER or ENABLE ALWAYS TRIGGER.

Kind regards,
Michał
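A rough sketch of that pattern, with the table, function, trigger, and publication names invented for illustration (they are not from this thread); the same objects would be created on both publisher and subscriber:

  -- Hypothetical table that records DDL statements to replay.
  CREATE TABLE ddl_log (
      id         bigserial PRIMARY KEY,
      ddl        text NOT NULL,
      created_at timestamptz NOT NULL DEFAULT now()
  );

  -- Trigger function that executes the recorded DDL when a row arrives.
  CREATE OR REPLACE FUNCTION replay_ddl() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      EXECUTE NEW.ddl;
      RETURN NEW;
  END;
  $$;

  CREATE TRIGGER ddl_log_replay
      AFTER INSERT ON ddl_log
      FOR EACH ROW EXECUTE FUNCTION replay_ddl();

  -- On the publisher: ship the table in the same publication as the affected tables.
  ALTER PUBLICATION my_publication ADD TABLE ddl_log;

  -- On the subscriber: triggers do not fire for replicated rows by default,
  -- so mark it to fire for changes applied by the replication worker too.
  ALTER TABLE ddl_log ENABLE ALWAYS TRIGGER ddl_log_replay;

On the publisher the trigger fires for ordinary inserts into ddl_log; on the subscriber the ENABLE ALWAYS setting makes the same trigger fire when the row is applied by logical replication, so the DDL runs in both places inside the same transaction stream as the data changes.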
Re: Backup
This is great, thank you so much!

On Thu, Oct 17, 2024 at 12:47 AM Asad Ali wrote:
>
> Hi Andy,
>
> I hope you're doing well. Based on your inquiry about PostgreSQL backups for your 100GB historical database with images, here are some suggestions that should help you achieve compressed, efficient backups without running into storage issues.
>
> 1. Use a Compressed pg_dump Format
> A more efficient option would be the directory format (-Fd) with compression. You can also adjust the compression level and make use of your machine's multiple CPUs by using parallel jobs:
>
>   pg_dump -Fd -Z 9 -j 4 --blobs -f /path/to/backup/dir your_database_name
>
> - -Fd: Directory format (supports compression, parallel dumps, and flexible restore options). The custom format (-Fc) also compresses, but it cannot be combined with -j.
> - -Z 9: Maximum compression level (0-9 scale).
> - -j 4: Number of parallel jobs (adjust based on CPU cores).
> - --blobs: Includes large objects (important for your images).
>
> This approach should give you a smaller backup with faster performance.
>
> 2. Splitting Backups into Parts
> If you're concerned about running out of storage space, consider splitting the backup by table or schema, allowing more control over the backup size:
>
>   pg_dump -Fc --table=my_large_table -f /path/to/backup/my_large_table.dump your_database_name
>   pg_dump -Fc --table=other_table -f /path/to/backup/other_table.dump your_database_name
>
> This can be helpful when you archive different tables or sections of data.
>
> 3. External Compression Tools
> If you need additional compression, you can pipe the pg_dump output through an external compression tool like gzip:
>
>   pg_dump -Fc --blobs your_database_name | gzip > /path/to/backup/file.dump.gz
>
> This should further reduce the size of your backups.
>
> 4. Consider Alternative Backup Methods
> - Explore other backup methods like pgBackRest or WAL-E. These tools are specifically designed for PostgreSQL backups and offer features like incremental backups and point-in-time recovery:
>
>   pgbackrest --stanza=your-database --type=full --compress-type=zst --compress-level=6 --process-max=4 backup
>
> - You can use pg_basebackup for PostgreSQL backups, but it has limitations compared to tools like pgBackRest. While pg_basebackup is easy to use and built into PostgreSQL, it is primarily designed for physical backups (base backups) and doesn't offer as many advanced features such as incremental backups, sophisticated compression, or parallelism.
>
> However, it does support basic compression and can be used for full backups:
>
>   pg_basebackup -D /path/to/backup/directory -F t -z -Z 9 -P -X stream
>
> - -D: The destination directory for the backup.
> - -F t: Specifies the tar format for the backup, which is required for compression.
> - -z: Compresses the output.
> - -Z 9: Compression level (0-9, where 9 is the highest).
> - -P: Shows the progress of the backup.
> - -X stream: Includes the WAL files needed to make the backup consistent (important for recovery).
>
>   pg_basebackup -D /backups/db_backup -F t -z -Z 9 -P -X stream
>
> This command will take a full physical backup of the database, compress the output using gzip, and store the backup as a tarball.
>
> 5. Automating Backups
> Since you need monthly backups, I recommend automating this process with a cron job. For example, you can set this up to run on the 1st of every month at 2 AM:
>
>   0 2 1 * * /usr/bin/pg_dump -Fd -Z 9 -j 4 --blobs -f /path/to/backup/dir your_database_name
>
> 6. Monitoring Disk Usage & Backup Performance
> Finally, it's important to monitor your available storage. You can either ensure you have enough free space or consider moving older backups to external or cloud storage to free up space.
>
> Use monitoring tools to track the performance of your backups. This will help you identify any potential bottlenecks and optimize the backup process.
>
> I hope this helps you create smaller and quicker backups for your PostgreSQL database. Let me know if you have any questions or need further assistance!
>
> Best regards,
>
> Asad Ali
>
> On Thu, Oct 17, 2024 at 12:38 AM Andy Hartman wrote:
>
>> I am very new to Postgres and have always worked in the mssql world. I'm looking for suggestions on DB backups. I currently have a DB used to store historical information that has images; it's currently around 100gig.
>>
>> I'm looking to take a monthly backup as I archive a month of data at a time. I am looking for it to be compressed and have a machine that has multiple CPUs and ample memory.
>>
>> Suggestions on things I can try?
>>
>> I did a pg_dump using these parms:
>>
>>   --format=t --blobs lobarch
>>
>> it ran my device out of storage:
>>
>>   pg_dump: error: could not write to output file: No space left on device
>>
>> I have 150gig free on my backup drive... can obviously add more.
>>
>> Looking for the quickest and smallest backup file output...
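For the disk-usage point, a quick size check on the source database helps estimate how much room a dump will need before it runs out of space; this is a generic sketch and the database name is a placeholder:

  -- Total size of the database to be dumped.
  SELECT pg_size_pretty(pg_database_size('your_database_name')) AS database_size;

  -- The ten largest tables, including TOAST and indexes, where the image data will dominate.
  SELECT c.relname AS table_name,
         pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
  FROM   pg_class c
  JOIN   pg_namespace n ON n.oid = c.relnamespace
  WHERE  c.relkind = 'r'
    AND  n.nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER  BY pg_total_relation_size(c.oid) DESC
  LIMIT  10;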
Re: Logical replication disabled, recovery not possible because of 1 large transaction with schema changes?
Hello Michał,

Thanks for the reply.

I suspected as much. I was just wondering if there was an easy fix that didn't involve dropping the entire subscription and having to re-do all the tables because of that.

I guess my only option is to remove the affected tables from the publication before the patch, refresh the subscription, do the patch, recreate the tables on the subscriber, and do the sync for only those tables.

I will look into your suggestion.

Regards,
Koen De Groote

On Thu, Oct 17, 2024 at 11:17 AM Michał Kłeczek wrote:
>
> > On 17 Oct 2024, at 11:07, Koen De Groote wrote:
> >
> > Hello Muhammad,
> >
> > The problem with my scenario is the changes are written as a single transaction, with a BEGIN and COMMIT. In that transaction, there are first inserts, then a schema change, and then inserts on the new schema.
>
> I guess until logical replication of DDL is available you’re out of luck.
>
> The best you can do is to have a separate table for recording and replaying schema changes.
> Create triggers that perform the actual DDL operations based on DML in this table.
> Publish this table on the publisher in the same publication as the tables affected by the DDL.
>
> On the subscriber side it is the same - just make sure the trigger is marked as ENABLE REPLICA TRIGGER or ENABLE ALWAYS TRIGGER.
>
> Kind regards,
> Michał
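A rough outline of that per-table resync, with placeholder publication and subscription names (assumptions, not from the thread), performed before the patch while the subscription is still enabled:

  -- On the publisher, before applying the patch:
  ALTER PUBLICATION my_publication DROP TABLE dummy_table;

  -- On the subscriber: stop applying changes for the removed table.
  ALTER SUBSCRIPTION my_subscription REFRESH PUBLICATION;

  -- Apply the schema patch on the publisher, recreate the table with the new
  -- schema on the subscriber, then add it back on the publisher:
  ALTER PUBLICATION my_publication ADD TABLE dummy_table;

  -- On the subscriber: pick up the re-added table and copy its data again.
  ALTER SUBSCRIPTION my_subscription REFRESH PUBLICATION WITH (copy_data = true);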
Download of v16.1 for Windows 64B
From where can I download v16.1? I can only find v16.4 on the EDB site.

V/r.
Carlos Oliva
Re: Performance difference between Primary & Secondary in the query execution
Hi,

You can check and verify the following points:

* Check the CPU resources on both primary and secondary.

* Check the execution plans on both the primary and secondary by running EXPLAIN (ANALYZE, BUFFERS) for the problematic queries, like the following:

  explain (analyze, buffers) select * from test;
                                             QUERY PLAN
  ---------------------------------------------------------------------------------------------------
   Seq Scan on test  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.013..0.014 rows=1 loops=1)
     Buffers: shared hit=1
   Planning:
     Buffers: shared hit=1
   Planning Time: 0.082 ms
   Execution Time: 0.033 ms
  (6 rows)

* Check vacuuming and bloating using the following queries:

  select * from pg_stat_user_tables;
  select * from pg_stat_all_tables;

* Check the shared_buffers, work_mem, and maintenance_work_mem settings on both instances.

* Compare the disk I/O performance using the following query:

  select * from pg_stat_bgwriter;

* Check for any replication lag (pg_stat_replication on the primary) to ensure the secondary isn't falling behind.

On Thu, 17 Oct 2024 at 19:08, Siraj G wrote:

> Hello Experts!
>
> We have a PgSQL instance running with HA (the secondary is kept in sync with streaming replication). Both ends run the same version, but we are not sure why a few SQLs behave badly on the secondary:
>
> Primary:
> PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
> OS: Ubuntu 20.04.6 LTS
>
> Secondary:
> PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
> OS: Ubuntu 20.04.6 LTS
>
> The application consumes more data from the secondary, hence that server has extra vCPUs.
>
> Can you please advise what needs to be checked?
>
> FYI, I am attaching the query with the different execution plans.
>
> Regards
> Siraj
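For the replication-lag check in the list above, a hedged example that can be run on the primary (these pg_stat_replication columns exist on PostgreSQL 10 and later, so they are available on 12.20):

  -- Run on the primary: how far each standby is behind, in bytes and as apply lag.
  SELECT application_name,
         state,
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
         replay_lag
  FROM   pg_stat_replication;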
Re: Download of v16.1 for Windows 64B
On 10/17/24 08:21, Carlos Oliva wrote:
> Thank you, Erik.
>
> That download installed fine. I must use v16.1 because I will be working with a legacy application.

Should not make a difference. For what changed between 16.1 and 16.4 see:

https://www.postgresql.org/docs/16/release.html

> V/r.
> Carlos Oliva

--
Adrian Klaver
adrian.kla...@aklaver.com
Performance difference between Primary & Secondary in the query execution
Hello Experts!

We have a PgSQL instance running with HA (the secondary is kept in sync with streaming replication). Both ends run the same version, but we are not sure why a few SQLs behave badly on the secondary:

Primary:
PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
OS: Ubuntu 20.04.6 LTS

Secondary:
PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
OS: Ubuntu 20.04.6 LTS

The application consumes more data from the secondary, hence that server has extra vCPUs.

Can you please advise what needs to be checked?

FYI, I am attaching the query with the different execution plans.

Regards
Siraj

bad-explain-secondary
Description: Binary data

explain-primary
Description: Binary data

query
Description: Binary data
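As a concrete starting point for the configuration comparison suggested elsewhere in this thread, a query along these lines can be run on both primary and secondary and the output diffed; the parameter list here is illustrative, not exhaustive:

  -- Run on both servers and compare the results.
  SELECT name, setting, unit
  FROM   pg_settings
  WHERE  name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                  'effective_cache_size', 'random_page_cost',
                  'max_parallel_workers_per_gather')
  ORDER  BY name;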
Re: Download of v16.1 for Windows 64B
On 2024-10-17 16:06 +0200, Carlos Oliva wrote:
> From where can I download v16.1? I can only find v16.4 on the EDB site.

https://get.enterprisedb.com/postgresql/postgresql-16.1-1-windows-x64.exe

I think they used to also link to older minor releases, but I can't find that anymore. But you should use 16.4 unless you have a specific reason for using 16.1, which has known bugs.

--
Erik
Re: Download of v16.1 for Windows 64B
On Thu, Oct 17, 2024 at 11:21 AM Carlos Oliva wrote:
> Thank you, Erik.
>
> That download installed fine. I must use v16.1 because I will be working with a legacy application.

What does that have to do with whether you use 16.1 or 16.4? (Note that PG point releases only fix bugs.)