Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL
On 8/31/24 19:32, Justin Clift wrote:
> On 2024-09-01 02:54, Peter J. Holzer wrote:
>> 'Tis the season again. As usual, a newer Ubuntu version comes with a
>> newer Postgres version (16 instead of 14). Also as usual, I got a
>> message during the upgrade that Postgres 14 is obsolete, but the
>> binaries have been left installed and I should upgrade to Postgres 16
>> manually ASAP.
>
> It'd *technically* be possible to automatically run an upgrade of the
> PostgreSQL repository (via scripting?) at launch time, though just
> blindly doing it for everyone would be a *major* change of behaviour.

The OP was using the Ubuntu repo, and for a given major version of Ubuntu that is pinned to a given major version of Postgres. I am not seeing how changing that would go over well.

Now if a user is using the PGDG repos, that is a different story: you point at the repos for the new Ubuntu version, do an update/upgrade, and you are back on track.

> Thoughts?

This is something the end user needs to work out ahead of time, as there is overhead in the process they need to take into consideration. Cranking up a new version of Ubuntu/Debian and having it take off doing things behind the scenes to the Postgres instance(s) would disturb me.

> Regards and best wishes,
> Justin Clift

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL
On 8/31/24 11:02, Peter J. Holzer wrote:
> On 2024-08-31 10:35:01 -0700, Adrian Klaver wrote:
>> On 8/31/24 09:54, Peter J. Holzer wrote:
>>> 'Tis the season again. Ubuntu 24.04.1 has just been released, so many
>>> Ubuntu LTS users will now be prompted to upgrade from 22.04 to 24.04.
>>
>> Which I ignore.
>
> But I think our personal preferences are beside the point. Many people
> will upgrade Ubuntu in the next weeks or months. So I wanted to give
> them a heads-up that this might not be as smooth as expected.

Major updates rarely are. That is why I tend to look before I leap.

One way to make the leap easier is to change the Postgres repo from the Ubuntu repo to the PGDG repo on the Ubuntu version currently being run, then do the Ubuntu upgrade. The caveat is which version of Ubuntu/Postgres you are currently running and whether that Postgres version is still supported in the main PGDG repo. Otherwise you are going to: https://apt-archive.postgresql.org/

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Postgres Logical Replication - how to see what subscriber is doing with received data?
Since nobody more knowledgeable has replied...

I'm very interested in this area and still surprised that there is no official/convenient/standard way to approach this (see https://www.postgresql.org/message-id/CAHAc2jdAHvp7tFZBP37awcth%3DT3h5WXCN9KjZOvuTNJaAAC_hg%40mail.gmail.com).

Based partly on that thread, I ended up with a script that connects to both ends of the replication and basically loops while comparing the counts in each table.

On Fri, 30 Aug 2024, 12:38 Michael Jaskiewicz wrote:
> I've got two Postgres 13 databases on AWS RDS.
>
> - One is a master, the other a slave using logical replication.
> - Replication has fallen behind by about 350 GB.
> - The slave was maxed out in terms of CPU for the past four days because
>   of some jobs that were ongoing, so I'm not sure what logical
>   replication was able to replicate during that time.
> - I killed those jobs and now CPU on the master and slave are both low.
> - I look at the subscriber via `select * from pg_stat_subscription;` and
>   see that latest_end_lsn is advancing, albeit very slowly.
> - The publisher says write/flush/replay lags are all 13 minutes behind,
>   but it's been like that for most of the day.
> - I see no errors in the logs on either the publisher or subscriber
>   outside of some simple SQL errors that users have been making.
> - CloudWatch reports low CPU utilization, low I/O, and low network.
>
> Is there anything I can do here? Previously I set wal_receiver_timeout
> to 0 because I had replication issues, and that helped things. I wish I
> had *some* visibility here to get any kind of confidence that it's going
> to pull through, but other than these lsn values and database logs, I'm
> not sure what to check.
>
> Sincerely,
>
> mj
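A minimal sketch of the kind of checks such a script can run on each end (these use only the statistics and catalog views; exact figures would need a count(*) per table):

    -- Publisher: how far each logical slot's confirmed position is behind
    -- the current WAL insert position.
    SELECT slot_name,
           confirmed_flush_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                          confirmed_flush_lsn)) AS bytes_behind
    FROM pg_replication_slots
    WHERE slot_type = 'logical';

    -- Either end: approximate per-table row counts to diff between publisher
    -- and subscriber (estimates only, but cheap enough to loop over).
    SELECT schemaname, relname, n_live_tup
    FROM pg_stat_user_tables
    ORDER BY schemaname, relname;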
Re: Postgres Logical Replication - how to see what subscriber is doing with received data?
Hi Shaheed,

Maybe these considerations could help, or at least give a hint about the problem:

- Check whether wal_receiver_timeout being set to 0 could be causing issues, such as network problems not being detected quickly enough. Consider re-evaluating this setting if you see connection issues.
- If you notice that some data is missing on the subscriber, you could increase max_slot_wal_keep_size on the publisher so that WAL is not deleted before it has been applied on the subscriber.
- Do you have the flexibility to increase max_worker_processes and max_logical_replication_workers, as well as work_mem and maintenance_work_mem, on the subscriber (in case the bottleneck is on the subscriber)?
- If there's significant lag, consider whether it might be more efficient to drop the subscription and re-initialize it from scratch using a new base backup, depending on the data volume and how long it might take for the existing replication to catch up.

Regards,
Muhammad Ikram

On Sun, Sep 1, 2024 at 9:22 PM Shaheed Haque wrote:
> Since nobody more knowledgeable has replied...
>
> I'm very interested in this area and still surprised that there is no
> official/convenient/standard way to approach this (see
> https://www.postgresql.org/message-id/CAHAc2jdAHvp7tFZBP37awcth%3DT3h5WXCN9KjZOvuTNJaAAC_hg%40mail.gmail.com).
>
> Based partly on that thread, I ended up with a script that connects to
> both ends of the replication and basically loops while comparing the
> counts in each table.
>
> On Fri, 30 Aug 2024, 12:38 Michael Jaskiewicz wrote:
>> I've got two Postgres 13 databases on AWS RDS.
>>
>> - One is a master, the other a slave using logical replication.
>> - Replication has fallen behind by about 350 GB.
>> - The slave was maxed out in terms of CPU for the past four days
>>   because of some jobs that were ongoing, so I'm not sure what logical
>>   replication was able to replicate during that time.
>> - I killed those jobs and now CPU on the master and slave are both low.
>> - I look at the subscriber via `select * from pg_stat_subscription;`
>>   and see that latest_end_lsn is advancing, albeit very slowly.
>> - The publisher says write/flush/replay lags are all 13 minutes behind,
>>   but it's been like that for most of the day.
>> - I see no errors in the logs on either the publisher or subscriber
>>   outside of some simple SQL errors that users have been making.
>> - CloudWatch reports low CPU utilization, low I/O, and low network.
>>
>> Is there anything I can do here? Previously I set wal_receiver_timeout
>> to 0 because I had replication issues, and that helped things. I wish I
>> had *some* visibility here to get any kind of confidence that it's
>> going to pull through, but other than these lsn values and database
>> logs, I'm not sure what to check.
>>
>> Sincerely,
>>
>> mj

--
Muhammad Ikram
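For reference, a rough sketch of what those setting changes could look like; the values are illustrative only, on RDS they would normally be applied through the DB parameter group rather than ALTER SYSTEM, and the worker-related settings require a restart:

    -- Publisher: let slots retain more WAL before it becomes removable.
    ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
    -- Subscriber: more room for apply workers and their memory.
    ALTER SYSTEM SET max_logical_replication_workers = 8;  -- needs restart
    ALTER SYSTEM SET max_worker_processes = 16;            -- needs restart
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();  -- picks up the reloadable settings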
Re: PG17 optimizations to vacuum
On Sun, Sep 1, 2024 at 5:44 PM Pavel Luzanov wrote:
> I see a perfectly working TID-store optimization. With reduced
> maintenance_work_mem it used only one 'vacuuming indexes' phase instead
> of 21 in v16. But I also expected to see a reduction in the number of
> WAL records and the total size of the WAL. Instead, WAL numbers have
> significantly degraded.
>
> What am I doing wrong?

That does seem weird. CC'ing the authors of the relevant VACUUM enhancements.

--
Peter Geoghegan
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 11:38, veem v wrote:
> On Sun, 1 Sept 2024 at 10:03, veem v wrote:
>> On Sun, 1 Sept 2024 at 09:13, David G. Johnston <david.g.johns...@gmail.com> wrote:
>>> On Saturday, August 31, 2024, veem v wrote:
>>>> iii) And then alter the data type of the partition key
>>>> transaction_date to DATE in one shot at the table level (which should
>>>> be fast, as it has less granularity than the existing timestamp type,
>>>> so it should be a catalog or dictionary change only), and it will
>>>> remain part of the composite PK (transaction_id, transaction_date).
>>>
>>> While this might seem logical, in reality date and timestamptz are
>>> different fixed-width data types, and thus any attempt to change from
>>> one to the other will involve a table rewrite. The best you could do is
>>> leave the timestamptz in place and just truncate to day so the time is
>>> always midnight UTC.
>>
>> Here, if we keep the PK column as is, i.e. transaction_timestamp as
>> timestamptz but truncate the time component, then in future if someone
>> tries to insert (using INSERT ... ON CONFLICT) data into the table with
>> a time component, it will get consumed and will not be restricted by
>> the PK constraint. So I was trying to make the data type of the
>> transaction_timestamp column DATE as well.
>>
>> As in this case we anyway have to create another column to hold the
>> date+timestamp values, since we can't throw those values away per
>> business need, we will be kind of rewriting the table. So is it okay if
>> we:
>>
>> 1) Detach all the partitions.
>> 2) Do the ALTER using the "only" keyword at the table level (adding a
>>    new column transaction_timestamp_new to hold the date+timestamp
>>    value, and also altering the existing transaction_timestamp column
>>    from timestamptz to DATE).
>> 3) Then do the data fix (delete the duplicates) and alter the column,
>>    one partition at a time for all of the partitions, and once done,
>>    attach those partitions one by one.
>> 4) Rename the columns at the table level. Hope this won't need any
>>    table rewrite.
>>
>> Is there any downside if we go by the above approach?
>
> Or do you mean to say there is no way we can modify the data type of a
> partition key even by detaching the partitions one by one? And thus the
> only way left is to create the table from scratch with partitions and
> populate the data into it? I was avoiding this because we also have many
> indexes on it, so creating from scratch means creating those indexes
> again. So I wanted to achieve it by detaching partitions, doing the
> required change, and attaching them again.

I tried detaching the partitions, but even then it does not allow me to alter the data type of the partition key, throwing the error below:

ERROR: cannot alter column "" because it is part of the partition key of relation ""

Now I am wondering whether it's really going to get too complex if we try to stick with the partition detach-and-attach strategy. A few teammates suggest adding a new column with just a DATE type, dropping the existing FK and PK first, detaching all the partitions, attaching the partitions back using the new DATE column, and then recreating the PK again. Btw, we have ~5 partitioned tables with a parent-child relationship on which this fix has to be applied. So I'm still wondering about the best possible way of fixing this issue.
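Since the partition key of an existing partitioned table cannot be changed in place, the teammates' suggestion effectively amounts to creating a new parent partitioned on the DATE column and re-attaching the existing partitions to it. A rough sketch of that shape, using placeholder names (transactions, transaction_timestamp, one monthly partition shown):

    -- 1. Add and backfill the new DATE column on the existing table
    --    (propagates to every partition; the UPDATE rewrites rows, so batch it).
    ALTER TABLE transactions ADD COLUMN transaction_date date;
    UPDATE transactions SET transaction_date = transaction_timestamp::date;

    -- 2. Create a new parent with the same columns, partitioned on the DATE column.
    CREATE TABLE transactions_new (LIKE transactions INCLUDING DEFAULTS)
        PARTITION BY RANGE (transaction_date);

    -- 3. Move each partition across; a CHECK constraint matching the bounds
    --    lets ATTACH skip its validation scan.
    ALTER TABLE transactions DETACH PARTITION transactions_2024_08;
    ALTER TABLE transactions_new
        ATTACH PARTITION transactions_2024_08
        FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

    -- 4. Recreate the composite primary key (it must include the partition key).
    ALTER TABLE transactions_new
        ADD PRIMARY KEY (transaction_id, transaction_date);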
Could we go back in a replication slot?
Hi, community,

The START_REPLICATION command accepts an LSN parameter, but according to the documentation, replication "starts at either WAL location XXX/XXX or the slot's confirmed_flush_lsn (see Section 54.19), whichever is greater." I wonder if there is a way to go back, such as a command to set confirmed_flush_lsn to an earlier LSN. I am asking because I may need to replay the replication messages to Kafka.

Sincerely,
Xiong Ding
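The slot position the quoted sentence refers to can be checked directly on the server, which helps when reasoning about where a restarted stream will begin:

    -- Shows where each logical slot currently stands; per the quoted
    -- documentation, START_REPLICATION will not begin earlier than
    -- confirmed_flush_lsn.
    SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';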
PG17 optimizations to vacuum
Hello,

While playing with optimizations to vacuum in v17, I can't understand how to measure this one: "Allow vacuum to more efficiently remove and freeze tuples".

My test script and results:

CREATE TABLE t(id integer) WITH (autovacuum_enabled = off);
INSERT INTO t SELECT gen.id FROM generate_series(1,3_500_000) gen(id);
CREATE INDEX t_id ON t(id);
SET maintenance_work_mem = '1MB';
UPDATE t SET id = id + 1;
VACUUM FREEZE VERBOSE t;

v16.4

INFO:  aggressively vacuuming "postgres.public.t"
INFO:  finished vacuuming "postgres.public.t": index scans: 21
pages: 0 removed, 30974 remain, 30974 scanned (100.00% of total)
tuples: 350 removed, 350 remain, 0 are dead but not yet removable
removable cutoff: 1675, which was 0 XIDs old when operation ended
new relfrozenxid: 1675, which is 4 XIDs ahead of previous value
frozen: 15488 pages from table (50.00% of total) had 350 tuples frozen
index scan needed: 15487 pages from table (50.00% of total) had 350 dead item identifiers removed
index "t_id": pages: 19196 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 473.207 MB/s, avg write rate: 92.511 MB/s
buffer usage: 212718 hits, 267930 misses, 52380 dirtied
WAL usage: 96585 records, 42819 full page images, 198029405 bytes
system usage: CPU: user: 3.17 s, system: 0.48 s, elapsed: 4.42 s
VACUUM

master

INFO:  aggressively vacuuming "postgres.public.t"
INFO:  finished vacuuming "postgres.public.t": index scans: 1
pages: 0 removed, 30974 remain, 30974 scanned (100.00% of total)
tuples: 350 removed, 350 remain, 0 are dead but not yet removable
removable cutoff: 950, which was 0 XIDs old when operation ended
new relfrozenxid: 950, which is 4 XIDs ahead of previous value
frozen: 15488 pages from table (50.00% of total) had 350 tuples frozen
index scan needed: 15487 pages from table (50.00% of total) had 350 dead item identifiers removed
index "t_id": pages: 19196 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 101.121 MB/s, avg write rate: 120.530 MB/s
buffer usage: 48900 hits, 47749 reads, 56914 dirtied
WAL usage: 125391 records, 46626 full page images, 330547751 bytes
system usage: CPU: user: 2.90 s, system: 0.27 s, elapsed: 3.68 s
VACUUM

I see a perfectly working TID-store optimization. With reduced maintenance_work_mem it used only one 'vacuuming indexes' phase instead of 21 in v16. But I also expected to see a reduction in the number of WAL records and the total size of the WAL. Instead, the WAL numbers have significantly degraded.

What am I doing wrong?

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
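One way to cross-check the per-operation WAL figures that VACUUM VERBOSE reports is to snapshot pg_stat_wal (cluster-wide, available since v14, so an otherwise idle instance is assumed) around the VACUUM; a psql sketch:

    -- Capture the counters just before the VACUUM into psql variables.
    SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal \gset before_
    VACUUM FREEZE VERBOSE t;
    -- Difference = WAL generated while the VACUUM ran.
    SELECT wal_records - :before_wal_records AS wal_records,
           wal_fpi     - :before_wal_fpi     AS wal_fpi,
           wal_bytes   - :before_wal_bytes   AS wal_bytes
    FROM pg_stat_wal;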