Partitioning and unique key
Hello, We have our transaction tables daily range partitioned based on transaction_timestamp column which is timestamptz data type and these are having composite primary key on (transaction_id, transaction_timestamp). And we were using an "insert on conflict" for loading data to our system , which means if another record comes to the system with the same transaction_id and transaction_timestamp, it will get updated. This way we already have 60 days worth of data stored in our system with approx. 70 million transactions per day. But we just got to know from business that the data should be unique by only transaction_id but not transaction_timestamp. Any incoming data with the same transaction_id(even different transaction_timestamp) should get updated but not inserted. Also these daily partitions are going to hold 400million rows in future and will be queried on the transaction_timestamp filter so we can't really avoid the partitioning option here considering future growth. But due to postgres limitations we are unable to have this unique constraint or primary key only on the transaction_id column, we have to include transaction_timestamp with it as a composite key. So I want to understand from experts if there is any possible way to satisfy both partitioning on transaction_timestamp column and unique key or pk just on *trans*action_id only? Note-its 15.4 postgres database. Regards Veem
Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL
On 8/31/24 09:54, Peter J. Holzer wrote: 'Tis the season again. Ubuntu 24.04.1 has just been released, so many Ubuntu LTS users will now be prompted to upgrade from 22.04 to 24.04. Which I ignore. A word of warning to those who use Postgresql from the Ubuntu repo (not PGDG): Why I do use the PGDG repo's. As usual, a newer Ubuntu version comes with a newer Postgres version (16 instead of 14). Also as usual, I got a message during the upgrade that Postgres 14 is obsolete,. but the binaries have been left installed and I should upgrade to Postgres 16 manually ASAP. But after the reboot, PostgreSQL failed to start because it needed a shared library (libldap) which was no longer there. So a normal pg_upgradecluster wouldn't work. sudo apt install libldap-X.x did not work? In my case the quickest way to recover was to install postgresql-14 on a VM, copy the data direcory into that instance and make a fresh dump, then install postgresql-16 on my laptop and restore the dump. Annoying, but no big deal for the small test database I keep on my laptop. If you have multi-terabyte databases, your situation may be different. I'm not exactly sure what went wrong (I got some conflicts during the upgrade and maybe I shouldn't have invoked apt autoremove?), and you may not have this problem, but make sure you have a backup before the upgrade. hp -- Adrian Klaver adrian.kla...@aklaver.com
Re: Partitioning and unique key
On Saturday, August 31, 2024, veem v wrote: > > 1) if it's technically possible to have a unique key on only the > transaction_id column having the partition key on the > transaction_timestamp, because the table is going to be queried/purged > based on the transaction_timestamp? > There is presently no such thing as a cross-partition unique constraint. If you define the constraint on the [partitioned] table the documentation is perfectly clear, as are I believe the error messages, that it will require all partitioning columns to be included - since that is what happens in reality. If you target the partitions directly with the unique index or constraint no such limitation should exist. > > iii)And then alter the datatype of the partition key transaction_date to > DATE in one shot at the table level(which should be fast as its having more > granularity as compare to existing timestamptype, so should be catalog or > dictionary change only), and that will remain the part of composite PK > (transaction_id,transaction_date). > While this might seem logical, in reality date and timestamptz are different fixed-width data types and thus any attempt to change from one to the other will involve a table rewrite. Best you could do is leave the timestamptz in place and just truncate to day so the time is always midnight UTC. > iv) Repeat this step for all child partition tables and then for the > parent partition tables. > I’d suggest trying to just build a new partitioned table that is correctly defined. Then populate it. Add a trigger to the existing one to keep the new one in sync. Then change your application code to point to the new partitioned table. At which point the old partitioned table can be dropped. David J.
Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL
On 2024-09-01 02:54, Peter J. Holzer wrote: 'Tis the season again. Ubuntu 24.04.1 has just been released, so many Ubuntu LTS users will now be prompted to upgrade from 22.04 to 24.04. A word of warning to those who use Postgresql from the Ubuntu repo (not PGDG): As usual, a newer Ubuntu version comes with a newer Postgres version (16 instead of 14). Also as usual, I got a message during the upgrade that Postgres 14 is obsolete,. but the binaries have been left installed and I should upgrade to Postgres 16 manually ASAP. It'd *technically* be possible to automatically run an upgrade of the PostgreSQL repository (via scripting?) at launch time, though just blindly doing it for everyone would be a *major* change of behaviour. Some people would likely love it, while others would be horrified (etc). That being said, if we announce it ahead of time as a feature of a major release (ie PG 18 or something), and if we have a clear way to not automatically upgrade (a variable in postgresql.conf?), then we might be able to solve this problem ~permanently. We'd also need to figure out how to handle (say) rebuilding of indexes that need updating between major versions and stuff like that. Thoughts? Regards and best wishes, Justin Clift
Upgrade Ubuntu 22 -> 24 may break PostgreSQL
'Tis the season again. Ubuntu 24.04.1 has just been released, so many Ubuntu LTS users will now be prompted to upgrade from 22.04 to 24.04. A word of warning to those who use Postgresql from the Ubuntu repo (not PGDG): As usual, a newer Ubuntu version comes with a newer Postgres version (16 instead of 14). Also as usual, I got a message during the upgrade that Postgres 14 is obsolete,. but the binaries have been left installed and I should upgrade to Postgres 16 manually ASAP. But after the reboot, PostgreSQL failed to start because it needed a shared library (libldap) which was no longer there. So a normal pg_upgradecluster wouldn't work. In my case the quickest way to recover was to install postgresql-14 on a VM, copy the data direcory into that instance and make a fresh dump, then install postgresql-16 on my laptop and restore the dump. Annoying, but no big deal for the small test database I keep on my laptop. If you have multi-terabyte databases, your situation may be different. I'm not exactly sure what went wrong (I got some conflicts during the upgrade and maybe I shouldn't have invoked apt autoremove?), and you may not have this problem, but make sure you have a backup before the upgrade. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 03:58, Adrian Klaver wrote: > > The model is at odds with itself and untenable. If the tables hold > multiple rows for a given transaction_id then you cannot have a > PK/Unique constraint on that column. Seems there is a decided lack of > any planning. The only way I can see this happening is consolidating all > the duplicate transaction_id rows into a single row for each > transaction_id. That then leads to the question of how to do that and > retain the 'correct' information from the selection of rows for each > transaction_id. > > Yes we had messed up the data for now and have multiple records for each transaction_id persisted and thus we need to fix the data. But more than that , as I stated , I wanted to understand first 1) if it's technically possible to have a unique key on only the transaction_id column having the partition key on the transaction_timestamp, because the table is going to be queried/purged based on the transaction_timestamp? 2) Additionally we were thinking if above is technically not possible, then the maximum granularity which we can have for each transaction_id will be a day, so the partition key transaction_timestmp can be truncated to have only date component but no time component. So the primary key will be (transaction_id, transaction_date). But we also don't want to lose the time component and persist the existing data of transaction_timestmp (which will have a time component in it, in a separate column). And in above case , for fixing the existing data in least disruptive way, as we have currently duplicate transaction_id inserted into the table already because of the composite primary key(transaction_id, transaction_timestmp).Can we simply i)rename the existing column transaction_timestmp to transaction_date and then add new column transaction_timestmp using the values of existing column partition by partition. ii)And then delete the duplicate data using query something as below , each partition by partition. iii)And then alter the datatype of the partition key transaction_date to DATE in one shot at the table level(which should be fast as its having more granularity as compare to existing timestamptype, so should be catalog or dictionary change only), and that will remain the part of composite PK (transaction_id,transaction_date). iv) Repeat this step for all child partition tables and then for the parent partition tables. Will this technique be the most efficient way of fixing this mess? WITH ranked_records AS ( SELECT column1_id, column2_timestamptz, ROW_NUMBER() OVER (PARTITION BY column1_id, date_trunc('day', column2_timestamptz) ORDER BY column2_timestamptz DESC) AS rn FROM partition_name ) DELETE FROM partition_name T1 WHERE EXISTS ( SELECT 1 FROM ranked_records T2 WHERE T1.column1_id = T2.column1_id AND T1.column2_timestamptz = T2.column2_timestamptz AND T2.rn > 1 )
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 09:13, David G. Johnston wrote: > On Saturday, August 31, 2024, veem v wrote: > >> >> >> iii)And then alter the datatype of the partition key transaction_date to >> DATE in one shot at the table level(which should be fast as its having more >> granularity as compare to existing timestamptype, so should be catalog or >> dictionary change only), and that will remain the part of composite PK >> (transaction_id,transaction_date). >> > > While this might seem logical, in reality date and timestamptz are > different fixed-width data types and thus any attempt to change from one to > the other will involve a table rewrite. Best you could do is leave the > timestamptz in place and just truncate to day so the time is always > midnight UTC. > > >> >> Here , if we keep the PK column as is i.e. the transaction_timestamp as timestamptz but truncate the time component , in that case again in future if someone tries to insert(using insert on conflict) data into the table with time component , it will get consumed and will not be restricted by the PK constraint. So I was trying to make the data type also as DATE for the transaction_timestap column. As in this case anyway we have to create another column to populate the date+timestamp values as we cant throw those values away per business need, so we will be kind of rewriting the table.So is it okay if if we will 1) Detach all the partitions. 2)Do the alter using "only" key word in table level. (For adding new column transaction_timestamp_new to hold date+timestamp value and also altering the existing transaction_timestamp column to DATE from type timestamptz). 3)Then do the data fix(delete the duplicates) and alter the column, one partition at a time for all of the partitions and once done , attach those partitions one by one. 5)Rename the columns at table level.Hope this won't need any table rewrite. Is there any downside if we go by the above approach?
Re: Partitioning and unique key
On 8/31/24 13:02, veem v wrote: Hello, We have our transaction tables daily range partitioned based on transaction_timestamp column which is timestamptz data type and these are having composite primary key on (transaction_id, transaction_timestamp). And we were using an "insert on conflict" for loading data to our system , which means if another record comes to the system with the same transaction_id and transaction_timestamp, it will get updated. This way we already have 60 days worth of data stored in our system with approx. 70 million transactions per day. But we just got to know from business thatthe data should be unique by only transaction_id but not transaction_timestamp. Any incoming data with the same transaction_id(even different transaction_timestamp) should get updated but not inserted. Also these daily partitions are going to hold 400million rows in future and will be queried on the transaction_timestamp filter so we can't really avoid the partitioning option here considering future growth. But due to postgres limitations we are unable to have this unique constraint or primary key only on the transaction_id column, we have to include transaction_timestamp with it as a composite key. So I want to understand from experts if there is any possible way to satisfy both partitioning on transaction_timestamp column and unique key or pk just on _trans_action_id only? The model is at odds with itself and untenable. If the tables hold multiple rows for a given transaction_id then you cannot have a PK/Unique constraint on that column. Seems there is a decided lack of any planning. The only way I can see this happening is consolidating all the duplicate transaction_id rows into a single row for each transaction_id. That then leads to the question of how to do that and retain the 'correct' information from the selection of rows for each transaction_id. Note-its 15.4 postgres database. Regards Veem -- Adrian Klaver adrian.kla...@aklaver.com
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 10:03, veem v wrote: > > On Sun, 1 Sept 2024 at 09:13, David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Saturday, August 31, 2024, veem v wrote: >> >>> >>> >>> iii)And then alter the datatype of the partition key transaction_date to >>> DATE in one shot at the table level(which should be fast as its having more >>> granularity as compare to existing timestamptype, so should be catalog or >>> dictionary change only), and that will remain the part of composite PK >>> (transaction_id,transaction_date). >>> >> >> While this might seem logical, in reality date and timestamptz are >> different fixed-width data types and thus any attempt to change from one to >> the other will involve a table rewrite. Best you could do is leave the >> timestamptz in place and just truncate to day so the time is always >> midnight UTC. >> >> >>> >>> > Here , if we keep the PK column as is i.e. the transaction_timestamp as > timestamptz but truncate the time component , in that case again in future > if someone tries to insert(using insert on conflict) data into the table > with time component , it will get consumed and will not be restricted by > the PK constraint. So I was trying to make the data type also as DATE for > the transaction_timestap column. > > As in this case anyway we have to create another column to populate the > date+timestamp values as we cant throw those values away per business need, > so we will be kind of rewriting the table.So is it okay if if we will > > 1) Detach all the partitions. > 2)Do the alter using "only" key word in table level. (For adding new > column transaction_timestamp_new to hold date+timestamp value and also > altering the existing transaction_timestamp column to DATE from type > timestamptz). > 3)Then do the data fix(delete the duplicates) and alter the column, one > partition at a time for all of the partitions and once done , attach those > partitions one by one. > 5)Rename the columns at table level.Hope this won't need any table rewrite. > > Is there any downside if we go by the above approach? > Or do you mean to say there is no way we can modify the data type of a partition key even by detaching the partitions one by one? And thus we may have only way left is to create the table from scratch with partitions and populate the data to it? I was avoiding this because we have many indexes also in it , so creating from scratch means creating those indexes again. So I wanted to achieve it by detaching partitions, doing the required change and attaching it again.
Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL
On 2024-08-31 10:35:01 -0700, Adrian Klaver wrote: > On 8/31/24 09:54, Peter J. Holzer wrote: > > 'Tis the season again. > > > > Ubuntu 24.04.1 has just been released, so many Ubuntu LTS users will now > > be prompted to upgrade from 22.04 to 24.04. > > Which I ignore. At some point I do want to upgrade. Not necessarily at the earliest oportunity, but for workstations/laptops I'm usually rather quick. This laptop was originally delivered with a pre-installed Ubuntu 18.04 LTS, and has since been upgraded to 20.04, 22.04 and now 24.04. Servers are different. They are usually upgraded when neccessary. Or sometimes new hardware is ready before support ends ... But I think our personal preferences are besides the point. Many people will upgrade Ubuntu in the next weeks or months. So I wanted to give them a heads-up that this might not be as smooth as expected. > > A word of warning to those who use Postgresql from the Ubuntu repo (not > > PGDG): > > Why I do use the PGDG repo's. So do I on production systems. But this was just my personal laptop. But yeah, it reminded me that one of the reasons I prefer the PGDG repos is that they decouple PostgreSQL upgrades from OS upgrades. > > As usual, a newer Ubuntu version comes with a newer Postgres version (16 > > instead of 14). Also as usual, I got a message during the upgrade that > > Postgres 14 is obsolete,. but the binaries have been left installed and > > I should upgrade to Postgres 16 manually ASAP. > > > > But after the reboot, PostgreSQL failed to start because it needed a > > shared library (libldap) which was no longer there. So a normal > > pg_upgradecluster wouldn't work. > > sudo apt install libldap-X.x did not work? It might have worked after reenabling the Ubuntu 22 repo. Or it might have worked to simply symlink to the newer libldap. Frankly I didn't try. I had a VM with ubuntu 22 without postgres lying around, so that seemed like the safeste route. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature