Re: question on audit columns
On Wed, 4 Sep 2024 18:19:47 +0530 yudhi s wrote:

> Hi,
> In our postgres database, we have all the tables with audit columns like
> created_by_user, created_timestamp, updated_by_user, updated_timestamp.
> These fields are supposed to be populated with the time at which the
> insert/update operation happened in the database, not at the application
> level. So we are planning to populate the created_by_user and
> created_timestamp columns by setting default values of "current_user" and
> "current_timestamp" for the two columns, but no such default is available
> to populate them when we update a row, so the only option seems to be a
> trigger.

If you can live with the fact that updated_by_user and updated_timestamp get
the same values as created_by_user and created_timestamp when inserting the
record, then you can do:

vv=> create table audit (created_by_user text default current_user, created_timestamp timestamp default now(), updated_by_user text default current_user, updated_timestamp timestamp default now(), data text);
CREATE TABLE

vv=> insert into audit (data) values ('abc');
INSERT 0 1

vv=> select * from audit;
 created_by_user |     created_timestamp      | updated_by_user |     updated_timestamp      | data
-----------------+----------------------------+-----------------+----------------------------+------
 vincent         | 2024-09-05 19:17:53.446109 | vincent         | 2024-09-05 19:17:53.446109 | abc
(1 row)

-- as user postgres
update audit set updated_by_user = DEFAULT, updated_timestamp = DEFAULT, data = 'def';

vv=> select * from audit;
 created_by_user |     created_timestamp      | updated_by_user |     updated_timestamp     | data
-----------------+----------------------------+-----------------+---------------------------+------
 vincent         | 2024-09-05 19:17:53.446109 | postgres        | 2024-09-05 19:24:01.19186 | def
(1 row)

--
Kind regards,
Vincent Veyron
https://marica.fr/
Software for tracking legal disputes, insurance claims, and contracts
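For the update path mentioned in the question, a trigger is the usual way to refresh the updated_* columns automatically. A minimal sketch, assuming the audit table above; the function and trigger names are illustrative, not from the thread:

create or replace function set_audit_update_cols() returns trigger as $$
begin
    -- overwrite whatever the application sent for the updated_* columns
    new.updated_by_user := current_user;
    new.updated_timestamp := now();
    return new;
end;
$$ language plpgsql;

create trigger audit_set_update_cols
    before update on audit
    for each row
    execute function set_audit_update_cols();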
Faster data load
Hi,

We have a requirement to create approx 50 billion rows in a partitioned table
(~1 billion rows per partition, 200+ GB daily partitions) for a performance
test. We are currently using the 'insert into <target table> select .. from
<source table or transformed query>;' method. We have dropped all indexes and
constraints first and then do the load. Still it's taking 2-3 hours to
populate one partition. Is there a faster way to achieve this?

A few teammates are suggesting to use the COPY command with a file load
instead, which will be faster. So I wanted to understand what it does
differently behind the scenes compared to the insert-as-select command, since
that only deals with the SQL engine.

Additionally, when we tried to create indexes on one partition after the data
load, it took 30+ minutes. Any possible way to make it faster?

Is there any way to drive the above things in parallel by utilizing the full
database resources?

It's Postgres 15.4.

Regards
Lok
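For reference, the load pattern described above would look roughly like the following; the table names and the date predicate are hypothetical, only to make the shape of the statement concrete:

-- hypothetical names: populate one daily partition from a source table
insert into target_table_2024_09_05
select *                             -- or any transformed query
from source_table
where event_date = date '2024-09-05';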
Re: Faster data load
On Thu, Sep 5, 2024 at 4:14 PM Lok P wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table>
> select .. from <source table or transformed query>;' method. We have
> dropped all indexes and constraints first and then do the load. Still it's
> taking 2-3 hours to populate one partition.

At three hours, that's 92,593 records/second. Seems pretty slow. How much of
that time is taken by the source query? How big are the records? How fast is
the hardware?

> Is there a faster way to achieve this?

Testing is the only way to know for sure.

> A few teammates are suggesting to use the COPY command with a file load
> instead, which will be faster. So I wanted to understand what it does
> differently behind the scenes compared to the insert-as-select command,
> since that only deals with the SQL engine.

COPY is highly optimized for buffered operation. INSERT... maybe not so much.
But if the source data is already in a table, that would require piping the
data to stdout and then back into the database:

psql appdb -c "COPY (SELECT ...) TO STDOUT;" | psql appdb -c "COPY some_table FROM STDIN;"

Use binary mode, so text conversion isn't required. Maybe that's faster, maybe
not.

> Additionally, when we tried to create indexes on one partition after the
> data load, it took 30+ minutes. Any possible way to make it faster?
>
> Is there any way to drive the above things in parallel by utilizing the
> full database resources?

Put the destination tables in a different tablespace on a different
controller.

> It's Postgres 15.4

Why not 15.8?

--
Death to America, and butter sauce. Iraq lobster!
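The binary-mode variant of that pipeline would look roughly like this; appdb, the SELECT, and some_table are placeholders carried over from the example above:

psql appdb -c "COPY (SELECT ...) TO STDOUT (FORMAT binary)" \
  | psql appdb -c "COPY some_table FROM STDIN (FORMAT binary)"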
Re: Faster data load
On 9/5/24 14:14, Lok P wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table>
> select .. from <source table or transformed query>;' method. We have
> dropped all indexes and constraints first and then do the load. Still it's
> taking 2-3 hours to populate one partition. Is there a faster way to
> achieve this?
>
> A few teammates are suggesting to use the COPY command with a file load
> instead, which will be faster. So I wanted to understand what it does
> differently behind the scenes compared to the insert-as-select command,
> since that only deals with the SQL engine.
>
> Additionally, when we tried to create indexes on one partition after the
> data load, it took 30+ minutes. Any possible way to make it faster?
>
> Is there any way to drive the above things in parallel by utilizing the
> full database resources?
>
> It's Postgres 15.4
>
> Regards
> Lok

Try pg_bulkload to load the data: it takes a little setup, but it is very
fast. Do pay attention to the caveats; for a performance test they probably
won't be relevant.

https://github.com/ossc-db/pg_bulkload?tab=readme-ov-file

Jeff
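For orientation, a pg_bulkload run is driven by a small control file plus a command-line invocation. A rough sketch under assumed names follows; the file paths, table, and database are made up, and the exact keys and options should be checked against the project README:

# sample_load.ctl (illustrative)
OUTPUT = public.target_table_2024_09_05     # target table
INPUT  = /data/load/part_2024_09_05.csv     # source file
TYPE   = CSV
DELIMITER = ","
WRITER = DIRECT                             # direct load path; this is where the caveats apply

$ pg_bulkload -d appdb sample_load.ctl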
Re: Faster data load
Hi,

You can use pg_partman. If your table is partitioned, you can manage
partitions in parallel by distributing the load across partitions
concurrently. Or you can use Citus; it can be an excellent solution,
especially for handling large data volumes and parallelizing data operations.

On Fri, 6 Sept 2024 at 01:14, Lok P wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table>
> select .. from <source table or transformed query>;' method. We have
> dropped all indexes and constraints first and then do the load. Still it's
> taking 2-3 hours to populate one partition. Is there a faster way to
> achieve this?
>
> A few teammates are suggesting to use the COPY command with a file load
> instead, which will be faster. So I wanted to understand what it does
> differently behind the scenes compared to the insert-as-select command,
> since that only deals with the SQL engine.
>
> Additionally, when we tried to create indexes on one partition after the
> data load, it took 30+ minutes. Any possible way to make it faster?
>
> Is there any way to drive the above things in parallel by utilizing the
> full database resources?
>
> It's Postgres 15.4
>
> Regards
> Lok
Re: Faster data load
On Fri, 6 Sept, 2024, 9:20 am Muhammad Usman Khan wrote:

> Hi,
>
> You can use pg_partman. If your table is partitioned, you can manage
> partitions in parallel by distributing the load across partitions
> concurrently. Or you can use Citus; it can be an excellent solution,
> especially for handling large data volumes and parallelizing data
> operations.

Thank you. The tables are partitioned. Also, during index creation we are
trying to do it on multiple partitions at the same time from multiple
sessions, but we are seeing an out-of-memory error in the 5th or 6th session.
And each session is taking ~30 minutes per partition for index creation.
Attaching the index partitions happens in seconds, though.
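The create-then-attach pattern referred to here looks roughly as follows; the index, table, and partition names are hypothetical, and it assumes the parent partitioned index was created with ON ONLY so that per-partition indexes can be attached afterwards:

-- hypothetical names: build the index on one partition in a session of its own
create index concurrently target_table_2024_09_05_col1_idx
    on target_table_2024_09_05 (col1);

-- then attach it to the parent partitioned index; this is the step that
-- completes in seconds
alter index target_table_col1_idx
    attach partition target_table_2024_09_05_col1_idx;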
Re: Faster data load
On Fri, Sep 6, 2024 at 12:43 AM Lok P wrote:

> Also, during index creation we are trying to do it on multiple partitions
> at the same time from multiple sessions, but we are seeing an out-of-memory
> error in the 5th or 6th session.

Had that same problem during pg_restore. Reduced maintenance_work_mem and the
problem went away.

--
Death to America, and butter sauce. Iraq lobster!
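A minimal sketch of doing that per session before each partition's index build; the values and names are illustrative, not recommendations, and would need sizing against the server's RAM and the number of concurrent sessions:

-- run in each session before building that partition's index
set maintenance_work_mem = '1GB';            -- keep N concurrent builds within available memory
set max_parallel_maintenance_workers = 2;    -- parallel workers share the maintenance_work_mem budget
create index target_table_2024_09_06_col1_idx
    on target_table_2024_09_06 (col1);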