Re: question on audit columns

2024-09-05 Thread Vincent Veyron
On Wed, 4 Sep 2024 18:19:47 +0530
yudhi s  wrote:

Hi,

> In our Postgres database, all tables have audit columns such as
> created_by_user, created_timestamp, updated_by_user, and
> updated_timestamp. These fields are supposed to be populated with the
> time at which the insert/update operation happened in the database,
> not at the application level. We are planning to populate the
> created_by_user and created_timestamp columns by setting default
> values of "current_user" and "current_timestamp", but nothing
> comparable is available to populate the updated_* columns when a row
> is updated, so the only option there seems to be a trigger.
> 

If you can live with the fact that updated_by_user and updated_timestamp get 
the same values as created_by_user and created_timestamp when inserting the 
record, then you can do:

vv=> create table audit (
         created_by_user text default current_user,
         created_timestamp timestamp default now(),
         updated_by_user text default current_user,
         updated_timestamp timestamp default now(),
         data text);
CREATE TABLE
vv=> insert into audit (data) values ('abc');
INSERT 0 1
vv=> select * from audit;
 created_by_user |     created_timestamp      | updated_by_user |     updated_timestamp      | data 
-----------------+----------------------------+-----------------+----------------------------+------
 vincent         | 2024-09-05 19:17:53.446109 | vincent         | 2024-09-05 19:17:53.446109 | abc
(1 row)

-- as user postgres
update audit set updated_by_user = DEFAULT, updated_timestamp = DEFAULT, data = 'def';

vv=> select * from audit;
 created_by_user |     created_timestamp      | updated_by_user |     updated_timestamp     | data 
-----------------+----------------------------+-----------------+---------------------------+------
 vincent         | 2024-09-05 19:17:53.446109 | postgres        | 2024-09-05 19:24:01.19186 | def
(1 row)
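
If you do want updated_by_user and updated_timestamp refreshed on every
update without the client having to write DEFAULT explicitly, a BEFORE
UPDATE trigger is the usual approach. A minimal sketch against the table
above (the function and trigger names are illustrative):

create or replace function set_updated_audit() returns trigger
language plpgsql as $$
begin
    -- overwrite the audit columns regardless of what the UPDATE supplied
    new.updated_by_user := current_user;
    new.updated_timestamp := now();
    return new;
end;
$$;

create trigger audit_set_updated
    before update on audit
    for each row
    execute function set_updated_audit();

-- after this, a plain "update audit set data = 'ghi';" refreshes both columns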



-- 
Best regards, Vincent Veyron

https://marica.fr/ 
Software for tracking legal disputes, insurance claims, and contracts




Faster data load

2024-09-05 Thread Lok P
Hi,

We have a requirement to create approx 50 billion rows in a partitioned
table (~1 billion rows per partition, 200+ GB daily partitions) for a
performance test. We are currently using the 'insert into <target table>
select .. from <source table> or <some transformed query>;' method. We
dropped all indexes and constraints first and then did the load. Still,
it is taking 2-3 hours to populate one partition. Is there a faster way
to achieve this?

A few teammates are suggesting to use the COPY command with a file load
instead, which they say will be faster. So I wanted to understand what
COPY does differently behind the scenes compared to an INSERT ... SELECT,
given that the latter works entirely within the SQL engine.

Additionally, when we tried to create indexes on one partition after the
data load, it took 30+ minutes. Is there any possible way to make that faster?

Is there any way to drive the above things in parallel by utilizing full
database resources?

It's Postgres 15.4.

Regards
Lok


Re: Faster data load

2024-09-05 Thread Ron Johnson
On Thu, Sep 5, 2024 at 4:14 PM Lok P  wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table>
> select .. from <source table> or <some transformed query>;' method. We
> dropped all indexes and constraints first and then did the load. Still,
> it is taking 2-3 hours to populate one partition.
>

At three hours, that's 92,593 records/second.  Seems pretty slow.

How much of that time is taken by the SELECT part of the query?
How big are the records?
How fast is the hardware?

Is there a faster way to achieve this?
>

Testing is the only way to know for sure.


> A few teammates are suggesting to use the COPY command with a file load
> instead, which they say will be faster. So I wanted to understand what
> COPY does differently behind the scenes compared to an INSERT ... SELECT,
> given that the latter works entirely within the SQL engine.
>

COPY is highly optimized for buffered operation.  INSERT... maybe not so
much.

But if the source data is already in a table, that would require piping the
data to stdout and then back into the database.

psql appdb -c "COPY (SELECT ...) TO STDOUT;" | psql appdb -c "COPY some_table FROM STDIN;"

Use binary mode, so text conversion isn't required.

Maybe that's faster, maybe not.
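
A minimal sketch of the binary-mode variant of that pipe (appdb,
some_table, and the SELECT are placeholders carried over from the
example above):

psql appdb -c "COPY (SELECT ...) TO STDOUT WITH (FORMAT binary);" \
  | psql appdb -c "COPY some_table FROM STDIN WITH (FORMAT binary);"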

> Additionally, when we tried to create indexes on one partition after the
> data load, it took 30+ minutes. Is there any possible way to make that faster?
>
> Is there any way to drive the above things in parallel by utilizing full
> database resources?
>

Put the destination tables in a different tablespace on a different
controller.
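
A minimal sketch of that idea, assuming a second volume mounted at
/mnt/disk2 (the tablespace, table, and partition names are illustrative):

create tablespace fast_load location '/mnt/disk2/pg_tablespace';

create table target_table_p_2024_09_05
    partition of target_table
    for values from ('2024-09-05') to ('2024-09-06')
    tablespace fast_load;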


> It's postgres 15.4
>

Why not 15.8?

-- 
Death to America, and butter sauce.
Iraq lobster!


Re: Faster data load

2024-09-05 Thread Jeff Ross

On 9/5/24 14:14, Lok P wrote:


Hi,

We have a requirement to create approx 50 billion rows in a partitioned
table (~1 billion rows per partition, 200+ GB daily partitions) for a
performance test. We are currently using the 'insert into <target table>
select .. from <source table> or <some transformed query>;' method. We
dropped all indexes and constraints first and then did the load. Still,
it is taking 2-3 hours to populate one partition. Is there a faster way
to achieve this?


A few teammates are suggesting to use the COPY command with a file load
instead, which they say will be faster. So I wanted to understand what
COPY does differently behind the scenes compared to an INSERT ... SELECT,
given that the latter works entirely within the SQL engine.


Additionally, when we tried to create indexes on one partition after the
data load, it took 30+ minutes. Is there any possible way to make that faster?


Is there any way to drive the above things in parallel by utilizing 
full database resources?


It's postgres 15.4

Regards
Lok


Try pg_bulkload to load the data -- it takes a little setup but it is very 
fast. Do pay attention to the caveats; for a performance test they 
probably won't be relevant.


https://github.com/ossc-db/pg_bulkload?tab=readme-ov-file
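
A rough sketch of an invocation, from memory of the pg_bulkload README
(the control-file keys, file paths, and table names below are
illustrative; check the linked documentation for the exact options):

# load.ctl -- pg_bulkload control file
OUTPUT = target_table_p_2024_09_05
INPUT = /data/part_2024_09_05.csv
TYPE = CSV
DELIMITER = ","
WRITER = DIRECT

# then run:
$ pg_bulkload -d appdb load.ctl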

Jeff





Re: Faster data load

2024-09-05 Thread Muhammad Usman Khan
Hi,

You can use pg_partman to manage the partitions. If your table is
partitioned, you can parallelize the load by distributing it across
partitions and loading them concurrently. Or you can use Citus, which can
be an excellent solution, especially for handling large data volumes and
parallelizing data operations.
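
A minimal sketch of the concurrent-loading idea with plain psql sessions
driven from the shell (database, table, and column names are illustrative;
pg_partman or Citus would orchestrate this rather than a hand-rolled loop):

# one background psql session per daily partition
for d in 2024-09-01 2024-09-02 2024-09-03; do
    psql appdb -c "insert into target_table_${d//-/_} select * from source_table where load_date = '${d}';" &
done
wait   # block until every partition load has finished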


On Fri, 6 Sept 2024 at 01:14, Lok P  wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table>
> select .. from <source table> or <some transformed query>;' method. We
> dropped all indexes and constraints first and then did the load. Still,
> it is taking 2-3 hours to populate one partition. Is there a faster way
> to achieve this?
>
> A few teammates are suggesting to use the COPY command with a file load
> instead, which they say will be faster. So I wanted to understand what
> COPY does differently behind the scenes compared to an INSERT ... SELECT,
> given that the latter works entirely within the SQL engine.
>
> Additionally, when we tried to create indexes on one partition after the
> data load, it took 30+ minutes. Is there any possible way to make that faster?
>
> Is there any way to drive the above things in parallel by utilizing full
> database resources?
>
> It's postgres 15.4
>
> Regards
> Lok
>


Re: Faster data load

2024-09-05 Thread Lok P
On Fri, 6 Sept 2024 at 9:20 am, Muhammad Usman Khan wrote:

> Hi,
>
> You can use pg_partman to manage the partitions. If your table is
> partitioned, you can parallelize the load by distributing it across
> partitions and loading them concurrently. Or you can use Citus, which can
> be an excellent solution, especially for handling large data volumes and
> parallelizing data operations.
>


Thank you.
The tables are partitioned. During index creation we are also trying to
build indexes on multiple partitions at the same time from multiple
sessions, but we see an out-of-memory error in the 5th or 6th session.
Even then, each session takes ~30 minutes per partition to create the
index, although attaching the index partitions happens in seconds.



Re: Faster data load

2024-09-05 Thread Ron Johnson
On Fri, Sep 6, 2024 at 12:43 AM Lok P  wrote:

> During index creation we are also trying to build indexes on multiple
> partitions at the same time from multiple sessions, but we see an
> out-of-memory error in the 5th or 6th session.
>

Had that same problem during pg_restore.  Reduced maintenance_work_mem and
the problem went away.
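
A minimal sketch of the per-session knobs involved (the values and the
index definition are illustrative; keep sessions x maintenance_work_mem
well under the available RAM):

set maintenance_work_mem = '1GB';
set max_parallel_maintenance_workers = 4;   -- parallel workers for a single CREATE INDEX
create index on target_table_p_2024_09_05 (some_column);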

-- 
Death to America, and butter sauce.
Iraq lobster!