Results of experiments with UUIDv7, UUIDv8
Hi all,

My colleague and I ran some experiments to see what effect using UUIDs as secondary index keys has on index IO. The context is that ORM frameworks will by default use UUIDs as index keys, which I found to be a major factor in performance issues at Celonis -- and I suspect this isn't specific to Celonis. The secondary factor is that random IO on Azure Single Server can be slow as a dog, so for indexes large enough not to be cached, with insert/delete workloads at a high enough QPS, this really hurts.

We found that, relative to UUID v3 (which is essentially random -- in fact it showed a pretty good random distribution on a 500Gb table), UUID v7 (which has a longer time-based prefix than v8) gave about a 30% saving in index-access IO and roughly the same reduction in index size after an insert/delete workload. v8 gave ~24%. We simulated slow, random IO by running the test on a USB key, which matched Azure performance pretty well; an SSD was maybe 2x better.

This isn't as much as I expected, but again, for large indexes on slow IO it was significant.

peter
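For reference, here is a minimal sketch of a UUIDv7-style generator in PL/pgSQL -- illustrative only, not necessarily what the tests used. It assumes the pgcrypto extension for gen_random_bytes(), and the function name is arbitrary. The point is simply that the leading 48 bits are a millisecond timestamp, so keys generated close together in time land in nearby index pages instead of being scattered across the whole index.

-- Sketch of a UUIDv7-style generator (assumes pgcrypto for gen_random_bytes).
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
DECLARE
    -- 48-bit Unix timestamp in milliseconds
    ts_ms bigint := floor(extract(epoch FROM clock_timestamp()) * 1000);
    -- 80 random bits for the remainder of the value
    rand  bytea  := gen_random_bytes(10);
    bytes bytea;
BEGIN
    -- big-endian timestamp prefix followed by the random tail (16 bytes total)
    bytes := decode(lpad(to_hex(ts_ms), 12, '0'), 'hex') || rand;
    -- stamp the version nibble (7) and the variant bits (10xx)
    bytes := set_byte(bytes, 6, (get_byte(bytes, 6) & 15) | 112);
    bytes := set_byte(bytes, 8, (get_byte(bytes, 8) & 63) | 128);
    RETURN encode(bytes, 'hex')::uuid;
END;
$$ LANGUAGE plpgsql VOLATILE;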
Table copy with SERIALIZABLE is incredibly slow
Hi all,

Background: we're building pg_repack-like functionality to compact a 500Gb table (with two 145Gb indexes) from which we deleted 80% of the rows. Taking the table offline is not an option, and the table has a moderate (let's say 100 QPS) insert/delete workload running against it.

The typical procedure for this sort of thing is basically CDC:

1. Create a 'log' table and a trigger on the current table (sketched below).
2. Under SERIALIZABLE, copy the rows: INSERT INTO new_table SELECT * FROM current_table.

What we're finding is that for the first 30 minutes the copy runs at 10Gb/s, then it drops to 1Mb/s and stays there; 22 hours later the copy is still going, and the log table is now huge, so we know the replay will also take a very long time.

===

Q: what are some ways in which we could optimize the copy?

Btw, this is Postgres 9.6.

(We tried an unlogged table -- that did nothing; we tried creating the indexes afterwards -- that helped; and we're experimenting with RRI.)

Thanks!
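For context, a simplified sketch of the log table and trigger from step 1. All names (current_table, its id primary key, the log table and trigger) are placeholders, and a real implementation would log the full row image rather than just the key so the replay step can reapply each change.

CREATE TABLE current_table_log (
    op     text        NOT NULL,                           -- 'I', 'U' or 'D'
    row_id bigint      NOT NULL,                           -- PK of the changed row
    logged timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION log_current_table_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO current_table_log (op, row_id) VALUES ('D', OLD.id);
        RETURN OLD;
    ELSE
        INSERT INTO current_table_log (op, row_id) VALUES (left(TG_OP, 1), NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Postgres 9.6 spells this EXECUTE PROCEDURE (EXECUTE FUNCTION is 11+).
CREATE TRIGGER current_table_cdc
    AFTER INSERT OR UPDATE OR DELETE ON current_table
    FOR EACH ROW EXECUTE PROCEDURE log_current_table_change();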
Re: Table copy with SERIALIZABLE is incredibly slow
On Sun, 2023-07-30 at 23:00 -0600, peter plachta wrote:
> Background: we're building pg_repack-like functionality to compact a
> 500Gb table (with two 145Gb indexes) from which we deleted 80% of the
> rows. Taking the table offline is not an option, and the table has a
> moderate (let's say 100 QPS) insert/delete workload running against it.
>
> The typical procedure for this sort of thing is basically CDC:
>
> 1. Create a 'log' table and a trigger on the current table.
> 2. Under SERIALIZABLE, copy the rows: INSERT INTO new_table SELECT *
>    FROM current_table.
>
> What we're finding is that for the first 30 minutes the copy runs at
> 10Gb/s, then it drops to 1Mb/s and stays there; 22 hours later the copy
> is still going, and the log table is now huge, so we know the replay
> will also take a very long time.
>
> ===
>
> Q: what are some ways in which we could optimize the copy?
>
> Btw, this is Postgres 9.6.
>
> (We tried an unlogged table -- that did nothing; we tried creating the
> indexes afterwards -- that helped; and we're experimenting with RRI.)

Why are you doing this the hard way, when pg_squeeze or pg_repack could
do it?

You definitely should not be using PostgreSQL 9.6 at this time.

Yours,
Laurenz Albe
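For what it's worth, a pg_squeeze run boils down to a single SQL call. The sketch below assumes the library has been added to shared_preload_libraries and uses placeholder schema/table names; the exact signature of squeeze.squeeze_table() varies between releases (older ones take additional clustering-index and tablespace arguments), so check the documentation for the installed version. pg_repack does the equivalent from the shell with its own CLI.

-- pg_squeeze must be listed in shared_preload_libraries before this works.
CREATE EXTENSION IF NOT EXISTS pg_squeeze;

-- Rewrite public.current_table in place, reclaiming the space left by the
-- deleted rows while the insert/delete workload keeps running.
SELECT squeeze.squeeze_table('public', 'current_table');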
