Results of experiments with UUIDv7, UUIDv8
Hi all,

My colleague and I ran some experiments to see what effect using UUIDs as secondary index keys has on index IO. The context is that ORM frameworks will by default use UUIDs as index keys, which I found to be a major factor in performance issues at Celonis -- and I suspect this isn't specific to Celonis. The secondary factor is that random IO on Azure Single Server can be slow as a dog, so for indexes large enough not to be cached, with insert/delete workloads at a high enough QPS, this really hurts.

We found that, relative to UUID v3 (which is essentially random -- in fact it showed a pretty good random distribution on a 500Gb table), UUID v7 (which has a longer time-based prefix than v8) gave about a 30% saving in index-access IO and roughly the same reduction in index size after an insert/delete workload. v8 gave ~24%. We simulated slow, random IO by running the test on a USB key, which matched Azure performance pretty well; an SSD was maybe 2x better.

This isn't as much as I expected, but again, for large indexes on slow IO it was significant.

peter
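For reference, here is a minimal sketch of a UUIDv7-style generator in PL/pgSQL -- illustrative only, not necessarily what the tests used. It assumes the pgcrypto extension for gen_random_bytes(), and the function name is arbitrary. The point is simply that the leading 48 bits are a millisecond timestamp, so keys generated close together in time land in nearby index pages instead of being scattered across the whole index.

-- Sketch of a UUIDv7-style generator (assumes pgcrypto for gen_random_bytes).
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
DECLARE
    -- 48-bit Unix timestamp in milliseconds
    ts_ms bigint := floor(extract(epoch FROM clock_timestamp()) * 1000);
    -- 80 random bits for the remainder of the value
    rand  bytea  := gen_random_bytes(10);
    bytes bytea;
BEGIN
    -- big-endian timestamp prefix followed by the random tail (16 bytes total)
    bytes := decode(lpad(to_hex(ts_ms), 12, '0'), 'hex') || rand;
    -- stamp the version nibble (7) and the variant bits (10xx)
    bytes := set_byte(bytes, 6, (get_byte(bytes, 6) & 15) | 112);
    bytes := set_byte(bytes, 8, (get_byte(bytes, 8) & 63) | 128);
    RETURN encode(bytes, 'hex')::uuid;
END;
$$ LANGUAGE plpgsql VOLATILE;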
Table copy with SERIALIZABLE is incredibly slow
Hi all,

Background: we're building pg_repack-like functionality to compact a 500Gb table (with two 145Gb indexes) from which we deleted 80% of the rows. Taking the table offline is not an option, and the table has a moderate (let's say 100 QPS) insert/delete workload running against it.

The typical procedure for this sort of thing is basically CDC:

1. Create a 'log' table and a trigger on the current table (sketched below).
2. Under SERIALIZABLE, copy the rows: INSERT INTO new_table SELECT * FROM current_table.

What we're finding is that for the first 30 minutes the copy runs at 10Gb/s, then it drops to 1Mb/s and stays there; 22 hours later the copy is still going, and the log table is now huge, so we know the replay will also take a very long time.

===

Q: what are some ways in which we could optimize the copy?

Btw, this is Postgres 9.6.

(We tried an unlogged table -- that did nothing; we tried creating the indexes afterwards -- that helped; and we're experimenting with RRI.)

Thanks!
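For context, a simplified sketch of the log table and trigger from step 1. All names (current_table, its id primary key, the log table and trigger) are placeholders, and a real implementation would log the full row image rather than just the key so the replay step can reapply each change.

CREATE TABLE current_table_log (
    op     text        NOT NULL,                           -- 'I', 'U' or 'D'
    row_id bigint      NOT NULL,                           -- PK of the changed row
    logged timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION log_current_table_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO current_table_log (op, row_id) VALUES ('D', OLD.id);
        RETURN OLD;
    ELSE
        INSERT INTO current_table_log (op, row_id) VALUES (left(TG_OP, 1), NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Postgres 9.6 spells this EXECUTE PROCEDURE (EXECUTE FUNCTION is 11+).
CREATE TRIGGER current_table_cdc
    AFTER INSERT OR UPDATE OR DELETE ON current_table
    FOR EACH ROW EXECUTE PROCEDURE log_current_table_change();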
Re: Table copy with SERIALIZABLE is incredibly slow
On Sun, 2023-07-30 at 23:00 -0600, peter plachta wrote:
> Background: we're building pg_repack-like functionality to compact a
> 500Gb table (with two 145Gb indexes) from which we deleted 80% of the
> rows. Taking the table offline is not an option, and the table has a
> moderate (let's say 100 QPS) insert/delete workload running against it.
>
> The typical procedure for this sort of thing is basically CDC:
>
> 1. Create a 'log' table and a trigger on the current table.
> 2. Under SERIALIZABLE, copy the rows: INSERT INTO new_table SELECT *
>    FROM current_table.
>
> What we're finding is that for the first 30 minutes the copy runs at
> 10Gb/s, then it drops to 1Mb/s and stays there; 22 hours later the copy
> is still going, and the log table is now huge, so we know the replay
> will also take a very long time.
>
> ===
>
> Q: what are some ways in which we could optimize the copy?
>
> Btw, this is Postgres 9.6.
>
> (We tried an unlogged table -- that did nothing; we tried creating the
> indexes afterwards -- that helped; and we're experimenting with RRI.)

Why are you doing this the hard way, when pg_squeeze or pg_repack could
do it?

You definitely should not be using PostgreSQL 9.6 at this time.

Yours,
Laurenz Albe
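For what it's worth, a pg_squeeze run boils down to a single SQL call. The sketch below assumes the library has been added to shared_preload_libraries and uses placeholder schema/table names; the exact signature of squeeze.squeeze_table() varies between releases (older ones take additional clustering-index and tablespace arguments), so check the documentation for the installed version. pg_repack does the equivalent from the shell with its own CLI.

-- pg_squeeze must be listed in shared_preload_libraries before this works.
CREATE EXTENSION IF NOT EXISTS pg_squeeze;

-- Rewrite public.current_table in place, reclaiming the space left by the
-- deleted rows while the insert/delete workload keeps running.
SELECT squeeze.squeeze_table('public', 'current_table');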
