Re: Sequence vs UUID
On Thu, Feb 2, 2023 at 11:47 AM veem v wrote: > Tested the UUIDv7 generator for postgres as below. > > With regards to performance , It's still way behind the sequence. I was > expecting the insert performance of UUID v7 to be closer to the sequence , > but it doesn't seem so, as it's 500ms vs 3000ms. And the generation takes a > lot longer time as compared to sequence too i.e. 59ms vs 1700ms. Read time > or the index scan looks close i.e. 2.3ms vs 2.6ms. > Thank you for taking the effort in testing and measuring this. Those numbers make some intuitive sense to me. The function is written in plpgsql, not C, and is dependent on generating a UUIDv4 and then modifying it to include the timestamp and version change. While I suspect it will never beat a bigint by virtue of 64-bits will always be half the size of 128-bit, the read time on the index scan after it is generated is encouraging with a strong suggestion there's a lot of low-hanging fruit for improvement. Also, like UUIDv4, v7 can be generated by clients, ameliorating the generation bottleneck. Once again, thank you for following up with good quality analysis.
Fwd: Sequence vs UUID
Copying the list... -- Forwarded message - From: Dominique Devienne Date: Fri, Feb 3, 2023 at 4:57 PM Subject: Re: Sequence vs UUID To: veem v On Thu, Feb 2, 2023 at 8:47 PM veem v wrote: > Tested the UUIDv7 generator for postgres as below. > With regards to performance , It's still way behind the sequence. [...] > explain analyze select count(nextval('myseq') ) from > generate_series(1,10); > Execution Time: 59.687 ms > > explain analyze select count(gen_random_uuid()) from > generate_series(1,100'000); > Execution Time: 904.868 ms > > explain analyze select count(uuid_generate_v7()) from > generate_series(1,10); > Execution Time: 1711.187 ms > Something's off regarding Guid generations IMHO... You generate 100K Guids in ~1s. While we generate (in C++, Windows Release, using Boost) 16M of them in +/- the same time: Enabling Performance tests >>> >> > generate 16'000'000 guids in 0.980s (user: 0.984s) 12 MB >>> >> generate 16'000'000 guids in parallel on 4 CPUs in 0.309s (user: 1.188s) >>> 12 MB >>> >> That's 2 orders of magnitude faster. Sure there's some overhead from the SQL, but still. Something seems fishy. And that's on a 2.5y old desktop. --DD
Re: Sequence vs UUID
Actually I did the testing by connecting to "https://dbfiddle.uk/"; postgres version -15. PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit Am I doing it wrong, please confirm? On Fri, 3 Feb 2023 at 21:28, Dominique Devienne wrote: > Copying the list... > > -- Forwarded message - > From: Dominique Devienne > Date: Fri, Feb 3, 2023 at 4:57 PM > Subject: Re: Sequence vs UUID > To: veem v > > On Thu, Feb 2, 2023 at 8:47 PM veem v wrote: > >> Tested the UUIDv7 generator for postgres as below. >> With regards to performance , It's still way behind the sequence. [...] >> explain analyze select count(nextval('myseq') ) from >> generate_series(1,10); >> Execution Time: 59.687 ms >> >> explain analyze select count(gen_random_uuid()) from >> generate_series(1,100'000); >> Execution Time: 904.868 ms >> >> explain analyze select count(uuid_generate_v7()) from >> generate_series(1,10); >> Execution Time: 1711.187 ms >> > > Something's off regarding Guid generations IMHO... > > You generate 100K Guids in ~1s. While we generate (in C++, Windows > Release, using Boost) 16M of them in +/- the same time: > > Enabling Performance tests >>> > >> generate 16'000'000 guids in 0.980s (user: 0.984s) 12 MB >>> generate 16'000'000 guids in parallel on 4 CPUs in 0.309s (user: 1.188s) 12 MB >>> > That's 2 orders of magnitude faster. Sure there's some overhead from the > SQL, but still. Something seems fishy. > And that's on a 2.5y old desktop. --DD >
Re: Sequence vs UUID
On Fri, Feb 3, 2023 at 5:48 PM veem v wrote: > Actually I did the testing by connecting to "https://dbfiddle.uk/"; > postgres version -15. > > PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 > 20210514 (Red Hat 8.5.0-10), 64-bit > > Am I doing it wrong, please confirm? > > No clue. But even SQLite can generate 1M uuid (16 bytes random blobs are equivalent) is 1/3 of the time, so 30x faster. So your timings of generating 100K uuids and counting them seems way too slow to me. --DD sqlite> select count(randomblob(16)) from generate_series(1,1000*1000); QUERY PLAN `--SCAN generate_series VIRTUAL TABLE INDEX 3: addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 1500 Start at 15 1 Null 0 1 10 r[1..1]=NULL 2 VOpen 0 0 0 vtab:274D3E0 0 3 Integer1 4 00 r[4]=1 4 Multiply 6 6 50 r[5]=r[6]*r[6] 5 Integer3 2 00 r[2]=3 6 Integer2 3 00 r[3]=2 7 VFilter0 1120 iplan=r[2] zplan='' 8 Function 1 8 7 randomblob(1) 0 r[7]=func(r[8]) 9 AggStep0 7 1 count(1) 1 accum=r[1] step(r[7]) 10VNext 0 8 00 11AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 12Copy 1 9 00 r[9]=r[1] 13ResultRow 9 1 00 output=r[9] 14Halt 0 0 00 15Transaction0 0 1 0 1 usesStmtJournal=0 16Integer1000 6 00 r[6]=1000 17Integer168 00 r[8]=16 18Goto 0 1 00 ┌───┐ │ count(randomblob(16)) │ ├───┤ │ 100 │ └───┘ Run Time: real 0.278 user 0.25 sys 0.00 > On Fri, 3 Feb 2023 at 21:28, Dominique Devienne > wrote: > >> Something's off regarding Guid generations IMHO... >> You generate 100K Guids in ~1s. While we generate (in C++, Windows >> Release, using Boost) 16M of them in +/- the same time: >> >
Switching identity column to serial
I was wondering if it's possible to drop a column identity (not the column itself) while keeping the attached sequence. This would avoid recreating an identical sequence (especially with a correct start value and owner). Changing the sequence owner to NONE before dropping identity is not allowed. Also changing pg_class.relowner to some role did not help. The sequence is still dropped together with the column identity. But I managed it by clearing pg_attribute.attidentity. See the following psql session: test=# create table t (id int generated always as identity, x char); CREATE TABLE test=# insert into t (x) values ('a'), ('b') returning *; id | x +--- 1 | a 2 | b (2 rows) INSERT 0 2 test=# select pg_get_serial_sequence('t', 'id'); pg_get_serial_sequence public.t_id_seq (1 row) test=# update pg_attribute set attidentity = '' where (attrelid, attname) = ('t'::regclass, 'id'); UPDATE 1 test=# alter table t alter id drop identity; psql:1.sql:6: ERROR: column "id" of relation "t" is not an identity column test=# select pg_get_serial_sequence('t', 'id'); pg_get_serial_sequence public.t_id_seq (1 row) test=# alter table t alter id set default nextval('t_id_seq'); ALTER TABLE test=# insert into t (x) values ('c'), ('d') returning *; id | x +--- 3 | c 4 | d (2 rows) INSERT 0 2 test=# insert into t (id, x) values (-1, 'e') returning *; id | x +--- -1 | e (1 row) INSERT 0 1 test=# select * from t; id | x +--- 1 | a 2 | b 3 | c 4 | d -1 | e (5 rows) Is this sufficient or am I missing some detail and messing around with pg_catalog is not enough (in addition to being risky)? Some context: I have to change identity columns to a form that resembles a definition as serial. Creating a new column and migrating the primary key constraint is not an option. Why is this change necessary? My team is importing data with QGIS which fails to properly handle identity columns. QGIS uses INSERT with OVERRIDING SYSTEM VALUE but tries to insert NULL although identity columns imply NOT NULL (also it's the primary key). QGIS tries to generate an ID with nextval but does not use the qualified sequence name although search_path does not contain the namespace. It's weird that QGIS thinks that it should generate the ID instead of delegating this to the database, yet it uses RETURNING id. Maybe it needs the ID in advance for reference. I don't know. The "serial" style with nextval as column default works as expected. Probably because QGIS just uses the column default expression which should reference the correct sequence. Oh, did I mention yet that QGIS generates the ID before issuing an INSERT with RETURNING id? I'll still open a bug ticket with QGIS but right now there's no other way than ditching identity columns. -- Erik
Re: Switching identity column to serial
> On 04/02/2023 01:54 CET Erik Wienhold wrote: > > I was wondering if it's possible to drop a column identity (not the column > itself) while keeping the attached sequence. This would avoid recreating > an identical sequence (especially with a correct start value and owner). > > Changing the sequence owner to NONE before dropping identity is not allowed. > Also changing pg_class.relowner to some role did not help. The sequence is > still dropped together with the column identity. > > But I managed it by clearing pg_attribute.attidentity. See the following > psql session: Forgot to mention: tested on 12.13 and 15.1. > > test=# create table t (id int generated always as identity, x char); > CREATE TABLE > > test=# insert into t (x) values ('a'), ('b') returning *; >id | x > +--- > 1 | a > 2 | b > (2 rows) > > INSERT 0 2 > > test=# select pg_get_serial_sequence('t', 'id'); >pg_get_serial_sequence > >public.t_id_seq > (1 row) > > test=# update pg_attribute set attidentity = '' where (attrelid, > attname) = ('t'::regclass, 'id'); > UPDATE 1 > > test=# alter table t alter id drop identity; > psql:1.sql:6: ERROR: column "id" of relation "t" is not an identity > column > > test=# select pg_get_serial_sequence('t', 'id'); >pg_get_serial_sequence > >public.t_id_seq > (1 row) > > test=# alter table t alter id set default nextval('t_id_seq'); > ALTER TABLE > > test=# insert into t (x) values ('c'), ('d') returning *; >id | x > +--- > 3 | c > 4 | d > (2 rows) > > INSERT 0 2 > > test=# insert into t (id, x) values (-1, 'e') returning *; >id | x > +--- >-1 | e > (1 row) > > INSERT 0 1 > > test=# select * from t; >id | x > +--- > 1 | a > 2 | b > 3 | c > 4 | d >-1 | e > (5 rows) > > Is this sufficient or am I missing some detail and messing around with > pg_catalog is not enough (in addition to being risky)? > > Some context: > > I have to change identity columns to a form that resembles a definition as > serial. Creating a new column and migrating the primary key constraint is > not an option. > > Why is this change necessary? > > My team is importing data with QGIS which fails to properly handle identity > columns. QGIS uses INSERT with OVERRIDING SYSTEM VALUE but tries to insert > NULL although identity columns imply NOT NULL (also it's the primary key). > QGIS tries to generate an ID with nextval but does not use the qualified > sequence name although search_path does not contain the namespace. It's > weird that QGIS thinks that it should generate the ID instead of delegating > this to the database, yet it uses RETURNING id. Maybe it needs the ID in > advance for reference. I don't know. > > The "serial" style with nextval as column default works as expected. > Probably because QGIS just uses the column default expression which should > reference the correct sequence. Oh, did I mention yet that QGIS generates > the ID before issuing an INSERT with RETURNING id? > > I'll still open a bug ticket with QGIS but right now there's no other way > than ditching identity columns. > > -- > Erik
Re: Switching identity column to serial
On 2/3/23 18:54, Erik Wienhold wrote: I was wondering if it's possible to drop a column identity (not the column itself) while keeping the attached sequence. This would avoid recreating an identical sequence (especially with a correct start value and owner). Why doesn't this work? BEGIN; DROP SEQUENCE t_id; CREATE SEQUENCE new_t_id_seq AS INTEGER OWNED BY t.id; ALTER SEQUENCE new_t_id_seq OWNER TO new_owner; SELECT setval('new_t_id', (SELECT MAX(id) FROM t)); SELECT nextval('new_t_id'); COMMIT; Changing the sequence owner to NONE before dropping identity is not allowed. Also changing pg_class.relowner to some role did not help. The sequence is still dropped together with the column identity. Manually diigging around the system catalog is /never/ recommended. -- Born in Arizona, moved to Babylonia.
Re: Switching identity column to serial
On 2/3/23 22:41, Ron wrote: On 2/3/23 18:54, Erik Wienhold wrote: I was wondering if it's possible to drop a column identity (not the column itself) while keeping the attached sequence. This would avoid recreating an identical sequence (especially with a correct start value and owner). Why doesn't this work? BEGIN; DROP SEQUENCE t_id; CREATE SEQUENCE new_t_id_seq AS INTEGER OWNED BY t.id; ALTER SEQUENCE new_t_id_seq OWNER TO new_owner; SELECT setval('new_t_id', (SELECT MAX(id) FROM t)); SELECT nextval('new_t_id'); COMMIT; Note: this was not tested... -- Born in Arizona, moved to Babylonia.