Custom opclass for column statistics?
Hi, I've been wondering whether it is possible somehow to have the standard column statistics to respect a certain operator class? The reason why I am asking for this is that I have a UUID column with a unique index at it using a custom operator class which implies a different sort order than for the default UUID operator class. This results into planner mistakes when determining whether to use the index for row selection or not. Too often it falls back into sequential scan due to this. Thanx and Cheers, Ancoron
Re: Custom opclass for column statistics?
On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote: Hi, I've been wondering whether it is possible somehow to have the standard column statistics to respect a certain operator class? The reason why I am asking for this is that I have a UUID column with a unique index at it using a custom operator class which implies a different sort order than for the default UUID operator class. This results into planner mistakes when determining whether to use the index for row selection or not. Too often it falls back into sequential scan due to this. Can you share an example demonstrating the issue? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Custom opclass for column statistics?
On 06/07/2019 15:38, Tomas Vondra wrote: > On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote: >> Hi, >> >> I've been wondering whether it is possible somehow to have the standard >> column statistics to respect a certain operator class? >> >> The reason why I am asking for this is that I have a UUID column with a >> unique index at it using a custom operator class which implies a >> different sort order than for the default UUID operator class. >> >> This results into planner mistakes when determining whether to use the >> index for row selection or not. Too often it falls back into sequential >> scan due to this. >> > > Can you share an example demonstrating the issue? > > > regards > Yes, I have an opclass as follows: CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid USING btree AS OPERATOR1 <*, OPERATOR1 <~ (uuid, timestamp with time zone), OPERATOR2 <=*, OPERATOR2 <=~ (uuid, timestamp with time zone), OPERATOR3 =, OPERATOR3 =~ (uuid, timestamp with time zone), OPERATOR4 >=*, OPERATOR4 >=~ (uuid, timestamp with time zone), OPERATOR5 >*, OPERATOR5 >~ (uuid, timestamp with time zone), FUNCTION1 uuid_timestamp_cmp(uuid, uuid), FUNCTION1 uuid_timestamp_only_cmp(uuid, timestamp with time zone), FUNCTION2 uuid_timestamp_sortsupport(internal) ; ...and e.g. operator "<*" is defined as: CREATE FUNCTION uuid_timestamp_lt(uuid, uuid) RETURNS bool AS 'MODULE_PATHNAME', 'uuid_timestamp_lt' LANGUAGE C IMMUTABLE LEAKPROOF STRICT PARALLEL SAFE; COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than'; CREATE OPERATOR <* ( LEFTARG = uuid, RIGHTARG = uuid, PROCEDURE = uuid_timestamp_lt, COMMUTATOR = '>*', NEGATOR = '>=*', RESTRICT = scalarltsel, JOIN = scalarltjoinsel ); The function "uuid_timestamp_lt" is basically defined as follows: 1. if not version 1 UUID fallback to standard uuid compare 2. extract timestamp values and compare 3. if equal timestamps fallback to standard uuid compare ...so that a chronological order is established. The test table is created as follows: CREATE TABLE uuid_v1_ext (id uuid); CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops); The values for "histogram_bounds" of the test table look like this (due to the default sort order for standard type UUID): 3789-97bf-11e9-b6bb-e03f49f7f733 008b88f8-6deb-11e9-901a-e03f4947f477 010a8b22-586a-11e9-8258-e03f49ce78f3 ... 6f682e68-978d-11e9-901a-e03f4947f477 6ff412ee-926f-11e9-901a-e03f4947f477 7079ffe2-642f-11e9-b0cc-e03f49e7fd3b 70ffaeca-4645-11e9-adf9-e03f494677fb ... fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b ff779ce8-9e52-11e9-8258-e03f49ce78f3 6bfc-4de4-11e9-b0d4-e03f49d6f6bf ...and I think that's where the planner gets the decision for a query such as: DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3'; ...which then get's executed as sequential scan instead of an index scan. I was also thinking about changing the selectivity function used by the custom operator, but I didn't find any hints how to implement that without duplicating a lot of internal code. Cheers, Ancoron
Re: Custom opclass for column statistics?
On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote: On 06/07/2019 15:38, Tomas Vondra wrote: On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote: Hi, I've been wondering whether it is possible somehow to have the standard column statistics to respect a certain operator class? The reason why I am asking for this is that I have a UUID column with a unique index at it using a custom operator class which implies a different sort order than for the default UUID operator class. This results into planner mistakes when determining whether to use the index for row selection or not. Too often it falls back into sequential scan due to this. Can you share an example demonstrating the issue? regards Yes, I have an opclass as follows: CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid USING btree AS OPERATOR1 <*, OPERATOR1 <~ (uuid, timestamp with time zone), OPERATOR2 <=*, OPERATOR2 <=~ (uuid, timestamp with time zone), OPERATOR3 =, OPERATOR3 =~ (uuid, timestamp with time zone), OPERATOR4 >=*, OPERATOR4 >=~ (uuid, timestamp with time zone), OPERATOR5 >*, OPERATOR5 >~ (uuid, timestamp with time zone), FUNCTION1 uuid_timestamp_cmp(uuid, uuid), FUNCTION1 uuid_timestamp_only_cmp(uuid, timestamp with time zone), FUNCTION2 uuid_timestamp_sortsupport(internal) ; ...and e.g. operator "<*" is defined as: CREATE FUNCTION uuid_timestamp_lt(uuid, uuid) RETURNS bool AS 'MODULE_PATHNAME', 'uuid_timestamp_lt' LANGUAGE C IMMUTABLE LEAKPROOF STRICT PARALLEL SAFE; COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than'; CREATE OPERATOR <* ( LEFTARG = uuid, RIGHTARG = uuid, PROCEDURE = uuid_timestamp_lt, COMMUTATOR = '>*', NEGATOR = '>=*', RESTRICT = scalarltsel, JOIN = scalarltjoinsel ); The function "uuid_timestamp_lt" is basically defined as follows: 1. if not version 1 UUID fallback to standard uuid compare 2. extract timestamp values and compare 3. if equal timestamps fallback to standard uuid compare ...so that a chronological order is established. The test table is created as follows: CREATE TABLE uuid_v1_ext (id uuid); CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops); The values for "histogram_bounds" of the test table look like this (due to the default sort order for standard type UUID): 3789-97bf-11e9-b6bb-e03f49f7f733 008b88f8-6deb-11e9-901a-e03f4947f477 010a8b22-586a-11e9-8258-e03f49ce78f3 ... 6f682e68-978d-11e9-901a-e03f4947f477 6ff412ee-926f-11e9-901a-e03f4947f477 7079ffe2-642f-11e9-b0cc-e03f49e7fd3b 70ffaeca-4645-11e9-adf9-e03f494677fb ... fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b ff779ce8-9e52-11e9-8258-e03f49ce78f3 6bfc-4de4-11e9-b0d4-e03f49d6f6bf ...and I think that's where the planner gets the decision for a query such as: DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3'; ...which then get's executed as sequential scan instead of an index scan. I was also thinking about changing the selectivity function used by the custom operator, but I didn't find any hints how to implement that without duplicating a lot of internal code. Not sure, I'm not very familiar with this code, so I'd have to play with it and try things. But that's hard when I don't have any code. Would it be possible to share a small self-contained test case? I wonder what does uuid_timestamp_cmp do? I suppose it first compares by a timestamp extracted from the UUID, right? It'd be interesting to see (a) statistics for the column from pg_stats, both for the table and index (which should have been built using the custom opclass, I think). (b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the default one (that can't use the timestamp condition, but it should be possible to generate smallers/largest uuid for a timestamp). BTW which PostgreSQL version is this? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Custom opclass for column statistics?
On 06/07/2019 17:57, Tomas Vondra wrote: > On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote: >> On 06/07/2019 15:38, Tomas Vondra wrote: >>> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote: Hi, I've been wondering whether it is possible somehow to have the standard column statistics to respect a certain operator class? The reason why I am asking for this is that I have a UUID column with a unique index at it using a custom operator class which implies a different sort order than for the default UUID operator class. This results into planner mistakes when determining whether to use the index for row selection or not. Too often it falls back into sequential scan due to this. >>> >>> Can you share an example demonstrating the issue? >>> >>> >>> regards >>> >> >> Yes, I have an opclass as follows: >> >> CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid >> USING btree AS >> OPERATOR 1 <*, >> OPERATOR 1 <~ (uuid, timestamp with time zone), >> OPERATOR 2 <=*, >> OPERATOR 2 <=~ (uuid, timestamp with time zone), >> OPERATOR 3 =, >> OPERATOR 3 =~ (uuid, timestamp with time zone), >> OPERATOR 4 >=*, >> OPERATOR 4 >=~ (uuid, timestamp with time zone), >> OPERATOR 5 >*, >> OPERATOR 5 >~ (uuid, timestamp with time zone), >> FUNCTION 1 uuid_timestamp_cmp(uuid, uuid), >> FUNCTION 1 uuid_timestamp_only_cmp(uuid, timestamp >> with time zone), >> FUNCTION 2 uuid_timestamp_sortsupport(internal) >> ; >> >> ...and e.g. operator "<*" is defined as: >> >> CREATE FUNCTION uuid_timestamp_lt(uuid, uuid) >> RETURNS bool >> AS 'MODULE_PATHNAME', 'uuid_timestamp_lt' >> LANGUAGE C >> IMMUTABLE >> LEAKPROOF >> STRICT >> PARALLEL SAFE; >> >> COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than'; >> >> CREATE OPERATOR <* ( >> LEFTARG = uuid, >> RIGHTARG = uuid, >> PROCEDURE = uuid_timestamp_lt, >> COMMUTATOR = '>*', >> NEGATOR = '>=*', >> RESTRICT = scalarltsel, >> JOIN = scalarltjoinsel >> ); >> >> >> The function "uuid_timestamp_lt" is basically defined as follows: >> 1. if not version 1 UUID fallback to standard uuid compare >> 2. extract timestamp values and compare >> 3. if equal timestamps fallback to standard uuid compare >> >> ...so that a chronological order is established. >> >> >> The test table is created as follows: >> >> CREATE TABLE uuid_v1_ext (id uuid); >> CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id >> uuid_timestamp_ops); >> >> >> The values for "histogram_bounds" of the test table look like this (due >> to the default sort order for standard type UUID): >> >> 3789-97bf-11e9-b6bb-e03f49f7f733 >> 008b88f8-6deb-11e9-901a-e03f4947f477 >> 010a8b22-586a-11e9-8258-e03f49ce78f3 >> ... >> 6f682e68-978d-11e9-901a-e03f4947f477 >> 6ff412ee-926f-11e9-901a-e03f4947f477 >> 7079ffe2-642f-11e9-b0cc-e03f49e7fd3b >> 70ffaeca-4645-11e9-adf9-e03f494677fb >> ... >> fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b >> ff779ce8-9e52-11e9-8258-e03f49ce78f3 >> 6bfc-4de4-11e9-b0d4-e03f49d6f6bf >> >> ...and I think that's where the planner gets the decision for a query >> such as: >> >> DELETE FROM uuid_v1_ext WHERE id <* >> '4bdf6f81-56ad-11e9-8258-e03f49ce78f3'; >> >> ...which then get's executed as sequential scan instead of an index scan. >> >> I was also thinking about changing the selectivity function used by the >> custom operator, but I didn't find any hints how to implement that >> without duplicating a lot of internal code. >> > > Not sure, I'm not very familiar with this code, so I'd have to play with > it and try things. But that's hard when I don't have any code. Would it > be possible to share a small self-contained test case? My test uses historically generated UUID's and is currently not automated (generates ~120M UUID's over a period of ~4 months). The tool I am using to generate the UUID's is a Java tool, so not very automation friendly (using Maven, which downloads a lot at first build time). The resulting data is ~4 GiB in size, but here is a guide I use for my manual testing: https://gist.github.com/ancoron/b08ac4b1ceafda2a38ff12030c011385 Please note that my testing involves 4 SSD's (to avoid too much mixed I/O but not for functionality): 1.) system + WAL 2.) generated UUID files (for reading) 3.) table data (tablespace "fast") 4.) index data (tablespace "faster") > > I wonder what does uuid_timestamp_cmp do? I suppose it first compares by > a timestamp extracted from the UUID, right? exactly: https://github.com/ancoron/pg-uuid-ext/blob/master/uuid_ext.c#L234 > > It'd be interesting to see > > (a) statistics for the column from pg_stats, both for the table and > index (which should have been built using the custom
