Re: Indexes on expressions with multiple columns and operators
Frédéric Yhuel writes:
> Hello, in the following, I don't understand why:
> 1) the expression index isn't used in the first EXPLAIN

The planner doesn't look for multi-clause matches of that sort.
You could apply a little ju-jitsu perhaps:

regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') is true;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.013..0.016 rows=5.00 loops=1)
   Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
   Index Searches: 1
(3 rows)

but my own tendency would be to use a partial index rather than a
boolean-valued index:

regression=# CREATE INDEX foo_partial_idx ON foo (id) WHERE ackid IS NULL AND crit = 'WARNING';
CREATE INDEX
regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_partial_idx on foo  (cost=0.13..107.18 rows=990 width=17) (actual time=0.010..0.014 rows=5.00 loops=1)
   Index Searches: 1
(2 rows)

The advantage of a partial index is you might be able to have the
index entries themselves carry some other column(s), allowing more
queries to be made into index-only scans.  I put "id" here, which
might or might not be of any use in this specific toy example.

> 2) the number of estimated rows is completely off in the second EXPLAIN,
> whereas the planner could easily use the statistics of foo_f_idx.

Hmm, not sure about that.  Again, boolean-valued indexes aren't
something we've worked on too hard, but I don't see why that would
affect this case.

			regards, tom lane
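A minimal sketch of that last point, reusing the same toy table (the
id-only SELECT and the bare EXPLAIN are illustrative additions, not
part of Tom's mail):

-- Same partial index definition as in Tom's example above.  It already
-- carries "id", so a query that needs only id and matches the index
-- predicate can, in principle, be answered without visiting the heap
-- (an index-only scan), once the visibility map is reasonably up to
-- date (e.g. after a VACUUM).
CREATE INDEX foo_partial_idx ON foo (id)
    WHERE ackid IS NULL AND crit = 'WARNING';

EXPLAIN SELECT id FROM foo WHERE ackid IS NULL AND crit = 'WARNING';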
Re: Indexes on expressions with multiple columns and operators
Thank you Laurenz and Tom! I'm going to quote Tom's email here:

On 9/17/25 16:41, Tom Lane wrote:
> Frédéric Yhuel writes:
>> Hello, in the following, I don't understand why:
>> 1) the expression index isn't used in the first EXPLAIN
>
> The planner doesn't look for multi-clause matches of that sort.
> You could apply a little ju-jitsu perhaps:
>
> regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') is true;
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.013..0.016 rows=5.00 loops=1)
>    Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
>    Index Searches: 1
> (3 rows)

Thanks, it works well indeed.

> but my own tendency would be to use a partial index rather than a
> boolean-valued index:
>
> regression=# CREATE INDEX foo_partial_idx ON foo (id) WHERE ackid IS NULL AND crit = 'WARNING';
> CREATE INDEX
> regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using foo_partial_idx on foo  (cost=0.13..107.18 rows=990 width=17) (actual time=0.010..0.014 rows=5.00 loops=1)
>    Index Searches: 1
> (2 rows)
>
> The advantage of a partial index is you might be able to have the
> index entries themselves carry some other column(s), allowing more
> queries to be made into index-only scans.  I put "id" here, which
> might or might not be of any use in this specific toy example.

Yes, Laurenz made a similar suggestion, but the problem is that I'm
mostly interested in the estimated number of output rows... because in
the real query, there's a very bad Hash Join above (the Nested Loop is
*much* faster).

>> 2) the number of estimated rows is completely off in the second EXPLAIN,
>> whereas the planner could easily use the statistics of foo_f_idx.
>
> Hmm, not sure about that.  Again, boolean-valued indexes aren't
> something we've worked on too hard, but I don't see why that would
> affect this case.

OK, thanks anyway, I think the ju-jitsu mentioned above will do, even
though the application code will have to be patched.
Re: Indexes on expressions with multiple columns and operators
On 9/17/25 16:57, Frédéric Yhuel wrote:
Yes, Laurenz made a similar suggestion, but the problem is that I'm
mostly interested in the estimated number of output rows... because in
the real query, there's a very bad Hash Join above (the Nested Loop is
*much* faster).
BTW, I've also tested another solution: partitioning on 'criticity',
with one partition for 'INFO' (99% of the rows), and another one for the
other values ('WARNING' and 'ALARM' in the real case). The statistics
are much better... however an expression index would be an easier fix.
Multivariate MCV statistics don't work well in the real case (100M rows
in the table).
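A rough sketch of that partitioned layout, assuming the real column is
called "criticity" (the table and partition names below are made up):

-- List partitioning: one partition for the dominant 'INFO' value and a
-- default partition for the rarer 'WARNING'/'ALARM' rows, so that each
-- partition gets its own, much more useful statistics.
CREATE TABLE event (id bigint, ackid int, criticity text)
    PARTITION BY LIST (criticity);
CREATE TABLE event_info  PARTITION OF event FOR VALUES IN ('INFO');
CREATE TABLE event_other PARTITION OF event DEFAULT;

-- For comparison, the multivariate MCV statistics mentioned above would
-- be created along these lines (again, names are illustrative):
CREATE STATISTICS event_stx (mcv) ON ackid, criticity FROM event;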
Indexes on expressions with multiple columns and operators
Hello, in the following, I don't understand why:
1) the expression index isn't used in the first EXPLAIN
2) the number of estimated rows is completely off in the second EXPLAIN,
whereas the planner could easily use the statistics of foo_f_idx.
(SQL script attached, tested with master and v17)
DROP TABLE IF EXISTS foo;
CREATE UNLOGGED TABLE foo (id bigint, ackid int, crit text);
ALTER TABLE foo ALTER COLUMN crit SET statistics 400;
INSERT INTO foo SELECT i, NULL, CASE WHEN i%100=1 THEN 'WARNING' ELSE
'INFO' END FROM generate_series(1,100000) AS T(i);
UPDATE foo SET ackid = random()*1 WHERE id%100=1 AND id > 500 ;
CREATE INDEX foo_expr_idx ON foo ((ackid IS NULL AND crit = 'WARNING'));
ANALYZE foo ;
EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
ackid IS NULL AND crit = 'WARNING';
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1797.00 rows=990 width=17) (actual time=0.012..23.932 rows=5.00 loops=1)
   Filter: ((ackid IS NULL) AND (crit = 'WARNING'::text))
   Rows Removed by Filter: 99995
(3 rows)
CREATE OR REPLACE
FUNCTION f(crit text, ackid int)
RETURNS bool
LANGUAGE plpgsql
IMMUTABLE AS $$
BEGIN RETURN crit = 'WARNING' AND ackid IS NULL; END
$$;
CREATE INDEX foo_f_idx ON foo (f(crit, ackid));
ANALYZE foo ;
EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
f(crit, ackid);
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_f_idx on foo  (cost=0.29..8.39 rows=33333 width=17) (actual time=0.021..0.028 rows=5.00 loops=1)
   Index Cond: (f(crit, ackid) = true)
   Index Searches: 1
(3 rows)
SELECT tablename, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename like 'foo_%';
  tablename   | most_common_vals | most_common_freqs
--------------+------------------+-------------------
 foo_expr_idx | {f,t}            | {0.99995,5e-05}
 foo_f_idx    | {f,t}            | {0.99995,5e-05}
SELECT 100000 * 5e-05 AS the_row_estimate_that_the_planner_should_use;
 the_row_estimate_that_the_planner_should_use
-----------------------------------------------
                                           5.0
Best regards,
Frédéric
expr_index.sql
Description: application/sql
Why isn't PG using an index-only scan?
Hello,

I have this very simple query:

INSERT INTO copyrightad (idoeu, idad, role, mechowned, perfowned, iscontrolled)
SELECT o.idoeu, c.idad, LEFT(sipa_capacity1,3), sipa_mech_owned,
       sipa_perf_owned, sipa_controlled='Y'
FROM imaestro.msipfl ip
JOIN oeu o ON o.imworkid=ip.sipa_song_code
JOIN ad c ON c.imcompid=ip.sipa_ip_code
WHERE ip.sipa_comp_or_publ='C';

And here are the number of elements in each table:

imaestro.msipfl: 1550019
oeu: 1587533
ad: 304986

The explain plan is saying this:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on copyrightad  (cost=613790.59..4448045.97 rows=0 width=0)
   ->  Merge Join  (cost=613790.59..4448045.97 rows=84972138 width=328)
         Merge Cond: (((c.imcompid)::numeric) = ip.sipa_ip_code)
         ->  Sort  (cost=35712.97..36475.44 rows=304986 width=8)
               Sort Key: ((c.imcompid)::numeric)
               ->  Index Only Scan using ix_ad_imcompid on ad c  (cost=0.42..7931.21 rows=304986 width=8)
         ->  Materialize  (cost=578077.61..594521.17 rows=3288712 width=23)
               ->  Sort  (cost=578077.61..586299.39 rows=3288712 width=23)
                     Sort Key: ip.sipa_ip_code
                     ->  Hash Join  (cost=56043.04..154644.48 rows=3288712 width=23)
                           Hash Cond: ((o.imworkid)::numeric = ip.sipa_song_code)
                           ->  Seq Scan on oeu o  (cost=0.00..41901.33 rows=1587533 width=8)
                           ->  Hash  (cost=48542.24..48542.24 rows=600064 width=25)
                                 ->  Seq Scan on msipfl ip  (cost=0.00..48542.24 rows=600064 width=25)
                                       Filter: ((sipa_comp_or_publ)::text = 'C'::text)

Table ad has this index:
    "ix_ad_imcompid" btree (imcompid, idad)

Table oeu has this one:
    "ix_oeu_imcompid" btree (imworkid, idoeu)

idad and idoeu are both primary keys of, respectively, ad and oeu.

The resultset should be 591615 rows because:

select sipa_comp_or_publ, count(*) from imaestro.msipfl group by 1;
 sipa_comp_or_publ | count
-------------------+--------
 C                 | 591615
 P                 | 958404

Is it normal that PG is doing a seq scan on table oeu but an index-only
scan on ad?  I had to stop the query after 5 hours, how can I make this
faster?  Of course I ran VACUUM ANALYZE.

These are the memory settings I have, but I have plenty of unused RAM.
Should I bump something up?

shared_buffers = 16GB                  # min 128kB
#huge_pages = try                      # on, off, or try
#huge_page_size = 0                    # zero for system default
#temp_buffers = 8MB                    # min 800kB
#max_prepared_transactions = 0         # zero disables the feature
work_mem = 128MB                       # min 64kB

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi        18Gi       3.1Gi        15Gi       121Gi       106Gi
Swap:          4.0Gi       2.0Gi       2.0Gi

Thanks for your help,

JC
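The casts to numeric in the Merge Cond and Hash Cond above suggest the
join columns don't all have the same type.  A quick way to check (this
query is not from the original mail; the column list is taken from the
plan text):

-- Compare the declared types of the join columns; a mismatch forces
-- the planner to cast one side, which tends to rule out plain index use
-- on the cast side.
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE (table_name, column_name) IN (('msipfl', 'sipa_ip_code'),
                                    ('msipfl', 'sipa_song_code'),
                                    ('ad',     'imcompid'),
                                    ('oeu',    'imworkid'));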
Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column
Matt Long writes:
> Not to let perfect be the enemy of better, but we're facing a variant of
> this issue that would not be addressed by the proposed patch.
> ...
> In this case, the effects of the proposed patch are not applied since the
> most_common_elems array is not empty. I'm not a statistician, so maybe this
> wouldn't be valid, but it seems like using the highest frequency of an
> element that did not qualify for the mce list instead of the 0.5% default
> frequency could be an elegant, but more invasive solution.

Yeah, I think you are quite right: we can apply this idea not only
when the MCE list is empty, but whenever we didn't have to truncate
the MCE list.  In that case we know there are no additional element
values that exceed the cutoff frequency, and that's what the
selectivity functions want to know.

Nosing around in the code that uses STATISTIC_KIND_MCELEM entries,
I spotted two additional issues that the attached v2 patch addresses:

* ts_typanalyze/ts_selfuncs have code essentially identical to the
array case, and should receive the same treatment.

* The selectivity functions believe that the upper bound on the
frequency of non-MCEs is minfreq / 2, not the stored minfreq.  This
seems like complete brain fade: there could easily be elements with
frequency just less than minfreq, and probably are if the data
distribution follows Zipf's law.  I did not dig into the git history,
but I wonder if the divide-by-two business predates the introduction
of the lossy-counting algorithm, and if so whether it was less insane
with the original collection algorithm.  In any case, this patch
removes the divisions by 2, and makes some nearby cosmetic
improvements.

Many thanks for the suggestion!

			regards, tom lane

From b2c31c0d31c1edb931d4613b9706efe7ce04edbf Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Mon, 8 Sep 2025 19:19:00 -0400
Subject: [PATCH v2] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the minimum
stored frequency in that case.)

Notably, this works even when none of the values are common enough to
be called MCEs.  In such cases we stored nothing in the
STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.
A small obstacle is that update_attstats() has traditionally stored a
null, not an empty array, when passed zero "values" for a slot.  That
gives rise to an MCELEM entry that get_attstatsslot() will spit up on.
The least risky solution seems to be to adjust update_attstats() so
that it will emit a non-null (but possibly empty) array when the
passed stavalues array pointer isn't NULL, rather than conditioning
that on numvalues > 0.  In other existing cases I don't believe that
that changes anything.  For consistency, handle the stanumbers array
the same way.

In addition to adjusting the typanalyze routines that collect
STATISTIC_KIND_MCELEM data, adjust the routines that use the data to
assume that minfreq is the ceiling on the frequency of non-MCE values.
Previously they assumed that the ceiling is minfreq / 2, which seems
obviously wrong.  (Perhaps it was correct with the original
implementation of MCE collection, but surely it is not with the
lossy-counting algorithm.)

Thanks to Matt Long for the suggestion that we could apply this idea
even when there are more than zero MCEs.

Reported-by: Mark Frost
Reported-by: Matt Long
Author: Tom Lane
Discussion: https://postgr.es/m/ph3ppf1c905d6e6f24a5c1a1a1d8345b593e1...@ph3ppf1c905d6e6.namprd15.prod.outlook.com
---
 contrib/intarray/_int_selfuncs.c    |  8 +++
 src/backend/commands/analyze.c      |  7 +++---
 src/backend/tsearch/ts_selfuncs.c   | 10 -
 src/backend/tsearch/ts_typanalyze.c | 27
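For reference, the statistics being discussed are visible through the
pg_stats view; a sketch with a made-up table and text[] column (the
names are illustrative, not from the thread):

-- For an array column, most_common_elem_freqs lists one frequency per
-- element in most_common_elems, followed by three extra numbers: the
-- minimum and maximum of the stored element frequencies and the
-- frequency of NULL elements.  The patch above concerns the first of
-- those extra numbers, which the selectivity code treats as a ceiling
-- on the frequency of elements that are not in the MCE list.
SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'mytable'
  AND attname    = 'tags';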
Re: Indexes on expressions with multiple columns and operators
On Wed, 2025-09-17 at 15:55 +0200, Frédéric Yhuel wrote:
> Hello, in the following, I don't understand why:
>
> 1) the expression index isn't used in the first EXPLAIN
>
> 2) the number of estimated rows is completely off in the second EXPLAIN,
> whereas the planner could easily use the statistics of foo_f_idx.
>
> (SQL script attached, tested with master and v17)
>
> DROP TABLE IF EXISTS foo;
>
> CREATE UNLOGGED TABLE foo (id bigint, ackid int, crit text);
>
> ALTER TABLE foo ALTER COLUMN crit SET statistics 400;
>
> INSERT INTO foo SELECT i, NULL, CASE WHEN i%100=1 THEN 'WARNING' ELSE
> 'INFO' END FROM generate_series(1,100000) AS T(i);
>
> UPDATE foo SET ackid = random()*1 WHERE id%100=1 AND id > 500 ;
>
> CREATE INDEX foo_expr_idx ON foo ((ackid IS NULL AND crit = 'WARNING'));
>
> ANALYZE foo ;
>
> EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
> ackid IS NULL AND crit = 'WARNING';
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Seq Scan on foo  (cost=0.00..1797.00 rows=990 width=17) (actual
> time=0.012..23.932 rows=5.00 loops=1)
>    Filter: ((ackid IS NULL) AND (crit = 'WARNING'::text))
>    Rows Removed by Filter: 99995
> (3 rows)

As far as I know, PostgreSQL considers "ackid IS NULL" and
"crit = 'WARNING'" as two different RestrictInfos.  See the comment:

/*
 * Restriction clause info.
 *
 * We create one of these for each AND sub-clause of a restriction condition
 * (WHERE or JOIN/ON clause).  Since the restriction clauses are logically
 * ANDed, we can use any one of them or any subset of them to filter out
 * tuples, without having to evaluate the rest.  The RestrictInfo node itself
 * stores data used by the optimizer while choosing the best query plan.

PostgreSQL doesn't consider the (exotic) case where someone creates an
index on an expression that contains an AND.

You could do this:

EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF)
SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') IS TRUE;

                              QUERY PLAN
══════════════════════════════════════════════════════════════════════
 Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.011..0.018 rows=5.00 loops=1)
   Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
   Index Searches: 1
(3 rows)

But really, you should create a better index, perhaps

CREATE INDEX ON foo (crit) WHERE ackid IS NULL;

Yours,
Laurenz Albe
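A small sketch of that last suggestion against the toy table (the index
name is made up, and whether the planner actually picks the index still
depends on costs and statistics):

-- The query's WHERE clause implies the index predicate (ackid IS NULL),
-- so the partial index is applicable, and crit = 'WARNING' can serve as
-- its index condition even though the two clauses are separate
-- RestrictInfos.
CREATE INDEX foo_crit_partial_idx ON foo (crit) WHERE ackid IS NULL;

EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF)
SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';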
