Re: Indexes on expressions with multiple columns and operators
Frédéric Yhuel writes:
> Hello, in the following, I don't understand why:
> 1) the expression index isn't used in the first EXPLAIN

The planner doesn't look for multi-clause matches of that sort.
You could apply a little ju-jitsu perhaps:

regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') is true;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.013..0.016 rows=5.00 loops=1)
   Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
   Index Searches: 1
(3 rows)

but my own tendency would be to use a partial index rather than a
boolean-valued index:

regression=# CREATE INDEX foo_partial_idx ON foo (id) WHERE ackid IS NULL AND crit = 'WARNING';
CREATE INDEX
regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_partial_idx on foo  (cost=0.13..107.18 rows=990 width=17) (actual time=0.010..0.014 rows=5.00 loops=1)
   Index Searches: 1
(2 rows)

The advantage of a partial index is you might be able to have the
index entries themselves carry some other column(s), allowing more
queries to be made into index-only scans.  I put "id" here, which
might or might not be of any use in this specific toy example.

> 2) the number of estimated rows is completely off in the second EXPLAIN,
> whereas the planner could easily use the statistics of foo_f_idx.

Hmm, not sure about that.  Again, boolean-valued indexes aren't
something we've worked on too hard, but I don't see why that would
affect this case.

			regards, tom lane
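A minimal sketch of that last point, reusing the same toy table (the
id-only SELECT and the bare EXPLAIN are illustrative additions, not
part of Tom's mail):

-- Same partial index definition as in Tom's example above.  It already
-- carries "id", so a query that needs only id and matches the index
-- predicate can, in principle, be answered without visiting the heap
-- (an index-only scan), once the visibility map is reasonably up to
-- date (e.g. after a VACUUM).
CREATE INDEX foo_partial_idx ON foo (id)
    WHERE ackid IS NULL AND crit = 'WARNING';

EXPLAIN SELECT id FROM foo WHERE ackid IS NULL AND crit = 'WARNING';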
Re: Indexes on expressions with multiple columns and operators
Thank you Laurenz and Tom! I'm going to quote Tom's email here:

On 9/17/25 16:41, Tom Lane wrote:
> Frédéric Yhuel writes:
>> Hello, in the following, I don't understand why:
>> 1) the expression index isn't used in the first EXPLAIN
>
> The planner doesn't look for multi-clause matches of that sort.
> You could apply a little ju-jitsu perhaps:
>
> regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') is true;
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.013..0.016 rows=5.00 loops=1)
>    Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
>    Index Searches: 1
> (3 rows)

Thanks, it works well indeed.

> but my own tendency would be to use a partial index rather than a
> boolean-valued index:
>
> regression=# CREATE INDEX foo_partial_idx ON foo (id) WHERE ackid IS NULL AND crit = 'WARNING';
> CREATE INDEX
> regression=# EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using foo_partial_idx on foo  (cost=0.13..107.18 rows=990 width=17) (actual time=0.010..0.014 rows=5.00 loops=1)
>    Index Searches: 1
> (2 rows)
>
> The advantage of a partial index is you might be able to have the
> index entries themselves carry some other column(s), allowing more
> queries to be made into index-only scans.  I put "id" here, which
> might or might not be of any use in this specific toy example.

Yes, Laurenz made a similar suggestion, but the problem is that I'm
mostly interested in the estimated number of output rows... because in
the real query, there's a very bad Hash Join above (the Nested Loop is
*much* faster).

>> 2) the number of estimated rows is completely off in the second EXPLAIN,
>> whereas the planner could easily use the statistics of foo_f_idx.
>
> Hmm, not sure about that.  Again, boolean-valued indexes aren't
> something we've worked on too hard, but I don't see why that would
> affect this case.

OK, thanks anyway, I think the ju-jitsu mentioned above will do, even
though the application code will have to be patched.
Re: Indexes on expressions with multiple columns and operators
On 9/17/25 16:57, Frédéric Yhuel wrote:
Yes, Laurenz made a similar suggestion, but the problem is that I'm
mostly interested in the estimated number of output rows... because in
the real query, there's a very bad Hash Join above (the Nested Loop is
*much* faster).
BTW, I've also tested another solution: partitioning on 'criticity',
with one partition for 'INFO' (99% of the rows), and another one for the
other values ('WARNING' and 'ALARM' in the real case). The statistics
are much better... however an expression index would be an easier fix.
Multivariate MCV statistics don't work well in the real case (100M rows
in the table).
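A rough sketch of that partitioned layout, assuming the real column is
called "criticity" (the table and partition names below are made up):

-- List partitioning: one partition for the dominant 'INFO' value and a
-- default partition for the rarer 'WARNING'/'ALARM' rows, so that each
-- partition gets its own, much more useful statistics.
CREATE TABLE event (id bigint, ackid int, criticity text)
    PARTITION BY LIST (criticity);
CREATE TABLE event_info  PARTITION OF event FOR VALUES IN ('INFO');
CREATE TABLE event_other PARTITION OF event DEFAULT;

-- For comparison, the multivariate MCV statistics mentioned above would
-- be created along these lines (again, names are illustrative):
CREATE STATISTICS event_stx (mcv) ON ackid, criticity FROM event;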
Indexes on expressions with multiple columns and operators
Hello, in the following, I don't understand why:
1) the expression index isn't used in the first EXPLAIN
2) the number of estimated rows is completely off in the second EXPLAIN,
whereas the planner could easily use the statistics of foo_f_idx.
(SQL script attached, tested with master and v17)
DROP TABLE IF EXISTS foo;
CREATE UNLOGGED TABLE foo (id bigint, ackid int, crit text);
ALTER TABLE foo ALTER COLUMN crit SET statistics 400;
INSERT INTO foo SELECT i, NULL, CASE WHEN i%100=1 THEN 'WARNING' ELSE
'INFO' END FROM generate_series(1,100000) AS T(i);
UPDATE foo SET ackid = random()*1 WHERE id%100=1 AND id > 500 ;
CREATE INDEX foo_expr_idx ON foo ((ackid IS NULL AND crit = 'WARNING'));
ANALYZE foo ;
EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
ackid IS NULL AND crit = 'WARNING';
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1797.00 rows=990 width=17) (actual time=0.012..23.932 rows=5.00 loops=1)
   Filter: ((ackid IS NULL) AND (crit = 'WARNING'::text))
   Rows Removed by Filter: 99995
(3 rows)
CREATE OR REPLACE
FUNCTION f(crit text, ackid int)
RETURNS bool
LANGUAGE plpgsql
IMMUTABLE AS $$
BEGIN RETURN crit = 'WARNING' AND ackid IS NULL; END
$$;
CREATE INDEX foo_f_idx ON foo (f(crit, ackid));
ANALYZE foo ;
EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
f(crit, ackid);
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_f_idx on foo  (cost=0.29..8.39 rows=33333 width=17) (actual time=0.021..0.028 rows=5.00 loops=1)
   Index Cond: (f(crit, ackid) = true)
   Index Searches: 1
(3 rows)
SELECT tablename, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename like 'foo_%';
  tablename   | most_common_vals | most_common_freqs
--------------+------------------+-------------------
 foo_expr_idx | {f,t}            | {0.99995,5e-05}
 foo_f_idx    | {f,t}            | {0.99995,5e-05}
SELECT 100000 * 5e-05 AS the_row_estimate_that_the_planner_should_use;
 the_row_estimate_that_the_planner_should_use
-----------------------------------------------
                                           5.0
Best regards,
Frédéric
expr_index.sql
Description: application/sql
Why isn't PG using an index-only scan?
Hello,

I have this very simple query:

INSERT INTO copyrightad (idoeu, idad, role, mechowned, perfowned, iscontrolled)
SELECT o.idoeu, c.idad, LEFT(sipa_capacity1,3), sipa_mech_owned,
       sipa_perf_owned, sipa_controlled='Y'
FROM imaestro.msipfl ip
JOIN oeu o ON o.imworkid=ip.sipa_song_code
JOIN ad c ON c.imcompid=ip.sipa_ip_code
WHERE ip.sipa_comp_or_publ='C';

And here are the number of elements in each table:

imaestro.msipfl: 1550019
oeu: 1587533
ad: 304986

The explain plan is saying this:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on copyrightad  (cost=613790.59..4448045.97 rows=0 width=0)
   ->  Merge Join  (cost=613790.59..4448045.97 rows=84972138 width=328)
         Merge Cond: (((c.imcompid)::numeric) = ip.sipa_ip_code)
         ->  Sort  (cost=35712.97..36475.44 rows=304986 width=8)
               Sort Key: ((c.imcompid)::numeric)
               ->  Index Only Scan using ix_ad_imcompid on ad c  (cost=0.42..7931.21 rows=304986 width=8)
         ->  Materialize  (cost=578077.61..594521.17 rows=3288712 width=23)
               ->  Sort  (cost=578077.61..586299.39 rows=3288712 width=23)
                     Sort Key: ip.sipa_ip_code
                     ->  Hash Join  (cost=56043.04..154644.48 rows=3288712 width=23)
                           Hash Cond: ((o.imworkid)::numeric = ip.sipa_song_code)
                           ->  Seq Scan on oeu o  (cost=0.00..41901.33 rows=1587533 width=8)
                           ->  Hash  (cost=48542.24..48542.24 rows=600064 width=25)
                                 ->  Seq Scan on msipfl ip  (cost=0.00..48542.24 rows=600064 width=25)
                                       Filter: ((sipa_comp_or_publ)::text = 'C'::text)

Table ad has this index:
    "ix_ad_imcompid" btree (imcompid, idad)

Table oeu has this one:
    "ix_oeu_imcompid" btree (imworkid, idoeu)

idad and idoeu are both primary keys of, respectively, ad and oeu.

The resultset should be 591615 rows because:

select sipa_comp_or_publ, count(*) from imaestro.msipfl group by 1;
 sipa_comp_or_publ | count
-------------------+--------
 C                 | 591615
 P                 | 958404

Is it normal that PG is doing a seq scan on table oeu but an index-only
scan on ad?  I had to stop the query after 5 hours, how can I make this
faster?  Of course I ran VACUUM ANALYZE.

These are the memory settings I have, but I have plenty of unused RAM.
Should I bump something up?

shared_buffers = 16GB                  # min 128kB
#huge_pages = try                      # on, off, or try
#huge_page_size = 0                    # zero for system default
#temp_buffers = 8MB                    # min 800kB
#max_prepared_transactions = 0         # zero disables the feature
work_mem = 128MB                       # min 64kB

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi        18Gi       3.1Gi        15Gi       121Gi       106Gi
Swap:          4.0Gi       2.0Gi       2.0Gi

Thanks for your help,

JC
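The casts to numeric in the Merge Cond and Hash Cond above suggest the
join columns don't all have the same type.  A quick way to check (this
query is not from the original mail; the column list is taken from the
plan text):

-- Compare the declared types of the join columns; a mismatch forces
-- the planner to cast one side, which tends to rule out plain index use
-- on the cast side.
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE (table_name, column_name) IN (('msipfl', 'sipa_ip_code'),
                                    ('msipfl', 'sipa_song_code'),
                                    ('ad',     'imcompid'),
                                    ('oeu',    'imworkid'));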
Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column
Matt Long writes:
> Not to let perfect be the enemy of better, but we're facing a variant of
> this issue that would not be addressed by the proposed patch.
> ...
> In this case, the effects of the proposed patch are not applied since the
> most_common_elems array is not empty. I'm not a statistician, so maybe this
> wouldn't be valid, but it seems like using the highest frequency of an
> element that did not qualify for the mce list instead of the 0.5% default
> frequency could be an elegant, but more invasive solution.

Yeah, I think you are quite right: we can apply this idea not only
when the MCE list is empty, but whenever we didn't have to truncate
the MCE list.  In that case we know there are no additional element
values that exceed the cutoff frequency, and that's what the
selectivity functions want to know.

Nosing around in the code that uses STATISTIC_KIND_MCELEM entries,
I spotted two additional issues that the attached v2 patch addresses:

* ts_typanalyze/ts_selfuncs have code essentially identical to the
array case, and should receive the same treatment.

* The selectivity functions believe that the upper bound on the
frequency of non-MCEs is minfreq / 2, not the stored minfreq.  This
seems like complete brain fade: there could easily be elements with
frequency just less than minfreq, and probably are if the data
distribution follows Zipf's law.  I did not dig into the git history,
but I wonder if the divide-by-two business predates the introduction
of the lossy-counting algorithm, and if so whether it was less insane
with the original collection algorithm.  In any case, this patch
removes the divisions by 2, and makes some nearby cosmetic
improvements.

Many thanks for the suggestion!

			regards, tom lane

From b2c31c0d31c1edb931d4613b9706efe7ce04edbf Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Mon, 8 Sep 2025 19:19:00 -0400
Subject: [PATCH v2] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the minimum
stored frequency in that case.)

Notably, this works even when none of the values are common enough to
be called MCEs.  In such cases we stored nothing in the
STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.
A small obstacle is that update_attstats() has traditionally stored a
null, not an empty array, when passed zero "values" for a slot.  That
gives rise to an MCELEM entry that get_attstatsslot() will spit up on.
The least risky solution seems to be to adjust update_attstats() so
that it will emit a non-null (but possibly empty) array when the
passed stavalues array pointer isn't NULL, rather than conditioning
that on numvalues > 0.  In other existing cases I don't believe that
that changes anything.  For consistency, handle the stanumbers array
the same way.

In addition to adjusting the typanalyze routines that collect
STATISTIC_KIND_MCELEM data, adjust the routines that use the data to
assume that minfreq is the ceiling on the frequency of non-MCE values.
Previously they assumed that the ceiling is minfreq / 2, which seems
obviously wrong.  (Perhaps it was correct with the original
implementation of MCE collection, but surely it is not with the
lossy-counting algorithm.)

Thanks to Matt Long for the suggestion that we could apply this idea
even when there are more than zero MCEs.

Reported-by: Mark Frost
Reported-by: Matt Long
Author: Tom Lane
Discussion: https://postgr.es/m/ph3ppf1c905d6e6f24a5c1a1a1d8345b593e1...@ph3ppf1c905d6e6.namprd15.prod.outlook.com
---
 contrib/intarray/_int_selfuncs.c    |  8 +++
 src/backend/commands/analyze.c      |  7 +++---
 src/backend/tsearch/ts_selfuncs.c   | 10 -
 src/backend/tsearch/ts_typanalyze.c | 27
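For reference, the statistics being discussed are visible through the
pg_stats view; a sketch with a made-up table and text[] column (the
names are illustrative, not from the thread):

-- For an array column, most_common_elem_freqs lists one frequency per
-- element in most_common_elems, followed by three extra numbers: the
-- minimum and maximum of the stored element frequencies and the
-- frequency of NULL elements.  The patch above concerns the first of
-- those extra numbers, which the selectivity code treats as a ceiling
-- on the frequency of elements that are not in the MCE list.
SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'mytable'
  AND attname    = 'tags';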
Re: Indexes on expressions with multiple columns and operators
On Wed, 2025-09-17 at 15:55 +0200, Frédéric Yhuel wrote:
> Hello, in the following, I don't understand why:
>
> 1) the expression index isn't used in the first EXPLAIN
>
> 2) the number of estimated rows is completely off in the second EXPLAIN,
> whereas the planner could easily use the statistics of foo_f_idx.
>
> (SQL script attached, tested with master and v17)
>
> DROP TABLE IF EXISTS foo;
>
> CREATE UNLOGGED TABLE foo (id bigint, ackid int, crit text);
>
> ALTER TABLE foo ALTER COLUMN crit SET statistics 400;
>
> INSERT INTO foo SELECT i, NULL, CASE WHEN i%100=1 THEN 'WARNING' ELSE
> 'INFO' END FROM generate_series(1,100000) AS T(i);
>
> UPDATE foo SET ackid = random()*1 WHERE id%100=1 AND id > 500 ;
>
> CREATE INDEX foo_expr_idx ON foo ((ackid IS NULL AND crit = 'WARNING'));
>
> ANALYZE foo ;
>
> EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF) SELECT * FROM foo WHERE
> ackid IS NULL AND crit = 'WARNING';
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Seq Scan on foo  (cost=0.00..1797.00 rows=990 width=17) (actual
> time=0.012..23.932 rows=5.00 loops=1)
>    Filter: ((ackid IS NULL) AND (crit = 'WARNING'::text))
>    Rows Removed by Filter: 99995
> (3 rows)

As far as I know, PostgreSQL considers "ackid IS NULL" and
"crit = 'WARNING'" as two different RestrictInfos.  See the comment:

/*
 * Restriction clause info.
 *
 * We create one of these for each AND sub-clause of a restriction condition
 * (WHERE or JOIN/ON clause).  Since the restriction clauses are logically
 * ANDed, we can use any one of them or any subset of them to filter out
 * tuples, without having to evaluate the rest.  The RestrictInfo node itself
 * stores data used by the optimizer while choosing the best query plan.

PostgreSQL doesn't consider the (exotic) case where someone creates an
index on an expression that contains an AND.

You could do this:

EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF)
SELECT * FROM foo WHERE (ackid IS NULL AND crit = 'WARNING') IS TRUE;

                              QUERY PLAN
══════════════════════════════════════════════════════════════════════
 Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=17) (actual time=0.011..0.018 rows=5.00 loops=1)
   Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)
   Index Searches: 1
(3 rows)

But really, you should create a better index, perhaps

CREATE INDEX ON foo (crit) WHERE ackid IS NULL;

Yours,
Laurenz Albe
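A small sketch of that last suggestion against the toy table (the index
name is made up, and whether the planner actually picks the index still
depends on costs and statistics):

-- The query's WHERE clause implies the index predicate (ackid IS NULL),
-- so the partial index is applicable, and crit = 'WARNING' can serve as
-- its index condition even though the two clauses are separate
-- RestrictInfos.
CREATE INDEX foo_crit_partial_idx ON foo (crit) WHERE ackid IS NULL;

EXPLAIN (ANALYZE, SUMMARY OFF, BUFFERS OFF)
SELECT * FROM foo WHERE ackid IS NULL AND crit = 'WARNING';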
