date:20250918

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane

I wrote:
> Sigh ... so the answer is this used to work (since commit 39df0f150)
> and then I carelessly broke it in commit a391ff3c3.  If you try this
> test case in versions 9.5..11 you get a spot-on rowcount estimate.
> Serves me right for not having a test case I guess, but I'm astonished
> that nobody complained sooner.

The attached fixes things so it works like it did pre-a391ff3c3.

I spent some time trying to devise a test case, and was reminded
of why I didn't have one before: it's hard to make a case that
will be robust enough to not show diffs in the buildfarm.
I'll keep thinking about that though.

regards, tom lane

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 5d51f97f219..d0f516b7645 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -874,6 +874,10 @@ clause_selectivity_ext(PlannerInfo *root,
   varRelid,
   jointype,
   sjinfo);
+
+		/* If no support, fall back on boolvarsel */
+		if (s1 < 0)
+			s1 = boolvarsel(root, clause, varRelid);
 	}
 	else if (IsA(clause, ScalarArrayOpExpr))
 	{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f8641204a67..da5d901ec3c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2143,9 +2143,8 @@ join_selectivity(PlannerInfo *root,
 /*
  * function_selectivity
  *
- * Returns the selectivity of a specified boolean function clause.
- * This code executes registered procedures stored in the
- * pg_proc relation, by calling the function manager.
+ * Attempt to estimate the selectivity of a specified boolean function clause
+ * by asking its support function.  If the function lacks support, return -1.
  *
  * See clause_selectivity() for the meaning of the additional parameters.
  */
@@ -2163,15 +2162,8 @@ function_selectivity(PlannerInfo *root,
 	SupportRequestSelectivity req;
 	SupportRequestSelectivity *sresult;
 
-	/*
-	 * If no support function is provided, use our historical default
-	 * estimate, 0.333.  This seems a pretty unprincipled choice, but
-	 * Postgres has been using that estimate for function calls since 1992.
-	 * The hoariness of this behavior suggests that we should not be in too
-	 * much hurry to use another value.
-	 */
 	if (!prosupport)
-		return (Selectivity) 0.333;
+		return (Selectivity) -1;	/* no support function */
 
 	req.type = T_SupportRequestSelectivity;
 	req.root = root;
@@ -2188,9 +2180,8 @@ function_selectivity(PlannerInfo *root,
 		DatumGetPointer(OidFunctionCall1(prosupport,
 		 PointerGetDatum(&req)));
 
-	/* If support function fails, use default */
 	if (sresult != &req)
-		return (Selectivity) 0.333;
+		return (Selectivity) -1;	/* function did not honor request */
 
 	if (req.selectivity < 0.0 || req.selectivity > 1.0)
 		elog(ERROR, "invalid function selectivity: %f", req.selectivity);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1c480cfaaf7..e5e066a5537 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1528,6 +1528,17 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
 		selec = var_eq_const(&vardata, BooleanEqualOperator, InvalidOid,
 			 BoolGetDatum(true), false, true, false);
 	}
+	else if (is_funcclause(arg))
+	{
+		/*
+		 * If we have no stats and it's a function call, estimate 0.333.
+		 * This seems a pretty unprincipled choice, but Postgres has been
+		 * using that estimate for function calls since 1992.  The hoariness
+		 * of this behavior suggests that we should not be in too much hurry
+		 * to use another value.
+		 */
+		selec = 0.333;
+	}
 	else
 	{
 		/* Otherwise, the default estimate is 0.5 */

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread David Rowley

On Thu, 18 Sept 2025 at 23:55, Andrei Lepikhov  wrote:
> It looks like a makeshift solution. By implementing a callback, we could
> elevate 'interrupter' to a first-class feature, enabling us to monitor
> the state of the entire query tree (it is especially cool in EXPLAIN
> ANALYZE mode when we may be OK with a partial result).
> What's more interesting if Robert's work on nodes' extensibility
> successes, it enables modules to save planning decisions in the plan.
> Thereby, the in-executor callback provides a means to kick off the query
> planned with incorrect estimations (e.g., too many spilled tuples).
>
> Perhaps we should start working on introducing this type of callback/ hook?

I don't really have a fully formed opinion here, but I expect that if
there is something that needs a new hook function, it might be easier
to convince everyone that it's a good idea to accept that change if
you first write a patch to add the hook, then go off and write the
extension that uses it and then come back with that as evidence that
the hook is useful enough to be committed to core.

There's certainly places where you could add a hook that would just
add an unacceptable overhead that we couldn't stomach. I expect
ExecProcNode would be one of those places.

I do agree that trying to come up with something for this is worthy of
some time and effort. Helping people with performance issues when they
can't even get EXPLAIN ANALYZE to finish is quite tricky.

David

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread David Rowley

On Thu, 18 Sept 2025 at 18:36, Jean-Christophe BOGGIO
 wrote:
> Insert on copyrightad  (cost=613790.59..4448045.97 rows=0 width=0)
>   ->  Merge Join  (cost=613790.59..4448045.97 rows=84972138 width=328)
> Merge Cond: (((c.imcompid)::numeric) = ip.sipa_ip_code)
> ->  Sort  (cost=35712.97..36475.44 rows=304986 width=8)
>   Sort Key: ((c.imcompid)::numeric)
>   ->  Index Only Scan using ix_ad_imcompid on ad c  
> (cost=0.42..7931.21 rows=304986 width=8)
> ->  Materialize  (cost=578077.61..594521.17 rows=3288712 width=23)
>   ->  Sort  (cost=578077.61..586299.39 rows=3288712 width=23)
> Sort Key: ip.sipa_ip_code
> ->  Hash Join  (cost=56043.04..154644.48 rows=3288712 
> width=23)
>   Hash Cond: ((o.imworkid)::numeric = 
> ip.sipa_song_code)
>   ->  Seq Scan on oeu o  (cost=0.00..41901.33 
> rows=1587533 width=8)
>   ->  Hash  (cost=48542.24..48542.24 rows=600064 
> width=25)
> ->  Seq Scan on msipfl ip  
> (cost=0.00..48542.24 rows=600064 width=25)
>   Filter: ((sipa_comp_or_publ)::text = 
> 'C'::text)

> Is it normal that PG is doing a seq scan on table oeu but an index-only scan 
> on ad? I had to stop the query after 5 hours, how can I make this faster? Of 
> course I ran VACUUM ANALYZE.

Yes. Since *all* records of "oeu" are required and they're not
required in any particular order, then Seq Scan should be the fastest
way to access those records.

5 hours seems very slow for the estimated number of records. Have you
tried running the SELECT using EXPLAIN ANALYZE without the INSERT
part? Even if the 84 million Merge Join row estimate is accurate, 5
hours seems excessively long.

If it still takes a long time, you might try SET enable_mergejoin = 0;
and run the EXPLAIN ANALYZE SELECT .. part.  That'll at least give us
more accurate row counts of what we're actually working with.

David

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Ranier Vilela

Em qui., 18 de set. de 2025 às 13:40, Tom Lane  escreveu:

> I wrote:
> > Sigh ... so the answer is this used to work (since commit 39df0f150)
> > and then I carelessly broke it in commit a391ff3c3.  If you try this
> > test case in versions 9.5..11 you get a spot-on rowcount estimate.
> > Serves me right for not having a test case I guess, but I'm astonished
> > that nobody complained sooner.
>
> The attached fixes things so it works like it did pre-a391ff3c3.
>
> I spent some time trying to devise a test case, and was reminded
> of why I didn't have one before: it's hard to make a case that
> will be robust enough to not show diffs in the buildfarm.
> I'll keep thinking about that though.
>
One question, make difference return -1.0 (float point notation)?
If not, static analyzers will have less work.

best regards,
Ranier Vilela

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane

Jehan-Guillaume de Rorthais  writes:
> On a fresh instance from HEAD with its default configuration, it shows:

>   Index Scan using foo_s_idx on foo  (cost=0.29..8.39 rows=3 width=13)
> Index Cond: (s(crit, ackid) = true)

> It seems statistics shown in "pg_stats" view for function "s()" are good. The
> query itself even have the same costs than the query using the syntax tips you
> provide before.

> However, the estimated row number seems wrong in regard with the costs shown
> and statistics.

Yeah.  The problem is that clause_selectivity_ext fails to consider
use of statistics if the clause looks like "bool_valued_function(...)".
If it looks like "bool_valued_function(...) = true", that goes down
a different code path that does the right thing.

Additional factors:

* If you just write "WHERE bool_valued_function(...) = true", that
gets stripped down to "WHERE bool_valued_function(...)" in the name
of making equivalent expressions look equivalent.  (IS TRUE doesn't
get stripped, which is why you have to use that wording to avoid
that.)

* Index condition building puts back the "= true" in order to
construct something that satisfies the index AM API.  And then it
uses that form to get a selectivity estimate for costing purposes
--- so the right number goes into the indexscan cost estimate.

* But the rowcount estimate is made on the form without "= true".
That's the number shown in EXPLAIN and used when considering
joins.

regards, tom lane

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane

=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?=  writes:
> On 9/17/25 16:41, Tom Lane wrote:
>> =?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?=  writes:
>>> 2) the number of estimated rows is completely off in the second EXPLAIN,
>>> whereas the planner could easily use the statistics of foo_f_idx.

>> Hmm, not sure about that.  Again, boolean-valued indexes aren't
>> something we've worked on too hard, but I don't see why that
>> would affect this case.

> OK, thanks anyway, I think the ju-jitsu mentioned above will do, even 
> though the application code will have to be patched.

Sigh ... so the answer is this used to work (since commit 39df0f150)
and then I carelessly broke it in commit a391ff3c3.  If you try this
test case in versions 9.5..11 you get a spot-on rowcount estimate.
Serves me right for not having a test case I guess, but I'm astonished
that nobody complained sooner.

regards, tom lane

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread Jean-Christophe BOGGIO


Thanks David,

Le 18/09/2025 à 09:20, David Rowley a écrit :

Yes. Since *all* records of "oeu" are required and they're not
required in any particular order, then Seq Scan should be the fastest
way to access those records.


Ok but then why is it doing it on the AD table? Is it because of the 
number of rows?



5 hours seems very slow for the estimated number of records. Have you
tried running the SELECT using EXPLAIN ANALYZE without the INSERT
part? Even if the 84 million Merge Join row estimate is accurate, 5
hours seems excessively long.


I tried this, it took 1 second. And then I discovered that I had an old 
trigger on the copyrightad table.


The insert took 2 seconds.

Very sorry for wasting your time and thanks again for putting me on the 
right track.


Have a great day!

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread Andrei Lepikhov


On 18/9/2025 13:35, David Rowley wrote:

On Thu, 18 Sept 2025 at 19:55, Andrei Lepikhov  wrote:

Imagine if we had a hook within the ExecProcNode. In that scenario, we
could create a trivial extension that would stop the query after, let's
say, 10 minutes of execution and display the current state. This would
give us more reliable data on estimation and the state of the plan tree.


I recall something along those lines existing once in the extension
world. Or maybe just from a previous employer. I don't recall many of
the details. Maybe something like a function you pass an SQL string
and if you cancelled the query it reported the EXPLAIN ANALYZE done so
far. I assume it must have done something like LOG it as it couldn't
have shown the EXPLAIN as query results on cancel.
It looks like a makeshift solution. By implementing a callback, we could 
elevate 'interrupter' to a first-class feature, enabling us to monitor 
the state of the entire query tree (it is especially cool in EXPLAIN 
ANALYZE mode when we may be OK with a partial result).
What's more interesting if Robert's work on nodes' extensibility 
successes, it enables modules to save planning decisions in the plan. 
Thereby, the in-executor callback provides a means to kick off the query 
planned with incorrect estimations (e.g., too many spilled tuples).


Perhaps we should start working on introducing this type of callback/ hook?

--
regards, Andrei Lepikhov

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread Andrei Lepikhov


On 18/9/2025 09:20, David Rowley wrote:

On Thu, 18 Sept 2025 at 18:36, Jean-Christophe BOGGIO
If it still takes a long time, you might try SET enable_mergejoin = 0;
and run the EXPLAIN ANALYZE SELECT .. part.  That'll at least give us
more accurate row counts of what we're actually working with.This appears to be a good example of a missing feature: the in-execution 

callback or hook.
Imagine if we had a hook within the ExecProcNode. In that scenario, we 
could create a trivial extension that would stop the query after, let's 
say, 10 minutes of execution and display the current state. This would 
give us more reliable data on estimation and the state of the plan tree.


What are your thoughts?

--
regards, Andrei Lepikhov

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread David Rowley

On Thu, 18 Sept 2025 at 19:45, Jean-Christophe BOGGIO
 wrote:
> Ok but then why is it doing it on the AD table? Is it because of the
> number of rows?

It's hard to tell as I don't have a clear view on which columns are
from which tables.  Perhaps "ad" can Index Only Scan because all of
the columns required are in an index but with "oeu" there's a column
being used that's not indexed?

David

Re: Why isn't PG using an index-only scan?

2025-09-18 Thread David Rowley

On Thu, 18 Sept 2025 at 19:55, Andrei Lepikhov  wrote:
> Imagine if we had a hook within the ExecProcNode. In that scenario, we
> could create a trivial extension that would stop the query after, let's
> say, 10 minutes of execution and display the current state. This would
> give us more reliable data on estimation and the state of the plan tree.

I recall something along those lines existing once in the extension
world. Or maybe just from a previous employer. I don't recall many of
the details. Maybe something like a function you pass an SQL string
and if you cancelled the query it reported the EXPLAIN ANALYZE done so
far. I assume it must have done something like LOG it as it couldn't
have shown the EXPLAIN as query results on cancel.

David

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Jehan-Guillaume de Rorthais

Hi there,

I think this discussion has a nice solution, thank you!

However, while poking around this issue yesterday, we also found something
surprising between estimated rows and costs when using a function. Bellow the
scenario to apply on top of Frederic's one to quickly expose the weirdness:

  CREATE OR REPLACE FUNCTION s(crit text, ackid int) 
  RETURNS bool LANGUAGE plpgsql IMMUTABLE AS $$
  BEGIN RETURN crit = 'WARNING' AND ackid IS NULL;
  END $$;

  CREATE INDEX foo_s_idx ON foo (s(crit, ackid));

  ANALYZE foo ;

  EXPLAIN SELECT * FROM foo WHERE s(crit, ackid);

  EXPLAIN SELECT * FROM foo
  WHERE (ackid IS NULL AND crit = 'WARNING') is true;

  SELECT most_common_vals, most_common_freqs
  FROM pg_stats WHERE tablename = 'foo_s_idx';

On a fresh instance from HEAD with its default configuration, it shows:

  Index Scan using foo_s_idx on foo  (cost=0.29..8.39 rows=3 width=13)
Index Cond: (s(crit, ackid) = true)

  Index Scan using foo_expr_idx on foo  (cost=0.29..8.39 rows=5 width=13)
Index Cond: (((ackid IS NULL) AND (crit = 'WARNING'::text)) = true)

   most_common_vals | most_common_freqs 
  --+---
   {f,t}| {0.5,5e-05}

It seems statistics shown in "pg_stats" view for function "s()" are good. The
query itself even have the same costs than the query using the syntax tips you
provide before.

However, the estimated row number seems wrong in regard with the costs shown
and statistics. It looks like a default hardcoded 33% has been applied. As
Frederic said earlier, this bad row estimation drives the upper join to the
wrong method.

Is this a known planer behavior?

(again, the ju-jitsu syntax provided by Laurenz and you is definitely on
point here, I just hijack the thread to discuss this weirdness.)

Regards,

Re: Indexes on expressions with multiple columns and operators

Re: Why isn't PG using an index-only scan?

Re: Why isn't PG using an index-only scan?

Re: Indexes on expressions with multiple columns and operators

Re: Indexes on expressions with multiple columns and operators

Re: Indexes on expressions with multiple columns and operators

Re: Why isn't PG using an index-only scan?

Re: Why isn't PG using an index-only scan?

Re: Why isn't PG using an index-only scan?

Re: Why isn't PG using an index-only scan?

Re: Why isn't PG using an index-only scan?

Re: Indexes on expressions with multiple columns and operators

12 matches

Site Navigation

Mail list logo

Footer information