RE: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join

2020-10-29 Thread Ehrenreich, Sigrid
Hi Tom,

Thanks a lot for your help!

If it is in the HEAD, does it mean, it will be included in v14?

I'll have to see, if we dare building our own v13 version with the patch.
(I would love to, because I am simply thrilled to pieces, having a patch made 
by you for us 😉)

Regards,
Sigrid

-Original Message-
From: Tom Lane  
Sent: Wednesday, October 28, 2020 5:55 PM
To: Ehrenreich, Sigrid 
Cc: David Rowley ; [email protected]
Subject: Re: Postgres Optimizer ignores information about foreign key 
relationship, severly misestimating number of returned rows in join

"Ehrenreich, Sigrid"  writes:
> A patch would be very much appreciated.
> We are currently running on Version 12, but could upgrade to 13, if necessary.
> Could you send me a notification if you managed to program a patch for that?

I've pushed a patch for this to HEAD, but current thinking is that we
will not be back-patching it.  Still, if you're desperate you could
consider running a custom build of v13 with that patch --- a quick
check suggests that it would back-patch easily.  v12 would be a
slightly harder lift (I see one hunk doesn't apply) but probably
not by much.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=ad1c36b0709e47cdb3cc4abd6c939fe64279b63f

regards, tom lane


Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join

2020-10-29 Thread Magnus Hagander
On Thu, Oct 29, 2020 at 9:43 AM Ehrenreich, Sigrid
 wrote:
>
> Hi Tom,
>
> Thanks a lot for your help!
>
> If it is in the HEAD, does it mean, it will be included in v14?


Yes, that's precisely what it means. Unless someone finds something
bad with it and it has to be removed of course, but in principle it
means that it will be in 14.

//Magnus


>
>
> I'll have to see, if we dare building our own v13 version with the patch.
> (I would love to, because I am simply thrilled to pieces, having a patch made 
> by you for us )
>
> Regards,
> Sigrid
>
> -Original Message-
> From: Tom Lane 
> Sent: Wednesday, October 28, 2020 5:55 PM
> To: Ehrenreich, Sigrid 
> Cc: David Rowley ; 
> [email protected]
> Subject: Re: Postgres Optimizer ignores information about foreign key 
> relationship, severly misestimating number of returned rows in join
>
> "Ehrenreich, Sigrid"  writes:
> > A patch would be very much appreciated.
> > We are currently running on Version 12, but could upgrade to 13, if 
> > necessary.
> > Could you send me a notification if you managed to program a patch for that?
>
> I've pushed a patch for this to HEAD, but current thinking is that we
> will not be back-patching it.  Still, if you're desperate you could
> consider running a custom build of v13 with that patch --- a quick
> check suggests that it would back-patch easily.  v12 would be a
> slightly harder lift (I see one hunk doesn't apply) but probably
> not by much.
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=ad1c36b0709e47cdb3cc4abd6c939fe64279b63f
>
> regards, tom lane




RE: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join

2020-10-29 Thread Ehrenreich, Sigrid
Thanks!

Regards,
Sigrid

-Original Message-
From: Magnus Hagander  
Sent: Thursday, October 29, 2020 10:57 AM
To: Ehrenreich, Sigrid 
Cc: Tom Lane ; David Rowley ; 
[email protected]
Subject: Re: Postgres Optimizer ignores information about foreign key 
relationship, severly misestimating number of returned rows in join

On Thu, Oct 29, 2020 at 9:43 AM Ehrenreich, Sigrid
 wrote:
>
> Hi Tom,
>
> Thanks a lot for your help!
>
> If it is in the HEAD, does it mean, it will be included in v14?


Yes, that's precisely what it means. Unless someone finds something
bad with it and it has to be removed of course, but in principle it
means that it will be in 14.

//Magnus


>
>
> I'll have to see, if we dare building our own v13 version with the patch.
> (I would love to, because I am simply thrilled to pieces, having a patch made 
> by you for us )
>
> Regards,
> Sigrid
>
> -Original Message-
> From: Tom Lane 
> Sent: Wednesday, October 28, 2020 5:55 PM
> To: Ehrenreich, Sigrid 
> Cc: David Rowley ; 
> [email protected]
> Subject: Re: Postgres Optimizer ignores information about foreign key 
> relationship, severly misestimating number of returned rows in join
>
> "Ehrenreich, Sigrid"  writes:
> > A patch would be very much appreciated.
> > We are currently running on Version 12, but could upgrade to 13, if 
> > necessary.
> > Could you send me a notification if you managed to program a patch for that?
>
> I've pushed a patch for this to HEAD, but current thinking is that we
> will not be back-patching it.  Still, if you're desperate you could
> consider running a custom build of v13 with that patch --- a quick
> check suggests that it would back-patch easily.  v12 would be a
> slightly harder lift (I see one hunk doesn't apply) but probably
> not by much.
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=ad1c36b0709e47cdb3cc4abd6c939fe64279b63f
>
> regards, tom lane


Re: query plan using partial index expects a much larger number of rows than is possible

2020-10-29 Thread Michael Lewis
On Wed, Oct 28, 2020 at 5:30 PM Tom Lane  wrote:

> "Olivier Poquet"  writes:
> > Looking at it in more detail, I found that the planner is assuming that
> I'll get millions of rows back even when I do a simple query that does an
> index scan on my partial index:
>
> We don't look at partial-index predicates when trying to estimate the
> selectivity of a WHERE clause.  It's not clear to me whether that'd be
> a useful thing to do, or whether it could be shoehorned into the system
> easily.  (One big problem is that while the index size could provide
> an upper bound, it's not apparent how to combine that knowledge with
> selectivities of unrelated conditions.  Also, it's riskier to extrapolate
> a current rowcount estimate from stale relpages/reltuples data for an
> index than it is for a table, because the index is less likely to scale
> up linearly.)
>
> regards, tom lane
>


Aren't there custom stats created for functional indexes? Would it be
feasible to create those for partial indexes as well, maybe only
optionally? I assume there may be giant gaps with that notion.


Re: Understanding bad estimate (related to FKs?)

2020-10-29 Thread Philip Semanchuk



> On Oct 28, 2020, at 9:13 PM, Justin Pryzby  wrote:
> 
> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk 
>>  wrote:
>> 
> The item I'm focused on is node 23. The estimate is for 7 rows, actual
>>> is 896 (multiplied by 1062 loops). I'm confused about two things in this
>>> node.
> 
> The first is Postgres' estimate. The condition for this index scan
>>> contains three expressions --
> 
> (five_uniform = zulu_five.five_uniform) AND
> (whiskey_mike = juliet_india.whiskey_mike) AND
> (bravo = 'mike'::text)
>>> 
>> 
>> Are the columns correlated?
> 
> I guess it shouldn't matter, since the FKs should remove all but one of the
> conditions.

Yes, I had the same expectation. I thought Postgres would calculate the 
selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency of 
‘mike’, but since the frequency estimate is very accurate but the planner’s 
estimate is not, there’s something else going on. 

> Maybe you saw this other thread, which I tentatively think also affects your
> case (equijoin with nonjoin condition)
> https://www.postgresql.org/message-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com

Yes, thank you, I read that thread with interest. I tried your clever trick 
using BETWEEN, but it didn’t change the plan. Does that suggest there’s some 
other cause for the planner’s poor estimate?

Cheers
Philip








Re: Understanding bad estimate (related to FKs?)

2020-10-29 Thread Tomas Vondra

On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:




On Oct 28, 2020, at 9:13 PM, Justin Pryzby 
wrote:

On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:

On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
 wrote:


The item I'm focused on is node 23. The estimate is for 7 rows,
actual

is 896 (multiplied by 1062 loops). I'm confused about two things in
this node.


The first is Postgres' estimate. The condition for this index
scan

contains three expressions --


(five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
juliet_india.whiskey_mike) AND (bravo = 'mike'::text)




Are the columns correlated?


I guess it shouldn't matter, since the FKs should remove all but one
of the conditions.


Yes, I had the same expectation. I thought Postgres would calculate the
selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
of ‘mike’, but since the frequency estimate is very accurate but the
planner’s estimate is not, there’s something else going on.



Well, this is quite a bit more complicated, I'm afraid :-( The clauses
include parameters passed from the nodes above the index scan. So even
if we had extended stats on the table, we couldn't use them as that
requires (Var op Const) conditions. So this likely ends up with a
product of estimates for each clause, and even then we can't use any
particular value so we probably end up with something like 1/ndistinct
or something like that. So if the values actually passed to the index
scan are more common and/or if the columns are somehow correlated, it's
not surprising we end up with an overestimate.


Maybe you saw this other thread, which I tentatively think also
affects your case (equijoin with nonjoin condition)
https://www.postgresql.org/message-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com


Yes, thank you, I read that thread with interest. I tried your clever
trick using BETWEEN, but it didn’t change the plan. Does that suggest
there’s some other cause for the planner’s poor estimate?



I don't think that's related - to hit that bug, there would have to be
implied conditions pushed-down to the scan level. And there's nothing
like that in this case.

FWIW I don't think this has anything to do with join cardinality
estimation - at least not for the node 23.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services