Understanding bad estimate (related to FKs?)

2020-10-26 Thread Philip Semanchuk
I'm trying to understand a bad estimate by the planner, and what I can do about 
it. The anonymized plan is here: https://explain.depesz.com/s/0MDz

The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 
(multiplied by 1062 loops). I'm confused about two things in this node.

The first is Postgres' estimate. The condition for this index scan contains 
three expressions --

(five_uniform = zulu_five.five_uniform) AND
(whiskey_mike = juliet_india.whiskey_mike) AND
(bravo = 'mike'::text)

The columns in the first two expressions (five_uniform and whiskey_mike) are 
NOT NULL, and have foreign key constraints to their respective tables 
(zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know in 
advance that 100% of the rows in the table will satisfy those criteria.

For the third expression (bravo = 'mike'), Postgres has excellent statistics. 
The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so 
Postgres' estimate is only off by a tiny amount (0.014%).
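
For what it's worth, the estimated frequency comes from the column's MCV list in
pg_stats; a query roughly like this shows it (with the anonymized table/column
names standing in for the real ones):

select most_common_vals, most_common_freqs
from pg_stats
where tablename = 'xray_india' and attname = 'bravo';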

From what I understand, the planner has all the information it needs to make a 
very accurate estimate here, but it's off by quite a lot. What information am I 
failing to give to the planner?

My second point of confusion is related. There are 564,071 rows in the source 
table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'. 
EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I 
understand that the number reported by EXPLAIN is often a bit bigger, but this 
discrepancy is much larger than I'm expecting. What am I missing here?

Thanks,
Philip



Re: Understanding bad estimate (related to FKs?)

2020-10-26 Thread Justin Pryzby
On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
> I'm trying to understand a bad estimate by the planner, and what I can do 
> about it. The anonymized plan is here: https://explain.depesz.com/s/0MDz

What postgres version?
Since 9.6(?), FKs affect estimates.

> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 
> (multiplied by 1062 loops). I'm confused about two things in this node.
> 
> The first is Postgres' estimate. The condition for this index scan contains 
> three expressions --
> 
> (five_uniform = zulu_five.five_uniform) AND
> (whiskey_mike = juliet_india.whiskey_mike) AND
> (bravo = 'mike'::text)
> 
> The columns in the first two expressions (five_uniform and whiskey_mike) are 
> NOT NULL, and have foreign key constraints to their respective tables 
> (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know 
> in advance that 100% of the rows in the table will satisfy those criteria.
> 
> For the third expression (bravo = 'mike'), Postgres has excellent statistics. 
> The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so 
> Postgres' estimate is only off by a tiny amount (0.014%).
> 
> From what I understand, the planner has all the information it needs to make 
> a very accurate estimate here, but it's off by quite a lot. What information 
> am I failing to give to the planner?
> 
> My second point of confusion is related. There are 564,071 rows in the source 
> table (xray_india, aliased as papa) that satisfy the condition bravo = 
> 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~= 
> 951,552. I understand that the number reported by EXPLAIN is often a bit 
> bigger, but this discrepancy is much larger than I'm expecting. What am I 
> missing here?




Re: Understanding bad estimate (related to FKs?)

2020-10-26 Thread Philip Semanchuk



> On Oct 26, 2020, at 1:04 PM, Justin Pryzby  wrote:
> 
> On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
>> I'm trying to understand a bad estimate by the planner, and what I can do 
>> about it. The anonymized plan is here: https://explain.depesz.com/s/0MDz
> 
> What postgres version?
> Since 9.6(?), FKs affect estimates.

We’re using 11.6 (under AWS Aurora).


> 
>> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 
>> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
>> 
>> The first is Postgres' estimate. The condition for this index scan contains 
>> three expressions --
>> 
>> (five_uniform = zulu_five.five_uniform) AND
>> (whiskey_mike = juliet_india.whiskey_mike) AND
>> (bravo = 'mike'::text)
>> 
>> The columns in the first two expressions (five_uniform and whiskey_mike) are 
>> NOT NULL, and have foreign key constraints to their respective tables 
>> (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know 
>> in advance that 100% of the rows in the table will satisfy those criteria.
>> 
>> For the third expression (bravo = 'mike'), Postgres has excellent 
>> statistics. The estimated frequency of 'mike' is 2.228%, actual frequency is 
>> 2.242%, so Postgres' estimate is only off by a tiny amount (0.014%).
>> 
>> From what I understand, the planner has all the information it needs to make 
>> a very accurate estimate here, but it's off by quite a lot. What information 
>> am I failing to give to the planner?
>> 
>> My second point of confusion is related. There are 564,071 rows in the 
>> source table (xray_india, aliased as papa) that satisfy the condition bravo 
>> = 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~= 
>> 951,552. I understand that the number reported by EXPLAIN is often a bit 
>> bigger, but this discrepancy is much larger than I'm expecting. What am I 
>> missing here?





Re: Understanding bad estimate (related to FKs?)

2020-10-26 Thread Michael Lewis
On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk <[email protected]> wrote:

> >> The item I'm focused on is node 23. The estimate is for 7 rows, actual is
> >> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
> >>
> >> The first is Postgres' estimate. The condition for this index scan contains
> >> three expressions --
> >>
> >> (five_uniform = zulu_five.five_uniform) AND
> >> (whiskey_mike = juliet_india.whiskey_mike) AND
> >> (bravo = 'mike'::text)
>

Are the columns correlated? Have you tried to create extended statistics
and see if the estimate changes? I believe that extended stats will not
directly help with joins though, only group bys and perhaps choosing an
index scan vs table scan when comparing the correlated columns to static
values rather than joining up tables. Wouldn't be much effort to try it
though.
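
A minimal sketch of that, assuming the anonymized names from the plan
(xray_india and its columns whiskey_mike and bravo) map onto real ones -- the
statistics name is arbitrary:

create statistics stats_wm_bravo (ndistinct, dependencies)
  on whiskey_mike, bravo from xray_india;
analyze xray_india;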


Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join

2020-10-26 Thread Ehrenreich, Sigrid
Hi Performance Guys,

I hope you can help me. I am joining two tables that have a foreign key 
relationship, so I expect the optimizer to estimate that the join returns the 
same number of rows as one of the two tables returns on its own. But the 
estimate is way too low.

I have built a test case where the problem is easy to see.

Testcase:
-- create a large table with one column with only 3 possible values; the other
-- columns are only there to increase the selectivity
create table fact (low_card integer, anydata1 integer, anydata2 integer);
insert into fact (low_card, anydata1, anydata2)
  select floor(random()*3+1), floor(random()*1000+1), floor(random()*100+1)
  from generate_series(1,10000);

-- create a smaller table with only unique values to be referenced by the
-- foreign key
create table dim as (select distinct low_card, anydata1, anydata2 from fact);
create unique index on dim (low_card, anydata1, anydata2);
alter table fact add constraint fk foreign key (low_card, anydata1, anydata2)
  references dim (low_card, anydata1, anydata2);

analyze fact;
analyze dim;

And here comes the query:
explain analyze
select count(*) from fact inner join dim on (fact.low_card=dim.low_card and 
fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2)
where fact.low_card=1;

Aggregate  (cost=424.11..424.12 rows=1 width=8) (actual time=7.899..7.903 rows=1 loops=1)
  ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the estimation should be 3344, but it is 115 rows
        Hash Cond: ((fact.anydata1 = dim.anydata1) AND (fact.anydata2 = dim.anydata2))
        ->  Seq Scan on fact  (cost=0.00..180.00 rows=3344 width=12) (actual time=0.025..2.289 rows=3344 loops=1)
              Filter: (low_card = 1)
              Rows Removed by Filter: 6656
        ->  Hash  (cost=176.89..176.89 rows=3292 width=12) (actual time=3.105..3.107 rows=3292 loops=1)
              Buckets: 4096  Batches: 1  Memory Usage: 174kB
              ->  Seq Scan on dim  (cost=0.00..176.89 rows=3292 width=12) (actual time=0.014..2.103 rows=3292 loops=1)
                    Filter: (low_card = 1)
                    Rows Removed by Filter: 6539
Planning Time: 0.619 ms
Execution Time: 7.973 ms


My problem is that in reality I am joining a lot more tables, and since the row 
estimates are so low, the optimizer goes for nested loops, leading to 
unacceptable execution times.

Question: How can I get the optimizer to use the information about the foreign 
key relationship and get accurate estimates?

Sigrid Ehrenreich



Re: Understanding bad estimate (related to FKs?)

2020-10-26 Thread Philip Semanchuk



> On Oct 26, 2020, at 1:20 PM, Michael Lewis  wrote:
> 
> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk 
>  wrote:
> >> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 
> >> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
> >> 
> >> The first is Postgres' estimate. The condition for this index scan 
> >> contains three expressions --
> >> 
> >> (five_uniform = zulu_five.five_uniform) AND
> >> (whiskey_mike = juliet_india.whiskey_mike) AND
> >> (bravo = 'mike'::text)
> 
> Are the columns correlated? Have you tried to create extended statistics and 
> see if the estimate changes? I believe that extended stats will not directly 
> help with joins though, only group bys and perhaps choosing an index scan vs 
> table scan when comparing the correlated columns to static values rather than 
> joining up tables. Wouldn't be much effort to try it though.


There’s not a lot of correlation between whiskey_mike and bravo --
stxkind     stxndistinct     stxdependencies
['d', 'f']  {"7, 12": 42}    {"12 => 7": 0.000274}
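
That's from pg_statistic_ext (this is 11.x), queried roughly like this, with
the anonymized table name in place of the real one:

select stxname, stxkind, stxndistinct, stxdependencies
from pg_statistic_ext
where stxrelid = 'xray_india'::regclass;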

Those stats didn’t help the planner. 

I should have mentioned that five_uniform has ~63k unique values, whereas 
whiskey_mike has only 3, and bravo only 19.

Cheers
Philip



Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join

2020-10-26 Thread Justin Pryzby
On Mon, Oct 26, 2020 at 03:58:05PM +, Ehrenreich, Sigrid wrote:
> Hi Performance Guys,
> 
> I hope you can help me. I am joining two tables that have a foreign key 
> relationship, so I expect the optimizer to estimate that the join returns 
> the same number of rows as one of the two tables returns on its own. But 
> the estimate is way too low.
> 
> I have built a test case where the problem is easy to see.

I reproduced the problem on v14dev.

Note the different estimates between these:

postgres=# explain analyze SELECT * FROM fact INNER JOIN dim USING 
(low_card,anydata1,anydata2) WHERE fact.low_card=2;
 Hash Join  (cost=161.58..358.85 rows=112 width=12) (actual time=8.707..15.717 
rows=3289 loops=1)

postgres=# explain analyze SELECT * FROM fact INNER JOIN dim USING 
(low_card,anydata1,anydata2) WHERE fact.low_card BETWEEN 2 AND 2;
 Hash Join  (cost=324.71..555.61 rows=3289 width=12) (actual 
time=15.966..23.394 rows=3289 loops=1)

I think because low_card has an equality comparison in addition to the equijoin,
it's being disqualified from the planner's mechanism to consider FKs in join
selectivity.
https://doxygen.postgresql.org/costsize_8c_source.html#l05024
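
If that's what's going on, one possible workaround -- only tried against the toy
case above -- is to write the constant qual as a range rather than an equality,
so it doesn't end up in an equivalence class with a constant:

explain analyze
select count(*) from fact inner join dim using (low_card, anydata1, anydata2)
where fact.low_card between 1 and 1;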

I don't know enough about this to help more than that.




Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join

2020-10-26 Thread David Rowley
On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid  wrote:
>   ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual 
> time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the 
> estimation should be 3344, but it is 115 rows

I'd have expected this to find the foreign key and use a join selectivity
of 1.0, but I see it does not, because one of the EquivalenceClasses has a
constant due to the fact.low_card = 1 qual.

In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there.  The low_card qual gets pushed down to the scan level on
each side of the join, so no need for it to get evaluated at the join
level. Later in build_join_rel() we do set_joinrel_size_estimates().
The restrictlist with just the two quals is what we pass to
get_foreign_key_join_selectivity().  Only two of the foreign key
columns are matched there, therefore we don't class that as a match
and just leave it up to the normal selectivity functions.

I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seem to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count.  We'd
need to account for how many rows remain after filtering the ECs
with ec_has_const == true, else we'd be doing the wrong thing.  That
needs more thought than I have time for right now.

Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.
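
For illustration, that variant would be declared roughly as below, though as
noted it will most likely fail here because (anydata1, anydata2) is unlikely
to be unique in dim (the constraint name is arbitrary):

create unique index on dim (anydata1, anydata2);
alter table fact add constraint fk_anydata foreign key (anydata1, anydata2)
  references dim (anydata1, anydata2);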

Extended statistics won't help you here either since they're currently
not used for join estimations.

David




Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join

2020-10-26 Thread Tom Lane
David Rowley  writes:
> On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid  
> wrote:
>> ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual 
>> time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the 
>> estimation should be 3344, but it is 115 rows

> I'd have expected this to find the foreign key and use a join selectivity
> of 1.0, but I see it does not, because one of the EquivalenceClasses has a
> constant due to the fact.low_card = 1 qual.

Right.

> I feel like we could probably do better there and perhaps somehow
> count ECs with ec_has_const as matched, but there seem to be some
> assumptions later in get_foreign_key_join_selectivity() where we
> determine the selectivity based on the base rel's tuple count.  We'd
> need to account for how many rows remain after filtering the ECs
> with ec_has_const == true, else we'd be doing the wrong thing.  That
> needs more thought than I have time for right now.

Yeah, I'm fooling with a patch for that now.  The basic problem is
that the selectivity of the x = constant clauses has already been
factored into the sizes of both join input relations, so we're
double-counting it if we just apply the existing FK-based
selectivity estimate.  I think though that we can recover the
selectivity associated with that qual on the FK side (or should
it be the PK side?) and cancel it out of the FK selectivity.

regards, tom lane