Understanding bad estimate (related to FKs?)
I'm trying to understand a bad estimate by the planner, and what I can do
about it. The anonymized plan is here: https://explain.depesz.com/s/0MDz

The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896
(multiplied by 1062 loops). I'm confused about two things in this node.

The first is Postgres' estimate. The condition for this index scan contains
three expressions --

   (five_uniform = zulu_five.five_uniform) AND
   (whiskey_mike = juliet_india.whiskey_mike) AND
   (bravo = 'mike'::text)

The columns in the first two expressions (five_uniform and whiskey_mike) are
NOT NULL, and have foreign key constraints to their respective tables
(zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know
in advance that 100% of the rows in the table will satisfy those criteria.

For the third expression (bravo = 'mike'), Postgres has excellent statistics.
The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so
Postgres' estimate is only off by a tiny amount (0.014%).

From what I understand, the planner has all the information it needs to make
a very accurate estimate here, but it's off by quite a lot. What information
am I failing to give to the planner?

My second point of confusion is related. There are 564,071 rows in the source
table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'.
EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I
understand that the number reported by EXPLAIN is often a bit bigger, but this
discrepancy is much larger than I'm expecting. What am I missing here?

Thanks,
Philip
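For reference, the per-column estimate quoted above for bravo = 'mike' can be
read straight out of pg_stats. A minimal sketch, assuming the scan in node 23
is on xray_india and using the anonymized column names from the plan (so treat
the identifiers as placeholders):

-- Hypothetical check of the planner's statistics for the bravo column.
-- Table and column names are placeholders taken from the anonymized plan.
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM   pg_stats
WHERE  tablename = 'xray_india'
AND    attname   = 'bravo';

-- The planner's frequency for bravo = 'mike' is the entry of
-- most_common_freqs at the same position as 'mike' in most_common_vals
-- (about 0.02228 here, per the numbers quoted above).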
Re: Understanding bad estimate (related to FKs?)
On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
> I'm trying to understand a bad estimate by the planner, and what I can do
> about it. The anonymized plan is here: https://explain.depesz.com/s/0MDz

What postgres version ?
Since 9.6(?) FKs affect estimates.

> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896
> (multiplied by 1062 loops). I'm confused about two things in this node.
>
> The first is Postgres' estimate. The condition for this index scan contains
> three expressions --
>
> (five_uniform = zulu_five.five_uniform) AND
> (whiskey_mike = juliet_india.whiskey_mike) AND
> (bravo = 'mike'::text)
>
> The columns in the first two expressions (five_uniform and whiskey_mike) are
> NOT NULL, and have foreign key constraints to their respective tables
> (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know
> in advance that 100% of the rows in the table will satisfy those criteria.
>
> For the third expression (bravo = 'mike'), Postgres has excellent statistics.
> The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so
> Postgres' estimate is only off by a tiny amount (0.014%).
>
> From what I understand, the planner has all the information it needs to make
> a very accurate estimate here, but it's off by quite a lot. What information
> am I failing to give to the planner?
>
> My second point of confusion is related. There are 564,071 rows in the source
> table (xray_india, aliased as papa) that satisfy the condition bravo =
> 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~=
> 951,552. I understand that the number reported by EXPLAIN is often a bit
> bigger, but this discrepancy is much larger than I'm expecting. What am I
> missing here?
Re: Understanding bad estimate (related to FKs?)
> On Oct 26, 2020, at 1:04 PM, Justin Pryzby wrote:
>
> On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
>> I'm trying to understand a bad estimate by the planner, and what I can do
>> about it. The anonymized plan is here: https://explain.depesz.com/s/0MDz
>
> What postgres version ?
> Since 9.6(?) FKs affect estimates.

We’re using 11.6 (under AWS Aurora).

>> The item I'm focused on is node 23. The estimate is for 7 rows, actual is
>> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
>>
>> The first is Postgres' estimate. The condition for this index scan contains
>> three expressions --
>>
>> (five_uniform = zulu_five.five_uniform) AND
>> (whiskey_mike = juliet_india.whiskey_mike) AND
>> (bravo = 'mike'::text)
>>
>> The columns in the first two expressions (five_uniform and whiskey_mike) are
>> NOT NULL, and have foreign key constraints to their respective tables
>> (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know
>> in advance that 100% of the rows in the table will satisfy those criteria.
>>
>> For the third expression (bravo = 'mike'), Postgres has excellent
>> statistics. The estimated frequency of 'mike' is 2.228%, actual frequency is
>> 2.242%, so Postgres' estimate is only off by a tiny amount (0.014%).
>>
>> From what I understand, the planner has all the information it needs to make
>> a very accurate estimate here, but it's off by quite a lot. What information
>> am I failing to give to the planner?
>>
>> My second point of confusion is related. There are 564,071 rows in the
>> source table (xray_india, aliased as papa) that satisfy the condition bravo
>> = 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~=
>> 951,552. I understand that the number reported by EXPLAIN is often a bit
>> bigger, but this discrepancy is much larger than I'm expecting. What am I
>> missing here?
Re: Understanding bad estimate (related to FKs?)
On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk <[email protected]> wrote:

>> The item I'm focused on is node 23. The estimate is for 7 rows, actual is
>> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
>>
>> The first is Postgres' estimate. The condition for this index scan contains
>> three expressions --
>>
>> (five_uniform = zulu_five.five_uniform) AND
>> (whiskey_mike = juliet_india.whiskey_mike) AND
>> (bravo = 'mike'::text)

Are the columns correlated? Have you tried to create extended statistics and
see if the estimate changes? I believe that extended stats will not directly
help with joins though, only group bys and perhaps choosing an index scan vs
table scan when comparing the correlated columns to static values rather than
joining up tables. Wouldn't be much effort to try it though.
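A minimal sketch of the experiment Michael suggests, assuming the scan in node
23 is on xray_india and that the correlated columns are whiskey_mike and bravo
(names taken from the anonymized plan, so adjust to the real schema):

-- Extended statistics on the pair of columns used together in the qual.
-- PostgreSQL 11 supports the ndistinct and dependencies statistic kinds;
-- MCV lists for extended statistics arrived later, in v12.
CREATE STATISTICS stats_wm_bravo (ndistinct, dependencies)
    ON whiskey_mike, bravo
    FROM xray_india;

ANALYZE xray_india;

-- Re-run EXPLAIN ANALYZE afterwards and compare the estimate for node 23.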
Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
Hi Performance Guys,

I hope you can help me. I am joining two tables that have a foreign key
relationship, so I expect the optimizer to estimate the number of resulting
rows to be the same as the number of rows returned from one of the tables.
But the estimate is way too low.

I have built a test case where the problem is easy to see.

Testcase:

-- create a large table with one column that has only 3 possible values;
-- the other columns are only there to increase the selectivity
create table fact (low_card integer, anydata1 integer, anydata2 integer);
insert into fact (low_card, anydata1, anydata2)
select floor(random()*3+1), floor(random()*1000+1), floor(random()*100+1)
from generate_series(1,10000);

-- create a smaller table with only unique values to be referenced by the foreign key
create table dim as (select distinct low_card, anydata1, anydata2 from fact);
create unique index on dim (low_card, anydata1, anydata2);
alter table fact add constraint fk foreign key (low_card, anydata1, anydata2)
references dim (low_card, anydata1, anydata2);

analyze fact;
analyze dim;

And here comes the query:

explain analyze
select count(*)
from fact
inner join dim on (fact.low_card=dim.low_card and fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2)
where fact.low_card=1;

Aggregate  (cost=424.11..424.12 rows=1 width=8) (actual time=7.899..7.903 rows=1 loops=1)
  ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the estimate should be 3344, but it is 115 rows
        Hash Cond: ((fact.anydata1 = dim.anydata1) AND (fact.anydata2 = dim.anydata2))
        ->  Seq Scan on fact  (cost=0.00..180.00 rows=3344 width=12) (actual time=0.025..2.289 rows=3344 loops=1)
              Filter: (low_card = 1)
              Rows Removed by Filter: 6656
        ->  Hash  (cost=176.89..176.89 rows=3292 width=12) (actual time=3.105..3.107 rows=3292 loops=1)
              Buckets: 4096  Batches: 1  Memory Usage: 174kB
              ->  Seq Scan on dim  (cost=0.00..176.89 rows=3292 width=12) (actual time=0.014..2.103 rows=3292 loops=1)
                    Filter: (low_card = 1)
                    Rows Removed by Filter: 6539
Planning Time: 0.619 ms
Execution Time: 7.973 ms

My problem is that in reality I am joining a lot more tables, and since the
row estimates are so low, the optimizer goes for nested loops, leading to
unacceptable execution times.

Question: How can I get the optimizer to use the information about the
foreign key relationship and get accurate estimates?

Sigrid Ehrenreich
Re: Understanding bad estimate (related to FKs?)
> On Oct 26, 2020, at 1:20 PM, Michael Lewis wrote:
>
> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
> wrote:
> >> The item I'm focused on is node 23. The estimate is for 7 rows, actual is
> >> 896 (multiplied by 1062 loops). I'm confused about two things in this node.
> >>
> >> The first is Postgres' estimate. The condition for this index scan
> >> contains three expressions --
> >>
> >> (five_uniform = zulu_five.five_uniform) AND
> >> (whiskey_mike = juliet_india.whiskey_mike) AND
> >> (bravo = 'mike'::text)
>
> Are the columns correlated? Have you tried to create extended statistics and
> see if the estimate changes? I believe that extended stats will not directly
> help with joins though, only group bys and perhaps choosing an index scan vs
> table scan when comparing the correlated columns to static values rather than
> joining up tables. Wouldn't be much effort to try it though.
There’s not a lot of correlation between whiskey_mike and bravo --
 stxkind    | stxndistinct  | stxdependencies
------------+---------------+------------------------
 ['d', 'f'] | {"7, 12": 42} | {"12 => 7": 0.000274}
Those stats didn’t help the planner.
I should have mentioned that five_uniform has ~63k unique values, whereas
whiskey_mike has only 3, and bravo only 19.
Cheers
Philip
Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
On Mon, Oct 26, 2020 at 03:58:05PM +, Ehrenreich, Sigrid wrote:
> Hi Performance Guys,
>
> I hope you can help me. I am joining two tables that have a foreign key
> relationship, so I expect the optimizer to estimate the number of resulting
> rows to be the same as the number of rows returned from one of the tables.
> But the estimate is way too low.
>
> I have built a test case where the problem is easy to see.

I reproduced the problem on v14dev. Note the different estimates between these:

postgres=# explain analyze SELECT * FROM fact INNER JOIN dim USING (low_card,anydata1,anydata2) WHERE fact.low_card=2;
 Hash Join  (cost=161.58..358.85 rows=112 width=12) (actual time=8.707..15.717 rows=3289 loops=1)

postgres=# explain analyze SELECT * FROM fact INNER JOIN dim USING (low_card,anydata1,anydata2) WHERE fact.low_card BETWEEN 2 AND 2;
 Hash Join  (cost=324.71..555.61 rows=3289 width=12) (actual time=15.966..23.394 rows=3289 loops=1)

I think because low_card has an equality comparison in addition to the
equijoin, it's being disqualified from the planner's mechanism to consider
FKs in join selectivity.

https://doxygen.postgresql.org/costsize_8c_source.html#l05024

I don't know enough about this to help more than that.
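Since the BETWEEN form apparently keeps the foreign key usable for join
selectivity, the same workaround could presumably be applied to the original
query. An untested sketch against the test case above:

explain analyze
select count(*)
from fact
inner join dim on (fact.low_card=dim.low_card
               and fact.anydata1=dim.anydata1
               and fact.anydata2=dim.anydata2)
where fact.low_card between 1 and 1;   -- range qual instead of low_card = 1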
Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid wrote:
> ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual
> time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the
> estimate should be 3344, but it is 115 rows

I'd have expected this to find the foreign key and use a join selectivity of
1.0, but I see it does not because one of the EquivalenceClasses has a
constant due to the fact.low_card = 1 qual.

In build_join_rel() we call build_joinrel_restrictlist() to get the join quals
that need to be evaluated at the join level, but we only get the
fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2 quals there. The
low_card qual gets pushed down to the scan level on each side of the join, so
there's no need for it to be evaluated at the join level.

Later in build_join_rel() we do set_joinrel_size_estimates(). The restrictlist
with just the two quals is what we pass to get_foreign_key_join_selectivity().
Only two of the foreign key columns are matched there, therefore we don't
class that as a match and just leave it up to the normal selectivity
functions.

I feel like we could probably do better there and perhaps somehow count ECs
with ec_has_const as matched, but there seem to be some assumptions later in
get_foreign_key_join_selectivity() where we determine the selectivity based on
the base rel's tuple count. We'd need to account for how many rows remain
after filtering the ECs with ec_has_const == true, else we'd be doing the
wrong thing. That needs more thought than I have time for right now.

Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique index
on those two columns. Extended statistics won't help you here either, since
they're currently not used for join estimations.

David
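For concreteness, the alternative David mentions would look roughly like the
sketch below. It only works if the (anydata1, anydata2) pair is actually
unique in dim, which the randomly generated test data above will generally
not satisfy, so treat it as an illustration of the schema shape rather than a
fix:

-- Hypothetical two-column foreign key that the join-level restrictlist
-- (anydata1 = anydata1, anydata2 = anydata2) could fully match.
-- Requires the referenced pair to be unique in dim.
create unique index on dim (anydata1, anydata2);

alter table fact
    add constraint fk_anydata
    foreign key (anydata1, anydata2)
    references dim (anydata1, anydata2);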
Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
David Rowley writes:
> On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid wrote:
>> ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual
>> time=3.150..7.511 rows=3344 loops=1)   <=== With the FK, the
>> estimate should be 3344, but it is 115 rows

> I'd have expected this to find the foreign key and use a join selectivity of
> 1.0, but I see it does not because one of the EquivalenceClasses has a
> constant due to the fact.low_card = 1 qual.

Right.

> I feel like we could probably do better there and perhaps somehow count ECs
> with ec_has_const as matched, but there seem to be some assumptions later in
> get_foreign_key_join_selectivity() where we determine the selectivity based
> on the base rel's tuple count. We'd need to account for how many rows remain
> after filtering the ECs with ec_has_const == true, else we'd be doing the
> wrong thing. That needs more thought than I have time for right now.

Yeah, I'm fooling with a patch for that now. The basic problem is that the
selectivity of the x = constant clauses has already been factored into the
sizes of both join input relations, so we're double-counting it if we just
apply the existing FK-based selectivity estimate. I think though that we can
recover the selectivity associated with that qual on the FK side (or should
it be the PK side?) and cancel it out of the FK selectivity.

			regards, tom lane
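To make the double-counting concrete, here is a back-of-the-envelope
calculation with the numbers from the test case, under the assumption that a
fully matched foreign key contributes a join selectivity of 1 / tuples(dim)
(my reading of get_foreign_key_join_selectivity(); take it as an illustration
rather than the exact planner arithmetic):

-- Row counts from the plan above: 3344 fact rows and 3292 dim rows pass
-- low_card = 1; dim has 9831 rows in total (3292 + 6539).
select 3344 * 3292 / 9831.0                      as naive_fk_estimate,   -- ~1120: the filter is counted twice
       (3344 * 3292 / 9831.0) / (3292 / 9831.0)  as corrected_estimate;  -- 3344, matching the actual row count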
