Partition pruning with joins

2020-11-03 Thread Ehrenreich, Sigrid
Hi,

I would like to join to a partitioned table and have the conditions on the
join column in the WHERE clause used for partition pruning.
From some reading on the internet, I conclude that this was not possible in
v12. I hoped for the “improvements in partition pruning” in v13, but it seems
to me that it is still not possible. Or is it, and I am missing something here?

My testcase:
create table fact (part_key integer) partition by range (part_key);
create table fact_100 partition of fact for values from (1) to (101);
create table fact_200 partition of fact for values from (101) to (201);

insert into fact (part_key) select floor(random()*100+1) from generate_series(1,10000);
insert into fact (part_key) select floor(random()*100+101) from generate_series(1,10000);

create table dim as (select distinct part_key from fact);
create unique index on dim (part_key);

analyze fact;
analyze dim;

-- Statement
explain SELECT
    count(*)
FROM
    dim INNER JOIN fact ON (dim.part_key=fact.part_key)
WHERE dim.part_key >= 110 and dim.part_key <= 160;

The plan shows me that all partitions are scanned:
Aggregate  (cost=461.00..461.01 rows=1 width=8)
  ->  Hash Join  (cost=4.64..448.25 rows=5100 width=0)
        Hash Cond: (fact.part_key = dim.part_key)
        ->  Append  (cost=0.00..390.00 rows=20000 width=4)
              ->  Seq Scan on fact_100 fact_1  (cost=0.00..145.00 rows=10000 width=4)  ⇐ unnecessarily scanned
              ->  Seq Scan on fact_200 fact_2  (cost=0.00..145.00 rows=10000 width=4)
        ->  Hash  (cost=4.00..4.00 rows=51 width=4)
              ->  Seq Scan on dim  (cost=0.00..4.00 rows=51 width=4)
                    Filter: ((part_key >= 110) AND (part_key <= 160))


I know that I could get rid of this problem by rewriting the query so that the
range condition sits on the partitioned table itself, like this:
WHERE fact.part_key >= 110 and fact.part_key <= 160
Partition pruning happens very nicely then.
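
For reference, the full rewritten statement (identical to the query above,
with only the WHERE clause moved to the fact table) would be:

explain SELECT
    count(*)
FROM
    dim INNER JOIN fact ON (dim.part_key=fact.part_key)
WHERE fact.part_key >= 110 and fact.part_key <= 160;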

Unfortunately, that is not an option for us, because in our case the code is
generated by third-party software (sigh).

Do you have any suggestions for what else I could do? (Or maybe you could add
it as a new feature for v14 😉?)

Regards,
Sigrid


Re: Understanding bad estimate (related to FKs?)

2020-11-03 Thread Philip Semanchuk



> On Nov 2, 2020, at 6:09 PM, Michael Lewis wrote:
> 
>> The query I asked about in the original post of this thread has 13 relations
>> in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner
>> pick one plan out of 6 billion? I'm curious, both for practical purposes (I
>> want my query to run well) and also because it's fascinating.
> 
> Have you increased geqo_threshold and join_collapse_limit from the defaults?


Yes, thank you, I should have said that. We avoid the GEQO, so
geqo_threshold=25, and join_collapse_limit=from_collapse_limit=24. We tend to
have long-running queries, so we’re happy to pay a few seconds of extra
planner cost to increase the likelihood of getting a better plan.
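
In case it helps anyone following along, those are just the usual GUCs, e.g.
at the session level:

set geqo_threshold = 25;        -- with at most 24 relations, the genetic optimizer never kicks in
set join_collapse_limit = 24;   -- allow exhaustive reordering of up to 24 explicit joins
set from_collapse_limit = 24;   -- same limit for folding subqueries into the parent query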

Cheers
Philip



Re: Understanding bad estimate (related to FKs?)

2020-11-03 Thread Philip Semanchuk



> On Nov 2, 2020, at 10:17 PM, Tomas Vondra wrote:
> 
> On Mon, Nov 02, 2020 at 03:08:12PM -0500, Tom Lane wrote:
>> Philip Semanchuk writes:
>>> The query I asked about in the original post of this thread has 13 
>>> relations in it. IIUC, that's 13! or > 6 billion possible plans. How did 
>>> the planner pick one plan out of 6 billion? I'm curious, both for practical 
>>> purposes (I want my query to run well) and also because it's fascinating.
>> 
>> The twenty-thousand-foot overview is
>> 
>> https://www.postgresql.org/docs/devel/planner-optimizer.html
>> 
>> and then ten-thousand-foot level is the planner README file,
>> 
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/optimizer/README;hb=HEAD
>> 
>> and then you pretty much gotta start reading code.  You could also dig
>> into various planner expository talks that people have given at PG
>> conferences.  I don't have links at hand, but there have been several.
>> 
> 
> Yeah. The jump from high-level overviews to reading source code is a bit
> brutal, though ...
> 
> 
> FWIW a short list of relevant talks I'm aware of & would recommend:
> 
> * Explaining the Postgres Query Optimizer [Bruce Momjian]
>   https://momjian.us/main/writings/pgsql/optimizer.pdf
> 
> * Intro to Postgres Planner Hacking [Melanie Plageman]
>   https://www.pgcon.org/2019/schedule/events/1379.en.html
> 
> * Learning to Hack on Postgres Planner [Melanie Plageman]
>   https://www.pgcon.org/2019/schedule/attachments/540_debugging_planner_pgcon2019_v4.pdf
> 
> * What’s in a Plan? [Robert Haas]
>   https://www.postgresql.eu/events/pgconfeu2019/schedule/session/2741-whats-in-a-plan/
> 
> * A Tour of PostgreSQL Internals [Tom Lane]
>   https://www.postgresql.org/files/developer/tour.pdf
> 
> * Inside the PostgreSQL Query Optimizer [Neil Conway]
>   http://www.neilconway.org/talks/optimizer/optimizer.pdf
> 
> Some are a bit dated, but the overall principles don't change much.


Thank you so much to Tomas V, Tom L, Guillaume, Justin, and Michael for all the 
suggestions and direction. I really appreciate your time & wisdom (not to 
mention your contributions to Postgres!)

Cheers
Philip


Re: Partition pruning with joins

2020-11-03 Thread Laurenz Albe
On Tue, 2020-11-03 at 13:20 +, Ehrenreich, Sigrid wrote:
> I would like to join to a partitioned table and have the conditions on the
> join column in the WHERE clause used for partition pruning.
> From some reading on the internet, I conclude that this was not possible in
> v12. I hoped for the “improvements in partition pruning” in v13, but it seems
> to me that it is still not possible. Or is it, and I am missing something here?
> 
> My testcase:
> create table fact (part_key integer) partition by range (part_key);
> create table fact_100 partition of fact for values from (1) to (101);
> create table fact_200 partition of fact for values from (101) to (201);
> 
> insert into fact (part_key) select floor(random()*100+1) from generate_series(1,10000);
> insert into fact (part_key) select floor(random()*100+101) from generate_series(1,10000);
> 
> create table dim as (select distinct part_key from fact);
> create unique index on dim (part_key);
> 
> analyze fact;
> analyze dim;
> 
> -- Statement
> explain SELECT
>     count(*)
> FROM
>     dim INNER JOIN fact ON (dim.part_key=fact.part_key)
> WHERE dim.part_key >= 110 and dim.part_key <= 160;
> 
> The plan shows me that all partitions are scanned:
> Aggregate  (cost=461.00..461.01 rows=1 width=8)
>   ->  Hash Join  (cost=4.64..448.25 rows=5100 width=0)
>         Hash Cond: (fact.part_key = dim.part_key)
>         ->  Append  (cost=0.00..390.00 rows=20000 width=4)
>               ->  Seq Scan on fact_100 fact_1  (cost=0.00..145.00 rows=10000 width=4)  ⇐ unnecessarily scanned
>               ->  Seq Scan on fact_200 fact_2  (cost=0.00..145.00 rows=10000 width=4)
>         ->  Hash  (cost=4.00..4.00 rows=51 width=4)
>               ->  Seq Scan on dim  (cost=0.00..4.00 rows=51 width=4)
>                     Filter: ((part_key >= 110) AND (part_key <= 160))

One thing you could try is to partition "dim" just like "fact" and
set "enable_partitionwise_join = on".

I didn't test it, but that might do the trick.
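
An untested sketch of what that could look like; the table and partition names
(dim2 and friends) are made up here, just for illustration:

create table dim2 (part_key integer) partition by range (part_key);
create table dim2_100 partition of dim2 for values from (1) to (101);
create table dim2_200 partition of dim2 for values from (101) to (201);
insert into dim2 select part_key from dim;
-- a unique index on a partitioned table must include the partition key;
-- here the partition key is the whole key, so this is allowed
create unique index on dim2 (part_key);
analyze dim2;

set enable_partitionwise_join = on;

explain SELECT count(*)
FROM dim2 INNER JOIN fact ON (dim2.part_key = fact.part_key)
WHERE dim2.part_key >= 110 and dim2.part_key <= 160;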

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com