Re: Strange runtime partition pruning behaviour with 11.4
I too am a bit perplexed by why runtime partition pruning does not seem to work with this example. Anybody got any ideas of this? Regards, Michael Vitale Thomas Kellerer wrote on 8/2/2019 9:58 AM: I stumbled across this question on SO: https://stackoverflow.com/questions/56517852 Disregarding the part about Postgres 9.3, the example for Postgres 11 looks a bit confusing. There is a script to setup test data in that question: start of script create table foo ( foo_id integer not null, foo_name varchar(10), constraint foo_pkey primary key (foo_id) ); insert into foo (foo_id, foo_name) values (1, 'eeny'), (2, 'meeny'), (3, 'miny'), (4, 'moe'), (5, 'tiger'), (6, 'toe'); create table foo_bar_baz ( foo_id integer not null, bar_id integer not null, bazinteger not null, constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz), constraint foo_bar_baz_fkey1 foreign key (foo_id) references foo (foo_id) ) partition by range (foo_id) ; create table if not exists foo_bar_baz_0 partition of foo_bar_baz for values from (0) to (1); create table if not exists foo_bar_baz_1 partition of foo_bar_baz for values from (1) to (2); create table if not exists foo_bar_baz_2 partition of foo_bar_baz for values from (2) to (3); create table if not exists foo_bar_baz_3 partition of foo_bar_baz for values from (3) to (4); create table if not exists foo_bar_baz_4 partition of foo_bar_baz for values from (4) to (5); create table if not exists foo_bar_baz_5 partition of foo_bar_baz for values from (5) to (6); with foos_and_bars as ( select ((random() * 4) + 1)::int as foo_id, bar_id::int from generate_series(0, 1499) as t(bar_id) ), bazzes as ( select baz::int from generate_series(1, 1500) as t(baz) ) insert into foo_bar_baz (foo_id, bar_id, baz) select foo_id, bar_id, baz from bazzes as bz join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0; end of script I see the some strange behaviour similar to to what is reported in the comments to that question: When I run the test query immediately after populating the tables with the sample data: explain analyze select count(*) from foo_bar_baz as fbb join foo on fbb.foo_id = foo.foo_id where foo.foo_name = 'eeny' I do see an "Index Only Scan (never executed)" in the plan for the irrelevant partitions: https://explain.depesz.com/s/AqlE However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres chooses to do a "Parallel Seq Scan" for each partition: https://explain.depesz.com/s/WwxE Why does updating the statistics mess up (runtime) partition pruning? I played around with random_page_cost and that didn't change anything. I tried to create extended statistics on "foo(id, name)" so that the planner would no, that there is only one name per id. No change. I saw the above behaviour when running this on Windows 10 (my Laptop) or CentOS 7 (a test environment on a VM) On the CentOS server default_statistics_target is set to 100, on my laptop it is set to 1000 In both cases the Postgres version was 11.4 Any ideas? Thomas
Re: Strange runtime partition pruning behaviour with 11.4
Hi, Am 03.08.19 um 15:16 schrieb MichaelDBA: I too am a bit perplexed by why runtime partition pruning does not seem to work with this example. Anybody got any ideas of this? please don't top-posting. it's posible to rewrite the query to: test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = (select foo_id from foo where foo_name = 'eeny'); QUERY PLAN --- Finalize Aggregate (cost=15880.63..15880.64 rows=1 width=8) (actual time=48.447..48.448 rows=1 loops=1) InitPlan 1 (returns $0) -> Seq Scan on foo (cost=0.00..24.50 rows=6 width=4) (actual time=0.243..0.246 rows=1 loops=1) Filter: ((foo_name)::text = 'eeny'::text) Rows Removed by Filter: 5 -> Gather (cost=15855.92..15856.13 rows=2 width=8) (actual time=48.376..51.468 rows=3 loops=1) Workers Planned: 2 Params Evaluated: $0 Workers Launched: 2 -> Partial Aggregate (cost=14855.92..14855.93 rows=1 width=8) (actual time=42.600..42.600 rows=1 loops=3) -> Parallel Append (cost=0.00..13883.01 rows=389162 width=0) (actual time=0.139..34.914 rows=83500 loops=3) -> Parallel Bitmap Heap Scan on foo_bar_baz_0 fbb (cost=4.23..14.73 rows=6 width=0) (never executed) Recheck Cond: (foo_id = $0) -> Bitmap Index Scan on foo_bar_baz_0_pkey (cost=0.00..4.23 rows=10 width=0) (never executed) Index Cond: (foo_id = $0) -> Parallel Seq Scan on foo_bar_baz_2 fbb_2 (cost=0.00..3865.72 rows=178218 width=0) (never executed) Filter: (foo_id = $0) -> Parallel Seq Scan on foo_bar_baz_1 fbb_1 (cost=0.00..3195.62 rows=147250 width=0) (actual time=0.129..24.735 rows=83500 loops=3) Filter: (foo_id = $0) -> Parallel Seq Scan on foo_bar_baz_3 fbb_3 (cost=0.00..2334.49 rows=107559 width=0) (never executed) Filter: (foo_id = $0) -> Parallel Seq Scan on foo_bar_baz_4 fbb_4 (cost=0.00..1860.95 rows=85756 width=0) (never executed) Filter: (foo_id = $0) -> Parallel Seq Scan on foo_bar_baz_5 fbb_5 (cost=0.00..665.69 rows=30615 width=0) (never executed) Filter: (foo_id = $0) Planning Time: 12.648 ms Execution Time: 52.621 ms (27 rows) test=*# I know, that's not a solution, but a workaround. :-( (pg 12beta2 and also with PostgreSQL 11.4 (2ndQPG 11.4r1.6.7)) Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
Re: Strange runtime partition pruning behaviour with 11.4
it's posible to rewrite the query to: test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = (select foo_id from foo where foo_name = 'eeny'); I know, that's not a solution, but a workaround. :-( Yes, I discovered that as well. But I'm more confused (or concerned) by the fact that the (original) query works correctly *without* statistics. Thomas
Re: Strange runtime partition pruning behaviour with 11.4
Am 03.08.19 um 16:06 schrieb Thomas Kellerer: it's posible to rewrite the query to: test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = (select foo_id from foo where foo_name = 'eeny'); I know, that's not a solution, but a workaround. :-( Yes, I discovered that as well. But I'm more confused (or concerned) by the fact that the (original) query works correctly *without* statistics. Thomas can't reproduce that :-( (PG 11.4 Community) (all in a file and executed the explain immediately) Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
Re: Strange runtime partition pruning behaviour with 11.4
Andreas Kretschmer writes: > Am 03.08.19 um 16:06 schrieb Thomas Kellerer: >> But I'm more confused (or concerned) by the fact that the (original) >> query works correctly *without* statistics. > can't reproduce that :-( (PG 11.4 Community) Yeah, I get the same plan with or without ANALYZE, too. In this example, having the ANALYZE stats barely moves the rowcount estimates for foo_bar_baz at all, so it's not surprising that the plan doesn't change. (I do wonder how Thomas got a different outcome...) Given the shape of the preferred plan: Finalize Aggregate (cost=15779.59..15779.60 rows=1 width=8) (actual time=160.329..160.330 rows=1 loops=1) -> Gather (cost=15779.38..15779.59 rows=2 width=8) (actual time=160.011..161.712 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=14779.38..14779.39 rows=1 width=8) (actual time=154.675..154.675 rows=1 loops=3) -> Hash Join (cost=1.09..14612.90 rows=66590 width=0) (actual time=86.814..144.793 rows=100500 loops=3) Hash Cond: (fbb_1.foo_id = foo.foo_id) -> Parallel Append (cost=0.00..12822.21 rows=399537 width=4) (actual time=0.019..95.644 rows=318950 loops=3) -> Parallel Seq Scan on foo_bar_baz_1 fbb_1 (cost=0.00..3403.53 rows=177353 width=4) (actual time=0.012..18.881 rows=100500 loops=3) -> Parallel Seq Scan on foo_bar_baz_2 fbb_2 (cost=0.00..3115.53 rows=162353 width=4) (actual time=0.018..51.716 rows=276000 loops=1) -> Parallel Seq Scan on foo_bar_baz_3 fbb_3 (cost=0.00..2031.82 rows=105882 width=4) (actual time=0.011..16.854 rows=9 loops=2) -> Parallel Seq Scan on foo_bar_baz_4 fbb_4 (cost=0.00..1584.00 rows=82500 width=4) (actual time=0.011..26.950 rows=140250 loops=1) -> Parallel Seq Scan on foo_bar_baz_5 fbb_5 (cost=0.00..667.65 rows=34765 width=4) (actual time=0.014..11.896 rows=59100 loops=1) -> Parallel Seq Scan on foo_bar_baz_0 fbb (cost=0.00..22.00 rows=1200 width=4) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on foo (cost=0.00..1.07 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=3) Filter: ((foo_name)::text = 'eeny'::text) Rows Removed by Filter: 5 it's obvious that no pruning can happen, run-time or otherwise, because the partitioned table is being scanned on the outside of the join --- so the target value of foo_id isn't available. We can force the planner to its second best choice with set enable_hashjoin to 0; and then we get Aggregate (cost=31954.09..31954.10 rows=1 width=8) (actual time=420.158..420.158 rows=1 loops=1) -> Nested Loop (cost=0.00..31554.55 rows=159815 width=0) (actual time=0.058..389.974 rows=301500 loops=1) Join Filter: (fbb.foo_id = foo.foo_id) Rows Removed by Join Filter: 655350 -> Seq Scan on foo (cost=0.00..1.07 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1) Filter: ((foo_name)::text = 'eeny'::text) Rows Removed by Filter: 5 -> Append (cost=0.00..19567.35 rows=958890 width=4) (actual time=0.026..280.510 rows=956850 loops=1) -> Seq Scan on foo_bar_baz_0 fbb (cost=0.00..30.40 rows=2040 width=4) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on foo_bar_baz_1 fbb_1 (cost=0.00..4645.00 rows=301500 width=4) (actual time=0.022..57.836 rows=301500 loops=1) -> Seq Scan on foo_bar_baz_2 fbb_2 (cost=0.00..4252.00 rows=276000 width=4) (actual time=0.019..51.834 rows=276000 loops=1) -> Seq Scan on foo_bar_baz_3 fbb_3 (cost=0.00..2773.00 rows=18 width=4) (actual time=0.016..31.951 rows=18 loops=1) -> Seq Scan on foo_bar_baz_4 fbb_4 (cost=0.00..2161.50 rows=140250 width=4) (actual time=0.015..24.392 rows=140250 loops=1) -> Seq Scan on foo_bar_baz_5 fbb_5 (cost=0.00..911.00 rows=59100 width=4) (actual time=0.012..10.252 rows=59100 loops=1) This is a good deal slower, and the planner correctly estimates that it's a good deal slower, so that's why it didn't get picked. But ... why didn't any run-time pruning happen? Because the shape of the plan is still wrong: the join condition is being applied at the nestloop node. If we'd pushed down the foo_id condition to the foo_bar_baz scan then there'd be hope of pruning. I think the reason that that isn't happening is that the planner has not been taught that run-time pruning is a thing, so it's not giving any cost preference to doing things in a way that would enable that.
Re: Strange runtime partition pruning behaviour with 11.4
I too got the same plan (non runtime partition pruning plan) with or without the statistics. So it looks like the workaround until this is fixed is to re-arrange the query to do a subselect to force the runtime partition pruning as Andreas suggested, which I tested and indeed does work for me too! Regards, Michael Vitale Thomas Kellerer wrote on 8/3/2019 10:06 AM: it's posible to rewrite the query to: test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = (select foo_id from foo where foo_name = 'eeny'); I know, that's not a solution, but a workaround. :-( Yes, I discovered that as well. But I'm more confused (or concerned) by the fact that the (original) query works correctly *without* statistics. Thomas
