Partition pruning is not happening (even in PG18)
Hi super-experts,
I am trying to solve a mystery for a customer where they had some very
large heap tables (500GB+ each) on their PG 13 database which was bringing
them loads of performance problems.
Apart from the regular server tuning efforts, I suggested them to upgrade
it to at least Postgresql 15 and partition these large tables. They've
accepted it and they have applied my suggestions.
However, partition pruning is not happening. Instead, PG is doing
sequential scans against all partitions. I thought it was PG 15 behaviour,
so I tested the same on PG 17 and compiled PG18rc1, getting the same
results. I read on the release notes of PG18 that partition pruning had
many fixes but it seems this one is not in place.
The only way I found to make pruning work is to force index_scan using
pg_hint_plan, but I wanted to influence the planner to decide it by itself
rather than relying on hints. What's the reason for this misbehaving and
what could I do to overcome it?
*Relevant parameters/info:*
DB cluster hosted on Azure Flex Server
random_page_cost=1.1;
plan_cache_mode=force_custom_plan; *# the so called magic-bullet for
pruning*
default_statistics_target=1000;* # better stats to the planner*
geqo_effort=7; *# I wanted the planner to "think" better and do pruning*
enable_partition_pruning=on;
*Test case*
*-- Tables def*
postgres=> \d+ accounts
Partitioned table
"public.accounts"
Column | Type | Collation | Nullable |Default
| Storage | Compression | Stats target | Description
--+-+---+--+---+--+-+--+-
aid | integer | | not null |
nextval('accounts_aid_seq'::regclass) | plain| |
|
bid | integer | | |
| plain| | |
abalance | integer | | |
| plain| | |
filler | text| | |
| extended | | |
transaction_date | date| | not null |
| plain| | |
Partition key: RANGE (transaction_date)
Indexes:
"accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')
postgres=> \d+ t2
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage | Compression |
Stats target | Description
+--+---+--+-+-+-+--+-
dt_col | date | | | | plain | |
|
Access method: heap
*-- Executing test*
*-- Regular execution*
postgres=> explain analyze
select aid, abalance
from accounts
where transaction_date in (select dt_col from t2);
QUERY PLAN
Gather (cost=1001.23..2073502.05 rows=2767123 width=8) (actual
time=3.574..211619.991 rows=1664742 loops=1)
Workers Planned: 2
Workers Launched: 1
-> Hash Semi Join (cost=1.23..1795789.75 rows=1152968 width=8) (actual
time=77575.241..203629.828 rows=832371 loops=2)
Hash Cond: (accounts.transaction_date = t2.dt_col)
-> Parallel Append (cost=0.00..1672493.00 rows=42083334
width=12) (actual time=0.645..197115.678 rows=5050 loops=2)
-> Parallel Seq Scan on accounts_p10 accounts_10
(cost=0.00..124553.77 rows=3585078 width=12) (actual time=0.560..30193.541
rows=8604186 loops=1)
-> Parallel Seq Scan on accounts_p5 accounts_5
(cost=0.00..124550.68 rows=3584968 width=12) (actual
time=56.415..54334.025 rows=8603923 loops=1)
-> Parallel Seq Scan on accounts_p8 accounts_8
(cost=0.00..124535.20 rows=3584520 width=12) (actual time=1.738..31555.264
rows=8602847 loops=1)
Re: Partition pruning is not happening (even in PG18)
Hi,
Partition pruning is happening: pruned nodes are marked as “never executed”.
It is just that pruning is performed not by the planner but by the executor in
this case.
—
Michał
> On 25 Sep 2025, at 21:49, Lauro Ojeda wrote:
>
>
> Hi super-experts,
> I am trying to solve a mystery for a customer where they had some very large
> heap tables (500GB+ each) on their PG 13 database which was bringing them
> loads of performance problems.
> Apart from the regular server tuning efforts, I suggested them to upgrade it
> to at least Postgresql 15 and partition these large tables. They've accepted
> it and they have applied my suggestions.
> However, partition pruning is not happening. Instead, PG is doing sequential
> scans against all partitions. I thought it was PG 15 behaviour, so I tested
> the same on PG 17 and compiled PG18rc1, getting the same results. I read on
> the release notes of PG18 that partition pruning had many fixes but it seems
> this one is not in place.
> The only way I found to make pruning work is to force index_scan using
> pg_hint_plan, but I wanted to influence the planner to decide it by itself
> rather than relying on hints. What's the reason for this misbehaving and what
> could I do to overcome it?
>
> Relevant parameters/info:
> DB cluster hosted on Azure Flex Server
> random_page_cost=1.1;
> plan_cache_mode=force_custom_plan; # the so called magic-bullet for pruning
> default_statistics_target=1000; # better stats to the planner
> geqo_effort=7; # I wanted the planner to "think" better and do pruning
> enable_partition_pruning=on;
>
> Test case
> -- Tables def
> postgres=> \d+ accounts
>Partitioned table
> "public.accounts"
> Column | Type | Collation | Nullable |Default
> | Storage | Compression | Stats target | Description
> --+-+---+--+---+--+-+--+-
> aid | integer | | not null |
> nextval('accounts_aid_seq'::regclass) | plain| |
> |
> bid | integer | | |
> | plain| | |
> abalance | integer | | |
> | plain| | |
> filler | text| | |
> | extended | | |
> transaction_date | date| | not null |
> | plain| | |
> Partition key: RANGE (transaction_date)
> Indexes:
> "accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
> Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
> accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
> accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
> accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
> accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
> accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
> accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
> accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
> accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
> accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
> accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
> accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')
>
> postgres=> \d+ t2
> Table "public.t2"
> Column | Type | Collation | Nullable | Default | Storage | Compression |
> Stats target | Description
> +--+---+--+-+-+-+--+-
> dt_col | date | | | | plain | |
> |
> Access method: heap
>
> -- Executing test
> -- Regular execution
> postgres=> explain analyze
> select aid, abalance
> from accounts
> where transaction_date in (select dt_col from t2);
>
> QUERY PLAN
>
> Gather (cost=1001.23..2073502.05 rows=2767123 width=8) (actual
> time=3.574..211619.991 rows=1664742 loops=1)
>Workers Planned: 2
>Workers Launched: 1
>-> Hash Semi Join (cost=1.23..1795789.75 rows=1152968 width=8) (actual
> time=77575.241..203629.828 rows=832371 loops=2)
> Hash Cond: (accounts.transaction_date = t2.dt_col)
> -> Parallel
Re: Partition pruning is not happening (even in PG18)
Never mind my message. I misread it and missed the plan with no hints.
Michał
> On 25 Sep 2025, at 22:10, Michał Kłeczek wrote:
>
>
> Hi,
>
> Partition pruning is happening: pruned nodes are marked as “never executed”.
> It is just that pruning is performed not by the planner but by the executor
> in this case.
>
> —
>
> Michał
>
>>> On 25 Sep 2025, at 21:49, Lauro Ojeda wrote:
>>>
>>
>> Hi super-experts,
>> I am trying to solve a mystery for a customer where they had some very large
>> heap tables (500GB+ each) on their PG 13 database which was bringing them
>> loads of performance problems.
>> Apart from the regular server tuning efforts, I suggested them to upgrade it
>> to at least Postgresql 15 and partition these large tables. They've accepted
>> it and they have applied my suggestions.
>> However, partition pruning is not happening. Instead, PG is doing
>> sequential scans against all partitions. I thought it was PG 15 behaviour,
>> so I tested the same on PG 17 and compiled PG18rc1, getting the same
>> results. I read on the release notes of PG18 that partition pruning had many
>> fixes but it seems this one is not in place.
>> The only way I found to make pruning work is to force index_scan using
>> pg_hint_plan, but I wanted to influence the planner to decide it by itself
>> rather than relying on hints. What's the reason for this misbehaving and
>> what could I do to overcome it?
>>
>> Relevant parameters/info:
>> DB cluster hosted on Azure Flex Server
>> random_page_cost=1.1;
>> plan_cache_mode=force_custom_plan; # the so called magic-bullet for pruning
>> default_statistics_target=1000; # better stats to the planner
>> geqo_effort=7; # I wanted the planner to "think" better and do pruning
>> enable_partition_pruning=on;
>>
>> Test case
>> -- Tables def
>> postgres=> \d+ accounts
>>Partitioned table
>> "public.accounts"
>> Column | Type | Collation | Nullable |Default
>> | Storage | Compression | Stats target | Description
>> --+-+---+--+---+--+-+--+-
>> aid | integer | | not null |
>> nextval('accounts_aid_seq'::regclass) | plain| |
>> |
>> bid | integer | | |
>> | plain| | |
>> abalance | integer | | |
>> | plain| | |
>> filler | text| | |
>> | extended | | |
>> transaction_date | date| | not null |
>> | plain| | |
>> Partition key: RANGE (transaction_date)
>> Indexes:
>> "accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
>> Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
>> accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
>> accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
>> accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
>> accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
>> accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
>> accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
>> accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
>> accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
>> accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
>> accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
>> accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')
>>
>> postgres=> \d+ t2
>> Table "public.t2"
>> Column | Type | Collation | Nullable | Default | Storage | Compression |
>> Stats target | Description
>> +--+---+--+-+-+-+--+-
>> dt_col | date | | | | plain | |
>> |
>> Access method: heap
>>
>> -- Executing test
>> -- Regular execution
>> postgres=> explain analyze
>> select aid, abalance
>> from accounts
>> where transaction_date in (select dt_col from t2);
>>
>> QUERY PLAN
>>
>> Gather (cost=1001.23..2073502.05 rows=2767123 width=8) (actual
>> time=3.574..211619.991 rows=1664742 loops=1)
>>Workers Plann
Re: Partition pruning is not happening (even in PG18)
On Fri, 26 Sept 2025 at 07:49, Lauro Ojeda wrote: > The only way I found to make pruning work is to force index_scan using > pg_hint_plan, but I wanted to influence the planner to decide it by itself > rather than relying on hints. What's the reason for this misbehaving and what > could I do to overcome it? > Partition key: RANGE (transaction_date) > postgres=> explain analyze > select aid, abalance > from accounts > where transaction_date in (select dt_col from t2); The only partition pruning that exists in PostgreSQL that can prune for that query is for parameterised Nested Loop joins. For Hash Join, it's been talked about, but this requires running the partition pruning code for every values that goes into the Hash Table and only scanning the unioned set of those partitions during the hash probe phase. The trouble with that is that it's very hard to know in advance if it'll be worth the extra effort. Putting a tuple into a hash table is quite cheap. Running the pruning code for a range partitioned table is likely to be a few times more costly than the hash insert (depending on how many partitions there are), so if the end result is that nothing was pruned, then that's quite a bit of extra effort for no gain. What we maybe could do better is reduce the cost of the Append scan when there's a run-time pruning object attached. This is a little tricky as we currently only build that object when creating the final plan. To include that in the costs we'd need to move that to the Path generation phase so that we didn't accidentally reject Paths which could be cheaper than we first think. > Also, how could I contribute to get this partition pruning to work? The pgsql-hackers mailing list is where all the discussions about that happen. There is plenty of past discussions on these topics. One such (fairly) recent discussion is in [1]. There are plenty more, including some ideas from Robert Haas about how we might cost run-time partition pruning. That was likely around 2017-2018 range, so you might need to dig deep to find that. David [1] https://www.postgresql.org/message-id/flat/CAApHDvoC7n_oceb%3D8z%2BMY8sTgH4xa%2ByAwBxZ4Dxv8pwkT9bOcA%40mail.gmail.com#45314d3d01ef8ad1eebe72111989062c
Re: Indexes on expressions with multiple columns and operators
On 9/23/25 12:20, Frédéric Yhuel wrote: On 9/22/25 23:15, Andrei Lepikhov wrote: I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole range of values inside the table into 'partitions' by such a WHERE condition. Yes, from what I understood of the documentation [1], this is exactly what I would like! I've tested it and I can confirm that it works very well. So, on SQL Server, you can do this: CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; It would be great to have a similar feature in PostgreSQL.
Re: Indexes on expressions with multiple columns and operators
On 9/23/25 15:31, Frédéric Yhuel wrote: To get back to the topic of partitioned statistics, do you know if SQL Server is smart enough to handle this case [1] that we discussed last year? (with filtered statistics) [1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- ad87-8c1f2eea9ae8%40dalibo.com Sorry, it doesn't make any sense. First of all, it's not possible to do something like this with SQL Sever: CREATE STATISTICS OrdersStats ON orders (id, product_id) WHERE product_id IN (SELECT id FROM products WHERE name = 'babar'); this is because you need to use simple scalar expressions in the filter clause. An even if it were possible... it would be completely useless in this case. Sorry for the noise.
