date:20250925

Partition pruning is not happening (even in PG18)

2025-09-25 Thread Lauro Ojeda

Hi super-experts,
I am trying to solve a mystery for a customer where they had some very
large heap tables (500GB+ each) on their PG 13 database which was bringing
them loads of performance problems.
Apart from the regular server tuning efforts, I suggested them to upgrade
it to at least Postgresql 15 and partition these large tables. They've
accepted it and they have applied my suggestions.
However, partition  pruning is not happening. Instead, PG is doing
sequential scans against all partitions. I thought it was PG 15 behaviour,
so I tested the same on PG 17 and compiled PG18rc1, getting the same
results. I read on the release notes of PG18 that partition pruning had
many fixes but it seems this one is not in place.
The only way I found to make pruning work is to force index_scan using
pg_hint_plan, but I wanted to influence the planner to decide it by itself
rather than relying on hints. What's the reason for this misbehaving and
what could I do to overcome it?

*Relevant parameters/info:*
DB cluster hosted on Azure Flex Server
random_page_cost=1.1;
plan_cache_mode=force_custom_plan; *# the so called magic-bullet for
pruning*
default_statistics_target=1000;* # better stats to the planner*
geqo_effort=7; *# I wanted the planner to "think" better and do pruning*
enable_partition_pruning=on;

*Test case*
*-- Tables def*
postgres=> \d+ accounts
   Partitioned table
"public.accounts"
  Column  |  Type   | Collation | Nullable |Default
   | Storage  | Compression | Stats target | Description
--+-+---+--+---+--+-+--+-
 aid  | integer |   | not null |
nextval('accounts_aid_seq'::regclass) | plain| |
   |
 bid  | integer |   |  |
| plain| |  |
 abalance | integer |   |  |
| plain| |  |
 filler   | text|   |  |
| extended | |  |
 transaction_date | date|   | not null |
| plain| |  |
Partition key: RANGE (transaction_date)
Indexes:
"accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')

postgres=> \d+ t2
  Table "public.t2"
 Column | Type | Collation | Nullable | Default | Storage | Compression |
Stats target | Description
+--+---+--+-+-+-+--+-
 dt_col | date |   |  | | plain   | |
   |
Access method: heap

*-- Executing test*
*-- Regular execution*
postgres=> explain analyze
select aid, abalance
from accounts
where transaction_date in (select dt_col from t2);

 QUERY PLAN

 Gather  (cost=1001.23..2073502.05 rows=2767123 width=8) (actual
time=3.574..211619.991 rows=1664742 loops=1)
   Workers Planned: 2
   Workers Launched: 1
   ->  Hash Semi Join  (cost=1.23..1795789.75 rows=1152968 width=8) (actual
time=77575.241..203629.828 rows=832371 loops=2)
 Hash Cond: (accounts.transaction_date = t2.dt_col)
 ->  Parallel Append  (cost=0.00..1672493.00 rows=42083334
width=12) (actual time=0.645..197115.678 rows=5050 loops=2)
   ->  Parallel Seq Scan on accounts_p10 accounts_10
 (cost=0.00..124553.77 rows=3585078 width=12) (actual time=0.560..30193.541
rows=8604186 loops=1)
   ->  Parallel Seq Scan on accounts_p5 accounts_5
 (cost=0.00..124550.68 rows=3584968 width=12) (actual
time=56.415..54334.025 rows=8603923 loops=1)
   ->  Parallel Seq Scan on accounts_p8 accounts_8
 (cost=0.00..124535.20 rows=3584520 width=12) (actual time=1.738..31555.264
rows=8602847 loops=1)

Re: Partition pruning is not happening (even in PG18)

2025-09-25 Thread Michał Kłeczek

Hi,

Partition pruning is happening: pruned nodes are marked as “never executed”.
It is just that pruning is performed not by the planner but by the executor in 
this case.

—

Michał 

> On 25 Sep 2025, at 21:49, Lauro Ojeda  wrote:
> 
> 
> Hi super-experts,
> I am trying to solve a mystery for a customer where they had some very large 
> heap tables (500GB+ each) on their PG 13 database which was bringing them 
> loads of performance problems. 
> Apart from the regular server tuning efforts, I suggested them to upgrade it 
> to at least Postgresql 15 and partition these large tables. They've accepted 
> it and they have applied my suggestions.
> However, partition  pruning is not happening. Instead, PG is doing sequential 
> scans against all partitions. I thought it was PG 15 behaviour, so I tested 
> the same on PG 17 and compiled PG18rc1, getting the same results. I read on 
> the release notes of PG18 that partition pruning had many fixes but it seems 
> this one is not in place.
> The only way I found to make pruning work is to force index_scan using 
> pg_hint_plan, but I wanted to influence the planner to decide it by itself 
> rather than relying on hints. What's the reason for this misbehaving and what 
> could I do to overcome it?
> 
> Relevant parameters/info:
> DB cluster hosted on Azure Flex Server
> random_page_cost=1.1;
> plan_cache_mode=force_custom_plan; # the so called magic-bullet for pruning
> default_statistics_target=1000; # better stats to the planner
> geqo_effort=7; # I wanted the planner to "think" better and do pruning
> enable_partition_pruning=on;
> 
> Test case
> -- Tables def
> postgres=> \d+ accounts
>Partitioned table 
> "public.accounts"
>   Column  |  Type   | Collation | Nullable |Default   
>  | Storage  | Compression | Stats target | Description
> --+-+---+--+---+--+-+--+-
>  aid  | integer |   | not null | 
> nextval('accounts_aid_seq'::regclass) | plain| |  
> |
>  bid  | integer |   |  |  
>  | plain| |  |
>  abalance | integer |   |  |  
>  | plain| |  |
>  filler   | text|   |  |  
>  | extended | |  |
>  transaction_date | date|   | not null |  
>  | plain| |  |
> Partition key: RANGE (transaction_date)
> Indexes:
> "accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
> Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
> accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
> accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
> accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
> accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
> accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
> accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
> accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
> accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
> accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
> accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
> accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')
> 
> postgres=> \d+ t2
>   Table "public.t2"
>  Column | Type | Collation | Nullable | Default | Storage | Compression | 
> Stats target | Description
> +--+---+--+-+-+-+--+-
>  dt_col | date |   |  | | plain   | | 
>  |
> Access method: heap
> 
> -- Executing test
> -- Regular execution
> postgres=> explain analyze
> select aid, abalance
> from accounts
> where transaction_date in (select dt_col from t2);
>  
> QUERY PLAN
> 
>  Gather  (cost=1001.23..2073502.05 rows=2767123 width=8) (actual 
> time=3.574..211619.991 rows=1664742 loops=1)
>Workers Planned: 2
>Workers Launched: 1
>->  Hash Semi Join  (cost=1.23..1795789.75 rows=1152968 width=8) (actual 
> time=77575.241..203629.828 rows=832371 loops=2)
>  Hash Cond: (accounts.transaction_date = t2.dt_col)
>  ->  Parallel

Re: Partition pruning is not happening (even in PG18)

2025-09-25 Thread Michał Kłeczek

Never mind my message. I misread it and missed the plan with no hints.


Michał 

> On 25 Sep 2025, at 22:10, Michał Kłeczek  wrote:
> 
> 
> Hi,
> 
> Partition pruning is happening: pruned nodes are marked as “never executed”.
> It is just that pruning is performed not by the planner but by the executor 
> in this case.
> 
> —
> 
> Michał 
> 
>>> On 25 Sep 2025, at 21:49, Lauro Ojeda  wrote:
>>> 
>> 
>> Hi super-experts,
>> I am trying to solve a mystery for a customer where they had some very large 
>> heap tables (500GB+ each) on their PG 13 database which was bringing them 
>> loads of performance problems. 
>> Apart from the regular server tuning efforts, I suggested them to upgrade it 
>> to at least Postgresql 15 and partition these large tables. They've accepted 
>> it and they have applied my suggestions.
>> However, partition  pruning is not happening. Instead, PG is doing 
>> sequential scans against all partitions. I thought it was PG 15 behaviour, 
>> so I tested the same on PG 17 and compiled PG18rc1, getting the same 
>> results. I read on the release notes of PG18 that partition pruning had many 
>> fixes but it seems this one is not in place.
>> The only way I found to make pruning work is to force index_scan using 
>> pg_hint_plan, but I wanted to influence the planner to decide it by itself 
>> rather than relying on hints. What's the reason for this misbehaving and 
>> what could I do to overcome it?
>> 
>> Relevant parameters/info:
>> DB cluster hosted on Azure Flex Server
>> random_page_cost=1.1;
>> plan_cache_mode=force_custom_plan; # the so called magic-bullet for pruning
>> default_statistics_target=1000; # better stats to the planner
>> geqo_effort=7; # I wanted the planner to "think" better and do pruning
>> enable_partition_pruning=on;
>> 
>> Test case
>> -- Tables def
>> postgres=> \d+ accounts
>>Partitioned table 
>> "public.accounts"
>>   Column  |  Type   | Collation | Nullable |Default  
>>   | Storage  | Compression | Stats target | Description
>> --+-+---+--+---+--+-+--+-
>>  aid  | integer |   | not null | 
>> nextval('accounts_aid_seq'::regclass) | plain| | 
>>  |
>>  bid  | integer |   |  | 
>>   | plain| |  |
>>  abalance | integer |   |  | 
>>   | plain| |  |
>>  filler   | text|   |  | 
>>   | extended | |  |
>>  transaction_date | date|   | not null | 
>>   | plain| |  |
>> Partition key: RANGE (transaction_date)
>> Indexes:
>> "accounts_pkey" PRIMARY KEY, btree (transaction_date, aid)
>> Partitions: accounts_p1 FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'),
>> accounts_p10 FOR VALUES FROM ('2025-10-01') TO ('2025-11-01'),
>> accounts_p11 FOR VALUES FROM ('2025-11-01') TO ('2025-12-01'),
>> accounts_p12 FOR VALUES FROM ('2025-12-01') TO ('2026-01-01'),
>> accounts_p2 FOR VALUES FROM ('2025-02-01') TO ('2025-03-01'),
>> accounts_p3 FOR VALUES FROM ('2025-03-01') TO ('2025-04-01'),
>> accounts_p4 FOR VALUES FROM ('2025-04-01') TO ('2025-05-01'),
>> accounts_p5 FOR VALUES FROM ('2025-05-01') TO ('2025-06-01'),
>> accounts_p6 FOR VALUES FROM ('2025-06-01') TO ('2025-07-01'),
>> accounts_p7 FOR VALUES FROM ('2025-07-01') TO ('2025-08-01'),
>> accounts_p8 FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'),
>> accounts_p9 FOR VALUES FROM ('2025-09-01') TO ('2025-10-01')
>> 
>> postgres=> \d+ t2
>>   Table "public.t2"
>>  Column | Type | Collation | Nullable | Default | Storage | Compression | 
>> Stats target | Description
>> +--+---+--+-+-+-+--+-
>>  dt_col | date |   |  | | plain   | |
>>   |
>> Access method: heap
>> 
>> -- Executing test
>> -- Regular execution
>> postgres=> explain analyze
>> select aid, abalance
>> from accounts
>> where transaction_date in (select dt_col from t2);
>>  
>> QUERY PLAN
>> 
>>  Gather  (cost=1001.23..2073502.05 rows=2767123 width=8) (actual 
>> time=3.574..211619.991 rows=1664742 loops=1)
>>Workers Plann

Re: Partition pruning is not happening (even in PG18)

2025-09-25 Thread David Rowley

On Fri, 26 Sept 2025 at 07:49, Lauro Ojeda  wrote:
> The only way I found to make pruning work is to force index_scan using 
> pg_hint_plan, but I wanted to influence the planner to decide it by itself 
> rather than relying on hints. What's the reason for this misbehaving and what 
> could I do to overcome it?

> Partition key: RANGE (transaction_date)

> postgres=> explain analyze
> select aid, abalance
> from accounts
> where transaction_date in (select dt_col from t2);

The only partition pruning that exists in PostgreSQL that can prune
for that query is for parameterised Nested Loop joins.  For Hash Join,
it's been talked about, but this requires running the partition
pruning code for every values that goes into the Hash Table and only
scanning the unioned set of those partitions during the hash probe
phase. The trouble with that is that it's very hard to know in advance
if it'll be worth the extra effort. Putting a tuple into a hash table
is quite cheap. Running the pruning code for a range partitioned table
is likely to be a few times more costly than the hash insert
(depending on how many partitions there are), so if the end result is
that nothing was pruned, then that's quite a bit of extra effort for
no gain.

What we maybe could do better is reduce the cost of the Append scan
when there's a run-time pruning object attached. This is a little
tricky as we currently only build that object when creating the final
plan.  To include that in the costs we'd need to move that to the Path
generation phase so that we didn't accidentally reject Paths which
could be cheaper than we first think.

> Also, how could I contribute to get this partition pruning to work?

The pgsql-hackers mailing list is where all the discussions about that
happen. There is plenty of past discussions on these topics. One such
(fairly) recent discussion is in [1]. There are plenty more, including
some ideas from Robert Haas about how we might cost run-time partition
pruning. That was likely around 2017-2018 range, so you might need to
dig deep to find that.

David

[1] 
https://www.postgresql.org/message-id/flat/CAApHDvoC7n_oceb%3D8z%2BMY8sTgH4xa%2ByAwBxZ4Dxv8pwkT9bOcA%40mail.gmail.com#45314d3d01ef8ad1eebe72111989062c

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel





On 9/23/25 12:20, Frédéric Yhuel wrote:

On 9/22/25 23:15, Andrei Lepikhov wrote:
I'm not sure I fully understand your case, but SQL Server demonstrates 
an interesting approach: they have a WHERE clause attached to 
statistics. So, having implemented this, you may separate the whole 
range of values inside the table into 'partitions' by such a WHERE 
condition.


Yes, from what I understood of the documentation [1], this is exactly 
what I would like!


I've tested it and I can confirm that it works very well.

So, on SQL Server, you can do this:
CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING';

It would be great to have a similar feature in PostgreSQL.

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel





On 9/23/25 15:31, Frédéric Yhuel wrote:
To get back to the topic of partitioned statistics, do you know if SQL 
Server is smart enough to handle this case [1] that we discussed last 
year? (with filtered statistics)


[1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- 
ad87-8c1f2eea9ae8%40dalibo.com




Sorry, it doesn't make any sense.

First of all, it's not possible to do something like this with SQL Sever:

CREATE STATISTICS OrdersStats
 ON orders (id, product_id)
WHERE product_id IN (SELECT id FROM products WHERE name = 'babar');

this is because you need to use simple scalar expressions in the filter 
clause.


An even if it were possible... it would be completely useless in this 
case. Sorry for the noise.

Partition pruning is not happening (even in PG18)

Re: Partition pruning is not happening (even in PG18)

Re: Partition pruning is not happening (even in PG18)

Re: Partition pruning is not happening (even in PG18)

Re: Indexes on expressions with multiple columns and operators

Re: Indexes on expressions with multiple columns and operators

6 matches

Site Navigation

Mail list logo

Footer information