Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Tom Lane
Maxim Boguk  writes:
> Reading the code - probably the lowest hanging fruit is to make
> 'The current multiplier of 1000 * cpu_operator_cost' configurable in the
> future versions.

I'm wondering whether we should try to make the planner not expend
the effort in the first place, but leave partition pruning to the
executor, at least in cases where it can determine that that will be
possible.

regards, tom lane




Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
On Mon, May 12, 2025 at 6:33 PM Tom Lane  wrote:

> Maxim Boguk  writes:
> > And the problem is that the cost of a custom plan ignores the cost of
> > planning itself (which is like 2x orders of magnitude worse than the cost
> > of real time partition pruning of a generic plan).
>
> False.  The estimate is evidently pretty wrong, but it's not that
> there is no consideration at all.  See around line 1370 in
> src/backend/utils/cache/plancache.c.
>
> regards, tom lane
>

Thank you.
Reading the code - probably the lowest hanging fruit is to make
'The current multiplier of 1000 * cpu_operator_cost' configurable in the
future versions.

PS: it's always nice to see when my ad-hoc idea (about N*nrelations as cost
planner estimate) is already implemented.


-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Andrei Lepikhov

On 12/5/2025 16:04, Maxim Boguk wrote:
On Mon, May 12, 2025 at 4:48 PM Andrei Lepikhov It is not hard to write such a tiny extension. As I see, the only extra

stored "C" procedure is needed to set up force-plan-type flag employing
FetchPreparedStatement(). The rest of the code - querying
pg_stat_statements and switching between plan types may be written in
plpgsql.

If I'm not mistaken, it will work with all PG versions that are
currently in support. What do you think?


Such extension would be very useful (and in general - the solution based 
on the actual execution data - seems more stable/predictable than the 
plan cost based selection which is currently used by postgresql).
Okay, as far as I can see now, it costs a couple of weeks to develop. It 
would be more profitable in terms of speed and usage in older versions 
than any core patch.
What's more, if, as you predict, it will work, it may provide a 
rationale for opening the entire plan cache for extensions and allow a 
wide audience to impact the extended protocol (and query plans in stored 
procedures) in many curious ways.
As I may envision, a dummy routine providing a link to the 
saved_plan_list will spend a few lines of code. A subscription to cached 
statements may cost more time and effort but seems even more profitable.


--
regards, Andrei Lepikhov




Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
On Mon, May 12, 2025 at 3:08 PM Andrei Lepikhov  wrote:

> On 5/12/25 13:49, Maxim Boguk wrote:
> > I suspect this situation should be quite common with queries over
> > partitioned tables (where planning time is usually quite a high).
> >
> > Any suggestions what could be done there outside of using
> > force_generic_plan for a particular db user (which will kill performance
> > in other queries for sure)?
> Thanks for this puzzle!
> I suppose, in case generic planning is much faster than custom one,
> there are two candidates exist:
> 1. Touching the index during planning causes too much overhead - see
> get_actual_variable_range
> 2. You have a massive default_statistics_target for a table involved.
>
> So, to clarify the problem, may you provide EXPLAIN (without analyze)
> with BUFFERS ON ?
> Also, could you provide extra information on the statistics involved?
> For each column (I think created_at is the most important one), show the
> size of MCV and histogram arrays.
>
> --
> regards, Andrei Lepikhov
>


clickcast=# explain (buffers) execute qqq('2025-04-11
09:22:00.193'::timestamp without time zone, '2025-05-12
09:22:00.203'::timestamp without time zone);

   QUERY PLAN


-
 Limit  (cost=1.14..1.29 rows=1 width=385)
   ->  Append  (cost=1.14..9.10 rows=50 width=385)
 ->  Index Scan Backward using
job_stats_new_2025_05_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_05 job_stats_master_2  (cost=0.56..3.28 rows=18
width=371)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '*'::text) AND (created_at >= '2025-04-11
09:22:00.193'::timestamp without time zone) AND (created_at <= '2025-05-12
09:22:00.203'::timestamp without time zone))
 ->  Index Scan Backward using
job_stats_new_2025_04_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_04 job_stats_master_1  (cost=0.57..5.32 rows=32
width=394)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '***'::text) AND (created_at >= '2025-04-11
09:22:00.193'::timestamp without time zone) AND (created_at <= '2025-05-12
09:22:00.203'::timestamp without time zone))
 Planning:
   Buffers: shared hit=16
16 buffers - most times, sometimes 12k   Buffers: shared hit=12511 (like 5%
cases) - I have no idea why.

show default_statistics_target ;
 default_statistics_target
---
 100
No custom statistic targets on this table or partitions.

select
tablename,attname,inherited,null_frac,n_distinct,array_length(most_common_vals,1)
mcv, array_length(histogram_bounds,1) hist from pg_stats where tablename IN
('job_stats_master', 'job_stats_new_2025_04', 'job_stats_new_2025_05') and
attname in ('created_at', 'job_board_id', 'job_reference') order by
tablename, attname;
   tablename   |attname| inherited | null_frac  |
 n_distinct  | mcv | hist
---+---+---++--+-+--
 job_stats_master  | created_at| t |  0 |
1.066586e+06 |  15 |  101
 job_stats_master  | job_board_id  | t | 0.52743334 |
1716 | 100 |  101
 job_stats_master  | job_reference | t |  0 |
-0.1 |  39 |  101
 job_stats_new_2025_04 | created_at| f |  0 |
832508 |  39 |  101
 job_stats_new_2025_04 | job_board_id  | f | 0.47096667 |
1096 | 100 |  101
 job_stats_new_2025_04 | job_reference | f |  0 |
-0.1 |  93 |  101
 job_stats_new_2025_05 | created_at| f |  0 |
709166 |  42 |  101
 job_stats_new_2025_05 | job_board_id  | f | 0.4703 |
1142 | 100 |  101
 job_stats_new_2025_05 | job_reference | f |  0 |
-0.1 | 100 |  101


PS: problem not with difference between custom and generic planning time
but with prepared statements
generic plan plans only once, but custom plan plan every call (and plan
time cost 95% on total query runtime).


-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
On Mon, May 12, 2025 at 4:48 PM Andrei Lepikhov  wrote:

> On 5/12/25 15:08, Maxim Boguk wrote:
> > PS: problem not with difference between custom and generic planning time
> > but with prepared statements
> > generic plan plans only once, but custom plan plan every call (and plan
> > time cost 95% on total query runtime).
> Ah, now I got it.
> I'm aware of this problem from at least two sources of regular complaints.
> What can you do here? Let's imagine a palliative solution:
> Having pg_stat_statements data and the list of prepared statements (see
> pg_prepared_statement) and queryId enabled, there is a way to force a
> custom or generic plan in specific cases only: look up into min/max
> query execution time. If no big difference exists and planning time is
> sufficient, setting force_generic_plan for this plan makes sense. In
> another case, if the planning time is too short or the generic plan is
> unstable - switch to force_custom_plan.
>
> It is not hard to write such a tiny extension. As I see, the only extra
> stored "C" procedure is needed to set up force-plan-type flag employing
> FetchPreparedStatement(). The rest of the code - querying
> pg_stat_statements and switching between plan types may be written in
> plpgsql.
>
> If I'm not mistaken, it will work with all PG versions that are
> currently in support. What do you think?


Such extension would be very useful (and in general - the solution based on
the actual execution data - seems more stable/predictable than the plan
cost based selection which is currently used by postgresql).


-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Andrei Lepikhov

On 5/12/25 15:08, Maxim Boguk wrote:
PS: problem not with difference between custom and generic planning time 
but with prepared statements
generic plan plans only once, but custom plan plan every call (and plan 
time cost 95% on total query runtime).

Ah, now I got it.
I'm aware of this problem from at least two sources of regular complaints.
What can you do here? Let's imagine a palliative solution:
Having pg_stat_statements data and the list of prepared statements (see 
pg_prepared_statement) and queryId enabled, there is a way to force a 
custom or generic plan in specific cases only: look up into min/max 
query execution time. If no big difference exists and planning time is 
sufficient, setting force_generic_plan for this plan makes sense. In 
another case, if the planning time is too short or the generic plan is 
unstable - switch to force_custom_plan.


It is not hard to write such a tiny extension. As I see, the only extra 
stored "C" procedure is needed to set up force-plan-type flag employing 
FetchPreparedStatement(). The rest of the code - querying 
pg_stat_statements and switching between plan types may be written in 
plpgsql.


If I'm not mistaken, it will work with all PG versions that are 
currently in support. What do you think?


--
regards, Andrei Lepikhov




Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
On Mon, May 12, 2025 at 4:48 PM Andrei Lepikhov  wrote:

> On 5/12/25 15:08, Maxim Boguk wrote:
> > PS: problem not with difference between custom and generic planning time
> > but with prepared statements
> > generic plan plans only once, but custom plan plan every call (and plan
> > time cost 95% on total query runtime).
> Ah, now I got it.
> I'm aware of this problem from at least two sources of regular complaints.
> What can you do here? Let's imagine a palliative solution:
> Having pg_stat_statements data and the list of prepared statements (see
> pg_prepared_statement) and queryId enabled, there is a way to force a
> custom or generic plan in specific cases only: look up into min/max
> query execution time. If no big difference exists and planning time is
> sufficient, setting force_generic_plan for this plan makes sense. In
> another case, if the planning time is too short or the generic plan is
> unstable - switch to force_custom_plan.
>
> It is not hard to write such a tiny extension. As I see, the only extra
> stored "C" procedure is needed to set up force-plan-type flag employing
> FetchPreparedStatement(). The rest of the code - querying
> pg_stat_statements and switching between plan types may be written in
> plpgsql.
>
> If I'm not mistaken, it will work with all PG versions that are
> currently in support. What do you think?


But a more general question - this exact issue will affect every prepared
query logic which selects only a subset of partitions.
In this case - current logic will always select custom plan over generic
plan (even in case the both plans are actually the same).
E.g. If a fast/cheap query over a partitioned table has conditions that
allow use of only a few partitions - custom plan always wins whatever
database settings is (outside of force_custom_plan hammer).
Seems there could be something done about the cost calculation of generic
plan.



-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread David Rowley
On Mon, 12 May 2025, 05:08 Andrei Lepikhov,  wrote:

> Thanks for this puzzle!
> I suppose, in case generic planning is much faster than custom one,
> there are two candidates exist:
> 1. Touching the index during planning causes too much overhead - see
> get_actual_variable_range
> 2. You have a massive default_statistics_target for a table involved.
>

This is just an artifact of the fact that runtime pruning is not factored
into the costs. Note the cost of the generic plan. The plan_cache_mode GUC
is about the only way to overrule the choice to use the custom plan.

David

>


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Andrei Lepikhov

On 5/12/25 20:07, Tom Lane wrote:

Maxim Boguk  writes:

Reading the code - probably the lowest hanging fruit is to make
'The current multiplier of 1000 * cpu_operator_cost' configurable in the
future versions.


I'm wondering whether we should try to make the planner not expend
the effort in the first place, but leave partition pruning to the
executor, at least in cases where it can determine that that will be
possible.
Significant planning time is a sorting out lots of scan paths, applying 
partition statistics etc. planner-stage partitioning reduces these 
efforts drastically.


--
regards, Andrei Lepikhov




inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
Hi,
I found a case where plan cache all time switching to custom plans forces
query replan each call (and thus slows down the whole query for 10x or
more).
What makes the situation intriguing - that both custom and generic plans
are the same.

job_stats_master - partitioned table with 24 partitions (per month last 2
year).

Problem query:
prepare qqq(timestamp, timestamp) AS
SELECT *
FROM "job_stats_master"
WHERE
"job_stats_master"."created_at" BETWEEN $1 AND $2 AND
"job_stats_master"."job_reference" = '**' AND
"job_stats_master"."job_board_id" = 27068
ORDER BY "created_at" DESC LIMIT 1;

plan (after 6th execution):
explain analyze execute qqq('2025-04-11 09:22:00.193'::timestamp without
time zone, '2025-05-12 09:22:00.203'::timestamp without time zone);

   QUERY PLAN


-
 Limit  (cost=1.14..1.29 rows=1 width=384) (actual time=0.026..0.026 rows=1
loops=1)
   ->  Append  (cost=1.14..9.10 rows=50 width=384) (actual
time=0.025..0.026 rows=1 loops=1)
 ->  Index Scan Backward using
job_stats_new_2025_05_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_05 job_stats_master_2  (cost=0.56..3.28 rows=18
width=368) (actual time=0.025..0.025 rows=1 loops=1)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '**'::text) AND (created_at >= '2025-04-11
09:22:00.193'::timestamp without time zone) AND (created_at <= '2025-05-12
09:22:00.203'::timestamp without time zone))
 ->  Index Scan Backward using
job_stats_new_2025_04_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_04 job_stats_master_1  (cost=0.57..5.32 rows=32
width=394) (never executed)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '**'::text) AND (created_at >= '2025-04-11
09:22:00.193'::timestamp without time zone) AND (created_at <= '2025-05-12
09:22:00.203'::timestamp without time zone))
 Planning Time: 0.611 ms
 Execution Time: 0.057 ms
(8 rows)

plan with set plan_cache_mode to force_generic_plan ;

explain analyze execute qqq('2025-04-11 09:22:00.193'::timestamp without
time zone, '2025-05-12 09:22:00.203'::timestamp without time zone);

   QUERY PLAN

-
 Limit  (cost=19.06..19.32 rows=1 width=407) (actual time=0.030..0.030
rows=1 loops=1)
   ->  Append  (cost=19.06..26.74 rows=29 width=407) (actual
time=0.029..0.030 rows=1 loops=1)
 Subplans Removed: 27
 ->  Index Scan Backward using
job_stats_new_2025_05_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_05 job_stats_master_2  (cost=0.56..0.82 rows=1
width=368) (actual time=0.029..0.029 rows=1 loops=1)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '***'::text) AND (created_at >= $1) AND
(created_at <= $2))
 ->  Index Scan Backward using
job_stats_new_2025_04_job_board_id_job_reference_created_at_idx on
job_stats_new_2025_04 job_stats_master_1  (cost=0.57..0.83 rows=1
width=394) (never executed)
   Index Cond: ((job_board_id = 27068) AND
((job_reference)::text = '***'::text) AND (created_at >= $1) AND
(created_at <= $2))
 Planning Time: 0.033 ms
 Execution Time: 0.086 ms

Plan "de facto" the same, performance almost the same but with custom plans
there is 20x more time spent on planning.
With over 1M RPS - it's become quite an issue even for the best available
servers.

No playing with cost parameters provides any changes in selection custom
plan over generic.
As I understand there is an issue with costing model - generic plan thinks
it will visit all 24 partitions but custom plan does prune partitions
during planning thus custom plan always wins in this case "by cost" and in
the same time huge loss in performance (but actual plans are the same in
both cases).

I suspect this situation should be quite common with queries over
partitioned tables (where planning time is usually quite a high).

Any suggestions what could be done there outside of using
force_generic_plan for a particular db user (which will kill performance in
other queries for sure)?




-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Andrei Lepikhov

On 5/12/25 13:49, Maxim Boguk wrote:
I suspect this situation should be quite common with queries over 
partitioned tables (where planning time is usually quite a high).


Any suggestions what could be done there outside of using 
force_generic_plan for a particular db user (which will kill performance 
in other queries for sure)?

Thanks for this puzzle!
I suppose, in case generic planning is much faster than custom one, 
there are two candidates exist:
1. Touching the index during planning causes too much overhead - see 
get_actual_variable_range

2. You have a massive default_statistics_target for a table involved.

So, to clarify the problem, may you provide EXPLAIN (without analyze) 
with BUFFERS ON ?
Also, could you provide extra information on the statistics involved? 
For each column (I think created_at is the most important one), show the 
size of MCV and histogram arrays.


--
regards, Andrei Lepikhov




Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Maxim Boguk
On Mon, May 12, 2025 at 6:01 PM David Rowley  wrote:

> On Mon, 12 May 2025, 05:08 Andrei Lepikhov,  wrote:
>
>> Thanks for this puzzle!
>> I suppose, in case generic planning is much faster than custom one,
>> there are two candidates exist:
>> 1. Touching the index during planning causes too much overhead - see
>> get_actual_variable_range
>> 2. You have a massive default_statistics_target for a table involved.
>>
>
> This is just an artifact of the fact that runtime pruning is not factored
> into the costs. Note the cost of the generic plan. The plan_cache_mode GUC
> is about the only way to overrule the choice to use the custom plan.
>

Situation quite the opposite - I need to force a generic plan because it
has the same execution time as a custom plan but performs 20-50x faster
(because in custom plan case - 95-98% time spent in planning not in
execution).

And the problem is that the cost of a custom plan ignores the cost of
planning itself (which is like 2x orders of magnitude worse than the cost
of real time partition pruning of a generic plan). I started thinking of
something like cost_planner GUC to help with similar issues (where planning
cost calculated as cost_planned*(some heuristic function with amount
involved in query tables).

In my case the high cost of planning itself should force the database to
use generic plan.

-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: inefficient/wrong plan cache mode selection for queries with partitioned tables (postgresql 17)

2025-05-12 Thread Tom Lane
Maxim Boguk  writes:
> And the problem is that the cost of a custom plan ignores the cost of
> planning itself (which is like 2x orders of magnitude worse than the cost
> of real time partition pruning of a generic plan).

False.  The estimate is evidently pretty wrong, but it's not that
there is no consideration at all.  See around line 1370 in
src/backend/utils/cache/plancache.c.

regards, tom lane