Plan selection based on worst case scenario

2024-05-29 Thread Darwin O'Connor
I have been having an ongoing problem for years with PostgreSQL selecting
very poor plans when running queries. It does things like doing a table
scan of gigabyte size tables to generate a hash table rather than use a
suitable index.

When I disable enough features that it generates a sensible plan I notice
that the low range of the total cost is many orders of magnitude lower
while the high range is higher then the slow plan it originally chose. This
suggests PostgreSQL is choosing a plan based on the high end cost rather
than the median cost.

Is there a PostgreSQL setting that can control how it judges plans?

Here is a recent example of a query that finds the last time at a stop
filtered for a certain route it has to look up another table to find.
PostgreSQL initially chose the plan that cost "37357.45..37357.45" rather
than the one that cost "1.15..61088.32".

transsee=# explain analyze select t.time+coalesce(t.timespent,0)*interval
'1 second' from trackstopscurr t join tracktrip r on r.a=0 and
r.route='501' and r.id=t.trackid and t.stopid='4514' order by t.time desc
limit 1;

 QUERY PLAN
-
 Limit  (cost=37357.45..37357.45 rows=1 width=16) (actual
time=2667.674..2694.047 rows=1 loops=1)
   ->  Sort  (cost=37357.45..37357.45 rows=1 width=16) (actual
time=2667.673..2694.045 rows=1 loops=1)
 Sort Key: t."time" DESC
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Gather  (cost=1182.60..37357.44 rows=1 width=16) (actual
time=387.266..2692.733 rows=4027 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Nested Loop  (cost=182.60..36357.34 rows=1 width=16)
(actual time=381.913..2659.412 rows=1342 loops=3)
 ->  Parallel Bitmap Heap Scan on trackstopscurr t
 (cost=182.03..19048.63 rows=2014 width=14) (actual time=380.467..1231.788
rows=8097 loops=3)
   Recheck Cond: ((stopid)::text = '4514'::text)
   Heap Blocks: exact=8103
   ->  Bitmap Index Scan on trackstopscurr_2
 (cost=0.00..180.82 rows=4833 width=0) (actual time=382.653..382.653
rows=24379 loops=1)
 Index Cond: ((stopid)::text = '4514'::text)
 ->  Index Scan using tracktrip_0 on tracktrip r
 (cost=0.57..8.59 rows=1 width=4) (actual time=0.175..0.175 rows=0
loops=24290)
   Index Cond: (id = t.trackid)
   Filter: ((a = 0) AND ((route)::text =
'501'::text))
   Rows Removed by Filter: 1
 Planning Time: 0.228 ms
 Execution Time: 2694.077 ms
(19 rows)

transsee=# set enable_sort TO false;
  SET
transsee=# explain analyze select t.time+coalesce(t.timespent,0)*interval
'1 second' from trackstopscurr t join tracktrip r on r.a=0 and
r.route='501' and r.id=t.trackid and t.stopid='4514' order by t.time desc
limit 1;

 QUERY PLAN

 Limit  (cost=1.15..61088.32 rows=1 width=16) (actual time=0.076..0.076
rows=1 loops=1)
   ->  Nested Loop  (cost=1.15..61088.32 rows=1 width=16) (actual
time=0.076..0.076 rows=1 loops=1)
 ->  Index Scan Backward using trackstopscurr_2 on trackstopscurr t
 (cost=0.57..19552.59 rows=4833 width=14) (actual time=0.021..0.032 rows=5
loops=1)
   Index Cond: ((stopid)::text = '4514'::text)
 ->  Index Scan using tracktrip_0 on tracktrip r  (cost=0.57..8.59
rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=5)
   Index Cond: (id = t.trackid)
   Filter: ((a = 0) AND ((route)::text = '501'::text))
   Rows Removed by Filter: 1
 Planning Time: 0.229 ms
 Execution Time: 0.091 ms
(10 rows)

Indexes:
"trackstopscurr_0f" UNIQUE, btree (trackid, "time"), tablespace "new"
"trackstopscurr_1" btree (trackid, stopid), tablespace "new"
"trackstopscurr_2" btree (stopid, "time")


Re: Plan selection based on worst case scenario

2024-05-29 Thread David Rowley
On Thu, 30 May 2024 at 13:03, Darwin O'Connor  wrote:
> Is there a PostgreSQL setting that can control how it judges plans?

There's nothing like that, unfortunately.

> Here is a recent example of a query that finds the last time at a stop 
> filtered for a certain route it has to look up another table to find. 
> PostgreSQL initially chose the plan that cost "37357.45..37357.45" rather 
> than the one that cost "1.15..61088.32".
>
> transsee=# explain analyze select t.time+coalesce(t.timespent,0)*interval '1 
> second' from trackstopscurr t join tracktrip r on r.a=0 and r.route='501' and 
> r.id=t.trackid and t.stopid='4514' order by t.time desc limit 1;

>  Limit  (cost=37357.45..37357.45 rows=1 width=16) (actual 
> time=2667.674..2694.047 rows=1 loops=1)

>->  Nested Loop  (cost=182.60..36357.34 rows=1 width=16) 
> (actual time=381.913..2659.412 rows=1342 loops=3)
>  ->  Parallel Bitmap Heap Scan on trackstopscurr t  
> (cost=182.03..19048.63 rows=2014 width=14) (actual time=380.467..1231.788 
> rows=8097 loops=3)
>Recheck Cond: ((stopid)::text = '4514'::text)
>Heap Blocks: exact=8103
>->  Bitmap Index Scan on trackstopscurr_2  
> (cost=0.00..180.82 rows=4833 width=0) (actual time=382.653..382.653 
> rows=24379 loops=1)
>  Index Cond: ((stopid)::text = '4514'::text)
>  ->  Index Scan using tracktrip_0 on tracktrip r  
> (cost=0.57..8.59 rows=1 width=4) (actual time=0.175..0.175 rows=0 loops=24290)
>Index Cond: (id = t.trackid)
>Filter: ((a = 0) AND ((route)::text = '501'::text))
>Rows Removed by Filter: 1
>  Planning Time: 0.228 ms
>  Execution Time: 2694.077 ms

The problem here is primarily down to the poor estimates for the scan
on tracktrip.  You can see that the Nested Loop estimates 1 row, so
therefore the LIMIT costing code thinks LIMIT 1 will require reading
all rows, all 1 of them.  In which case that's expected to cost
36357.34, which is cheaper than the other plan which costs 61088.32 to
get one row.

If you can fix the row estimate to even estimate 2 rows rather than 1,
then it'll choose the other plan.  An estimate of 2 rows would mean
the total cost of the best path after the limit would be 61088.32 / 2
= 30544.16, which is cheaper than the 36357.34 of the bad plan.

You could try ANALYZE on tracktrip, or perhaps increasing the
statistics targets on the columns being queried here.

If there's a correlation between the "a" and "route" columns then you
might want to try CREATE STATISTICS:

CREATE STATISTICS ON a,route FROM tracktrip;
ANALYZE tracktrip;

David




Re: Plan selection based on worst case scenario

2024-05-29 Thread Chema
Hey Darwin,

you don't mention your version or config, but it's always good to go
through https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I used to notice huge improvements in plans when increasing statistics in
relevant columns, as already suggested by David, and also by
lowering random_page_cost, especially in older Pg versions.