SQL performance issue (postgresql chooses a bad plan when a better one is available)
PostgreSQL 12 on AWS RDS
The following SQL takes ~25 seconds to run. I'm relatively new to Postgres,
but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
materializing the entire EXISTS subquery for each row returned by the rest
of the query before probing for plate_384_id existence. Postgres is
choosing sequential scans on sample_plate_384 and test_result even though
suitable, efficient indexes exist. A rewritten query produces a much
better plan (https://explain.depesz.com/s/zXJ6). Executing the EXISTS
portion of the query with an explicit PLATE_384_ID yields the execution
plan we want as well (https://explain.depesz.com/s/3QAK). Unnesting the
EXISTS and adding a DISTINCT on the result also yields a better plan
(a sketch of that rewrite follows the original query below).
I've tried the following (rough commands sketched below):
- disabled parallel query
- set join_collapse_limit=1 and played with the order of the EXISTS/NOT EXISTS clauses
- changed work_mem and enable_material to see if that had any effect
- VACUUM FULL'd TEST_RESULT and SAMPLE_PLATE_384
- created an extended statistics object on (sample_id, sample_plate_384_id) for both
  TEST_RESULT and SAMPLE_PLATE_384 to see if that would help (the two columns increment
  fairly consistently with each other)
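
A rough sketch of those attempts as commands (the settings values and statistics
object names here are illustrative, not necessarily the exact ones used):

SET max_parallel_workers_per_gather = 0;  -- disable parallel query for the session
SET join_collapse_limit = 1;              -- preserve the written join order
SET work_mem = '256MB';                   -- value illustrative
SET enable_material = off;

VACUUM FULL test_result;
VACUUM FULL sample_plate_384;

CREATE STATISTICS test_result_sid_spid_stx (ndistinct, dependencies)
    ON sample_id, sample_plate_384_id FROM test_result;
CREATE STATISTICS sample_plate_384_sid_spid_stx (ndistinct, dependencies)
    ON sample_id, sample_plate_384_id FROM sample_plate_384;
ANALYZE test_result, sample_plate_384;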
I'm out of ideas on how to convince Postgres to choose a better plan, so any
and all help/suggestions/explanations would be greatly appreciated. The
rewritten SQL performs sufficiently well, but I'd like to understand why
Postgres is doing this and what to do about it, so I can tackle the next
SQL performance issue with a little more knowledge.
SELECT count(*) AS "count" FROM "plate_384_scan"
WHERE NOT EXISTS (SELECT 1 FROM "plate_384_scan" AS "plate_384_scan_0"
WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts" AND
"plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
AND EXISTS (SELECT 1 FROM "sample_plate_384" INNER JOIN "test_result"
USING ("sample_plate_384_id", "sample_id") WHERE "test_result" IS NULL AND
"plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
AND NOT EXISTS (SELECT 1 FROM "plate_384_abandoned" WHERE "plate_384_id"
= "plate_384_scan"."plate_384_id");
[limsdb_dev] # SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
                      relhassubclass, reloptions, pg_table_size(oid)
                 FROM pg_class
                WHERE relname IN ('sample_plate_384', 'test_result', 'plate_384_scan', 'plate_384_abandoned')
                ORDER BY 1;

       relname       | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
---------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 plate_384_abandoned |        1 |        16 |             0 | r       |        4 | f              | (null)     |         16384
 plate_384_scan      |       13 |      1875 |             0 | r       |        5 | f              | (null)     |        131072
 sample_plate_384    |     3827 |    600701 |             0 | r       |        9 | f              | (null)     |      31350784
 test_result         |     4900 |    599388 |             0 | r       |        8 | f              | (null)     |      40140800
(4 rows)

Time: 44.405 ms
[limsdb_dev] # \d plate_384_abandoned
                     Table "lab_data.plate_384_abandoned"
    Column    |           Type           | Collation | Nullable |      Default
--------------+--------------------------+-----------+----------+-------------------
 plate_384_id | integer                  |           | not null |
 reason       | text                     |           | not null |
 tech_id      | integer                  |           |          |
 ts           | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "plate_384_abandoned_pkey" PRIMARY KEY, btree (plate_384_id)
Foreign-key constraints:
    "plate_384_abandoned_plate_384_id_fkey" FOREIGN KEY (plate_384_id) REFERENCES plate_384(plate_384_id)
    "plate_384_abandoned_tech_id_fkey" FOREIGN KEY (tech_id) REFERENCES tech(tech_id)
[limsdb_dev] # \d plate_384_scan
                                      Table "lab_data.plate_384_scan"
      Column       |           Type           | Collation | Nullable |                          Default
-------------------+--------------------------+-----------+----------+------------------------------------------------------------
 plate_384_scan_id | integer                  |           | not null | nextval('plate_384_scan_plate_384_scan_id_seq'::regclass)
 plate_384_id      | integer                  |           | not null |
 equipment_id      | integer                  |           | not null |
 tech_id           | integer                  |           | not null |
 ts                | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "pk_plate_384_scan" PRIMARY KEY, btree (plate_384_scan_id)
    "plate_384_scan_idx001" btree (ts, plate_384_scan_id)
    "plate_384_scan_idx002" btree (plate_384_id, ts)
Foreign-key constraints:
    "fk_plate_384_scan_equipment_id" FOREIGN KEY (equipment_id) REFERENCES equipment(equipment_id)
    "fk
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> The following SQL takes ~25 seconds to run. I'm relatively new to postgres
> but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
> materializing the entire EXISTS subquery for each row returned by the rest
> of the query before probing for plate_384_id existence.
> [...]
> unnesting the EXISTS and adding a DISTINCT on the result also yields a
> better plan.

Great! Then use one of the rewritten queries.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
We are, but I was hoping to get a better understanding of where the optimizer
is going wrong and what I can do about it.

chris

On Mon, Mar 22, 2021 at 9:54 AM Laurenz Albe wrote:
> On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> > The following SQL takes ~25 seconds to run.
> > [...]
>
> Great! Then use one of the rewritten queries.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
On Mon, Mar 22, 2021 at 6:29 PM Chris Stephens wrote:
> We are, but I was hoping to get a better understanding of where the optimizer
> is going wrong and what I can do about it.
> [...]

You can play around with the various `enable_*` flags to see whether disabling
any of them will *maybe* yield the plan you were expecting, and then check the
costs in EXPLAIN to see whether the optimiser also thinks that plan is cheaper.
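
As a concrete example of that approach against the original query (which flags
to toggle is just a guess; pick the node types the unwanted plan actually uses,
and keep the changes local to a transaction):

BEGIN;
-- Discourage the plan shapes the slow plan relies on, then compare the
-- EXPLAIN costs; SET LOCAL limits the change to this transaction.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_material = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) AS "count" FROM "plate_384_scan"
WHERE NOT EXISTS (SELECT 1 FROM "plate_384_scan" AS "plate_384_scan_0"
                   WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts"
                     AND "plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
  AND EXISTS (SELECT 1 FROM "sample_plate_384" INNER JOIN "test_result"
                   USING ("sample_plate_384_id", "sample_id")
               WHERE "test_result" IS NULL
                 AND "plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
  AND NOT EXISTS (SELECT 1 FROM "plate_384_abandoned"
                   WHERE "plate_384_id" = "plate_384_scan"."plate_384_id");
ROLLBACK;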
Odd (slow) plan choice with min/max
Hi all,
I have a query where PostgreSQL (11.9 at the moment) is making an odd plan
choice: it uses index scans which require filtering out millions of rows,
rather than "just" doing an aggregate over the rows the WHERE clause targets,
which is much faster.
AFAICT it isn't a statistics problem; at least, increasing the stats target
and analyzing the table doesn't seem to fix it.
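
(For reference, the stats-target attempt would be something like the following;
the exact columns and target value are assumptions, not the commands actually run.)

ALTER TABLE risk ALTER COLUMN "time" SET STATISTICS 1000;  -- value illustrative
ALTER TABLE risk ALTER COLUMN risk_id SET STATISTICS 1000;
ANALYZE risk;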
The query looks like:
==
explain analyze select min(risk_id),max(risk_id) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=217.80..217.81 rows=1 width=16) (actual time=99722.685..99722.687 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual time=38454.537..38454.538 rows=1 loops=1)
           ->  Index Scan using risk_risk_id_key on risk  (cost=0.57..9280362.29 rows=85668 width=8) (actual time=38454.535..38454.536 rows=1 loops=1)
                 Index Cond: (risk_id IS NOT NULL)
                 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
                 Rows Removed by Filter: 161048697
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual time=61268.140..61268.140 rows=1 loops=1)
           ->  Index Scan Backward using risk_risk_id_key on risk risk_1  (cost=0.57..9280362.29 rows=85668 width=8) (actual time=61268.138..61268.139 rows=1 loops=1)
                 Index Cond: (risk_id IS NOT NULL)
                 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
                 Rows Removed by Filter: 41746396
 Planning Time: 0.173 ms
 Execution Time: 99722.716 ms
(15 rows)
==
If I add a count(*), so it has to consider all rows in the range for that
part of the query and doesn't consider using the other index for a min/max
"shortcut", then the query is fast.
==
explain analyze select min(risk_id),max(risk_id), count(*) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4376.67..4376.68 rows=1 width=24) (actual time=30.011..30.012 rows=1 loops=1)
   ->  Index Scan using risk_time_idx on risk  (cost=0.57..3734.17 rows=85667 width=8) (actual time=0.018..22.441 rows=90973 loops=1)
         Index Cond: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
 Planning Time: 0.091 ms
 Execution Time: 30.045 ms
(5 rows)
==
My count() hack works around my immediate problem, but I'm trying to get my
head around why Postgres chooses the plan it does without it, in case there
is some general problem with my configuration that may negatively affect
other areas, or there's something else I am missing.
Any ideas?
Paul McGarry
Re: Odd (slow) plan choice with min/max
On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> I have a query where Postgresql (11.9 at the moment) is making an odd plan
> choice, choosing to use index scans which require filtering out millions of
> rows, rather than "just" doing an aggregate over the rows the where clause
> targets which is much faster.
> [...]
> explain analyze select min(risk_id),max(risk_id) from risk where
> time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

I'm guessing the time and ID columns are highly correlated...

So the planner thinks it can get the smallest ID by scanning the ID index, but
then ends up rejecting the first 161e6 rows for which the time is too low, and
fails the >= condition.

And thinks it can get the greatest ID by backward scanning the ID idx, but ends
up rejecting/filtering the first 41e6 rows, for which the time is too high,
failing the < condition.

This is easy to reproduce:

postgres=# DROP TABLE t; CREATE TABLE t AS SELECT a i,a j FROM generate_series(1,99)a; CREATE INDEX ON t(j); ANALYZE t;
postgres=# explain analyze SELECT min(j), max(j) FROM t WHERE i BETWEEN AND 9;

One solution seems to be to create an index on (i,j), but I don't know if
there's a better way.

--
Justin
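
Applied to the table in question, that suggestion would be roughly the
following (index name illustrative; column names taken from the posted query):

-- Lets the planner answer the WHERE clause and read risk_id from the same
-- index (an index-only scan over the time range) instead of walking the
-- risk_id index and filtering out millions of rows.
CREATE INDEX CONCURRENTLY risk_time_risk_id_idx ON risk ("time", risk_id);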
Re: Odd (slow) plan choice with min/max
On Tue, 23 Mar 2021 at 16:13, Justin Pryzby wrote:
> I'm guessing the time and ID columns are highly correlated...
>
> So the planner thinks it can get the smallest ID by scanning the ID index,
> but then ends up rejecting the first 161e6 rows for which the time is too
> low, and fails the >= condition.
>
> And thinks it can get the greatest ID by backward scanning the ID idx, but
> ends up rejecting/filtering the first 41e6 rows, for which the time is too
> high, failing the < condition.

Yes, the columns are highly correlated, but that alone doesn't seem like it
should be sufficient criteria to choose this plan. I.e. the selection criteria
(1 day of data about a year ago) has a year+ worth of data after it and
probably a decade of data before it, so anything walking a correlated index
from top or bottom is going to have to walk past a lot of data before it gets
to data that fits the criteria.

> One solution seems to be to create an index on (i,j), but I don't know if
> there's a better way.

Adding the count() stops the planner considering the option, so that will work
for now.

My colleague has pointed out that we had the same issue in November and I came
up with the count() workaround then too, but somehow seem to have forgotten it
in the meantime and reinvented it today. I wonder if I posted to
pgsql-performance then too. Maybe time for me to read the PG12 release notes.

Paul
