AWS RDS v12
The following SQL takes ~25 seconds to run. I'm relatively new to postgres
but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
materializing the entire EXISTS subquery for each row returned by the rest
of the query before probing for plate_384_id existence. postgres is
choosing sequential scans on sample_plate_384 and test_result when
suitable, efficient indexes exist. a re-written query produces a much
better plan (https://explain.depesz.com/s/zXJ6). Executing the EXISTS
portion of the query with an explicit PLATE_384_ID yields the execution
plan we want as well (https://explain.depesz.com/s/3QAK). unnesting the
EXISTS and adding a DISTINCT on the result also yields a better plan.
I've tried tried the following:
disable parallel
set join_collapse_limit=1 and played with order of EXISTS/NOT EXISTS
changed work_mem and enable_material to see if that had any effect
VACUUM FULL'd TEST_RESULT and SAMPLE_PLATE_384
created a stats object on (sample_id, sample_plate_384_id) for both
TEST_RESULT and SAMPLE_PLATE_384 to see if that would help (they increment
fairly consistently with each other)
I'm out of ideas on how to convince postgres to choose a better plan. any
and all help/suggestions/explanations would be greatly appreciated. the
rewritten SQL performs sufficiently well but i'd like to understand why
postgres is doing this and what to do about it so i can't tackle the next
SQL performance issue with a little more knowledge.
SELECT count(*) AS "count" FROM "plate_384_scan"
WHERE NOT EXISTS (SELECT 1 FROM "plate_384_scan" AS "plate_384_scan_0"
WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts" AND
"plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
AND EXISTS (SELECT 1 FROM "sample_plate_384" INNER JOIN "test_result"
USING ("sample_plate_384_id", "sample_id") WHERE "test_result" IS NULL AND
"plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
AND NOT EXISTS (SELECT 1 FROM "plate_384_abandoned" WHERE "plate_384_id"
= "plate_384_scan"."plate_384_id");
[limsdb_dev] # SELECT relname, relpages, reltuples, relallvisible, relkind,
relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class
WHERE relname in ('sample_plate_384','test_result',
'plate_384_scan','plate_384_abandoned') order by 1;
relname | relpages | reltuples | relallvisible | relkind |
relnatts | relhassubclass | reloptions | pg_table_size
-+--+---+---+-+--+++---
plate_384_abandoned |1 |16 | 0 | r |
4 | f | (null) | 16384
plate_384_scan | 13 | 1875 | 0 | r |
5 | f | (null) |131072
sample_plate_384| 3827 |600701 | 0 | r |
9 | f | (null) | 31350784
test_result | 4900 |599388 | 0 | r |
8 | f | (null) | 40140800
(4 rows)
Time: 44.405 ms
[limsdb_dev] # \d plate_384_abandoned
Table "lab_data.plate_384_abandoned"
Column| Type | Collation | Nullable |
Default
--+--+---+--+---
plate_384_id | integer | | not null |
reason | text | | not null |
tech_id | integer | | |
ts | timestamp with time zone | | not null |
CURRENT_TIMESTAMP
Indexes:
"plate_384_abandoned_pkey" PRIMARY KEY, btree (plate_384_id)
Foreign-key constraints:
"plate_384_abandoned_plate_384_id_fkey" FOREIGN KEY (plate_384_id)
REFERENCES plate_384(plate_384_id)
"plate_384_abandoned_tech_id_fkey" FOREIGN KEY (tech_id) REFERENCES
tech(tech_id)
[limsdb_dev] # \d plate_384_scan
Table
"lab_data.plate_384_scan"
Column | Type | Collation | Nullable |
Default
---+--+---+--+---
plate_384_scan_id | integer | | not null |
nextval('plate_384_scan_plate_384_scan_id_seq'::regclass)
plate_384_id | integer | | not null |
equipment_id | integer | | not null |
tech_id | integer | | not null |
ts| timestamp with time zone | | not null |
CURRENT_TIMESTAMP
Indexes:
"pk_plate_384_scan" PRIMARY KEY, btree (plate_384_scan_id)
"plate_384_scan_idx001" btree (ts, plate_384_scan_id)
"plate_384_scan_idx002" btree (plate_384_id, ts)
Foreign-key constraints:
"fk_plate_384_scan_equipment_id" FOREIGN KEY (equipment_id) REFERENCES
equipment(equipment_id)
"fk