On Wed, 23 Jul 2025 at 03:18, Peter J. Holzer <hjp-pg...@hjp.at> wrote: > > PostgreSQL version 17.5 on Ubuntu 24.04 (PGDG build).
> -> Merge Left Join (cost=4613.25..7180.30 rows=8357 > width=136) (actual time=222.037..292242.701 rows=40460 loops=1) > Merge Cond: (ns.nspname = s.schemaname) > Join Filter: ((s.tablename = tbl.relname) AND (s.attname > = att.attname)) > Rows Removed by Join Filter: 1215045276 > -> Sort (cost=4476.84..4477.18 rows=139 width=200) > (actual time=156.864..178628.030 rows=1215036357 loops=1) > Sort Key: s.schemaname > Sort Method: external sort Disk: 6480kB > -> Subquery Scan on s (cost=2189.59..4471.89 > rows=139 width=200) (actual time=81.840..131.044 rows=30921 loops=1) > The plans are clearly different, with the slow one claiming to sort 1.2 > billion rows (but using only 6.5MB of disk space) despite the node below > it only returning 30921 rows. 1215036357 is almost but not quite > 30921*40460, but that nested loop is a sibling of the sort, so it > shouldn't be included, right? This is coming from the "mark and restore" due to the Merge Join that executes that sort on its inner side "rewinding" the inner side back to check the matches for the next outer row. This is happening because the Merge Join on the slow query only contains 1 of the 3 columns you're joining on. Notice the Sort's subnode only returned 30921 rows. > And why are the plans different at all? Computing a few extra values per > row shouldn't change the cost of the query delivering the rows, IMHO. > But then the costs are very similar, so maybe it's just some random > variation. That's a good question. What happens if you SET enable_incremental_sort=0; and run the slow query again? I've not managed to recreate this locally, so far. I tried on 17.5 with hash joins disabled to get the merge join plan but my merge join includes all 3 clauses rather than your 1 clause. I'm unsure why your planner doesn't include all 3. To eliminate 1 moving part, is it still slow if you make the outer query just SELECT * FROM the subquery rather than have the additional calculations in the select list? (the planner won't pullup the subquery due to the GROUP BY, it'll only be eliminated during setrefs.c, which is well after the planner will choose the merge join clauses) David