Outer cost higher than the inner cost

2024-07-26 Thread Stanisław Skonieczny
Hello all postgres developers,

Recently pg started to make a query plan, which I can not understand.
The pan is here: https://explain.depesz.com/s/Lvw0#source

The interesting part is at the top:
```
Aggregate (cost=1132443.98..1132443.99 rows=1 width=24) (actual rows=1
loops=1)
-> Merge Join (cost=1127516.99..1131699.33 rows=372323 width=24)
(actual rows=642956 loops=1)
Merge Cond: (parent.volume_id = volume.id)
-> Merge Join (cost=1127516.66..7430940.30 rows=372323 width=40)
(actual rows=642956 loops=1)

...

-> Index Only Scan using volume_pkey on volume (cost=0.06..18.72
rows=1060 width=8) (actual rows=1011 loops=1)
 Heap Fetches: 23
```

What bothers me is that the inner plan cost (7430940) is higher than the
outer plan cost (1131699).
And I wonder how that is possible. There is no limit in the query that
would prevent PG from reading all rows coming out from inner Merge Join.
cursor_tuple_fraction is 1.

The query is similar to: (there were more joins, but they were rejected by
the planner)
```
SELECT CAST(count(*) AS BIGINT) AS COUNT
  FROM
(SELECT file.id
 FROM sf.file_current AS FILE
 JOIN sf.dir_current AS parent ON parent.id = file.parent_id
 AND parent.volume_id = file.volume_id
 JOIN sf_volumes.volume AS volume ON file.volume_id = volume.id
 WHERE (parent.volume_id = 1011
AND parent.ancestor_ids && ARRAY[151188430]::BIGINT[]
OR file.volume_id = 453)
   AND file.type = 32768
   AND file.volume_id IN (1011, 453)
   AND parent.volume_id IN (1011, 453)) AS fsentry_query
```

I am using:
PostgreSQL 13.12 (Ubuntu 13.12-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

All non-standard configuration params are in the attachment.

I am looking for some hints for understanding this situation.

Thanks,
Stanisław Skonieczny


params
Description: Binary data


Re: Outer cost higher than the inner cost

2024-07-26 Thread Tom Lane
=?UTF-8?Q?Stanis=C5=82aw_Skonieczny?=  writes:
> What bothers me is that the inner plan cost (7430940) is higher than the
> outer plan cost (1131699).

I think it is estimating (based on knowledge of the ranges of join keys
in the two relations) that that input subplan won't need to be run to
completion.  See initial_cost_mergejoin in costsize.c.

regards, tom lane