Re: Index not used in certain nested views but not in others

2025-05-03 Thread Peter J. Holzer
On 2025-04-30 17:16:47 +0200, Markus Demleitner wrote:
> There is another view made up of about 20 tables, looking somewhat
> like this:
[...]
> The first view, ivoa.obs_radio, is just a few hundred records,
> dfbsspec.raw_spectra is about 23 Megarows,
[...]
>->  Parallel Seq Scan on raw_spectra  
> (cost=0.00..2626995.66 rows=5803266 width=756) (actual time=0.137..6841.379 
> rows=4642657 loops=5)
[...]
> 
> My problem is: I can't seem to figure out why Postgres chooses to ignore
> the pubdid index on raw_spectra.pub_did and instead does the
> time-consuming seqscan.

It estimates that it has to read 5803266 of those 2300 rows. That's
25 %! I'm not surprised that it thinks just reading the whole table is
faster than doing almost 6 million index lookups (The actual count is
4642657, so that estimate wasn't totally off).


> Trying to investigate more closely, I wanted to simplify the
> situation and created a view like ivoa.obscore but only having the
> evil table in it:
[...]
>->  Nested Loop  (cost=0.56..4871.60 rows=561 width=0) (actual 
> time=2.478..2.479 rows=0 loops=1)
>  ->  Seq Scan on main  (cost=0.00..52.61 rows=561 width=48) (actual 
> time=0.011..0.317 rows=561 loops=1)
>  ->  Index Scan using raw_spectra_pub_did on raw_spectra  
> (cost=0.56..8.58 rows=1 width=66) (actual time=0.003..0.003 rows=0 loops=561)
>Index Cond: (pub_did = main.obs_publisher_did)

Here you select only 561 rows. That's just a tiny fraction of the whole
table, so the optimizer estimates that doing a few hundred index lookups
is faster than reading the whole table.

> So, when the SELECT statement on dfbsspec.ssa stands along in the view
> definition, Postgres does the right thing; when the exact same query
> stands in a UNION ALL with other tables, Postgres doesn't use the
> index.  Hu?

It is obviously not the exact same query if one of them need to read
1 times as many rows.

hjp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Index not used in certain nested views but not in others

2025-05-03 Thread Tom Lane
"Peter J. Holzer"  writes:
> On 2025-04-30 17:16:47 +0200, Markus Demleitner wrote:
>> dfbsspec.raw_spectra is about 23 Megarows,
>> ->  Parallel Seq Scan on raw_spectra  (cost=0.00..2626995.66 rows=5803266 
>> width=756) (actual time=0.137..6841.379 rows=4642657 loops=5)

> It estimates that it has to read 5803266 of those 2300 rows.

No, you're misreading that (I admit it's confusing).  The rows report
is the average per parallel worker, and the loops count indicates we
had 5 workers.  So actually this parallel seqscan emitted 4642657*5
= 23213285 rows, or the whole table, which is what should be expected
given it has no filter condition and no LIMIT.

(I am wondering why the estimate is only 5803266 rows, because I don't
think that number is scaled for the number of workers...)

>> ->  Nested Loop  (cost=0.56..4871.60 rows=561 width=0) (actual 
>> time=2.478..2.479 rows=0 loops=1)
>>   ->  Seq Scan on main  (cost=0.00..52.61 rows=561 width=48) (actual 
>> time=0.011..0.317 rows=561 loops=1)
>>   ->  Index Scan using raw_spectra_pub_did on raw_spectra  
>> (cost=0.56..8.58 rows=1 width=66) (actual time=0.003..0.003 rows=0 loops=561)
>> Index Cond: (pub_did = main.obs_publisher_did)

> Here you select only 561 rows. That's just a tiny fraction of the whole
> table, so the optimizer estimates that doing a few hundred index lookups
> is faster than reading the whole table.

The point here is that the chosen plan shape allows pushing the join
qual "raw_spectra.pub_did = main.obs_publisher_did" down to be an
index condition, which is exactly what we have to do if we want to
avoid reading all of raw_spectra.  What Markus is complaining about
is that that fails to happen if there's a UNION ALL in the way.
Postgres is capable of doing that in other cases, so it's a fair
question.

regards, tom lane