Re: Poor performance with row wise comparisons

2025-02-09 Thread Tom Lane
Jon Emord  writes:
> ->  Index Only Scan using entity_data_model_id_primary_key_uniq on entity  (cost=0.70..873753.60 rows=15581254 width=31) (actual time=0.093..2712.836 rows=100 loops=1)
>       Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
>       Heap Fetches: 4
>       Buffers: shared hit=97259

> 2. data_model_id = 123 is the 15th most common value of data_model_id, with 10.8 million records

Hm.  I think your answer is in this comment in nbtree's
key-preprocessing logic:

 * Row comparison keys are currently also treated without any smarts:
 * we just transfer them into the preprocessed array without any
 * editorialization.  We can treat them the same as an ordinary inequality
 * comparison on the row's first index column, for the purposes of the logic
 * about required keys.

That is, for the purposes of deciding when the index scan can stop,
the "<= ROW" condition acts like "data_model_id <= 123".  So it will
run through all of the data_model_id = 123 entries before stopping.
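
One possible workaround, sketched here with the identifiers from this
thread: since both row bounds use the same data_model_id, the row
comparison is logically equivalent to an equality on data_model_id plus
a plain range on primary_key, and written that way the scan can stop at
the end of the primary_key range instead:

explain (analyze, buffers)
select data_model_id, primary_key
from entity
WHERE data_model_id = 123
  AND primary_key BETWEEN 'ABC' AND 'DEF';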

regards, tom lane




Re: Poor performance with row wise comparisons

2025-02-09 Thread Jon Emord
With limit 101, the plan is ~equivalent to the no limit case

explain (analyze, buffers)
select data_model_id, primary_key
from entity
WHERE (data_model_id, primary_key) BETWEEN (123, 'ABC') AND (123, 'DEF')
limit 101;

                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..6.37 rows=101 width=31) (actual time=0.094..2712.844 rows=100 loops=1)
   Buffers: shared hit=97259
   ->  Index Only Scan using entity_data_model_id_primary_key_uniq on entity  (cost=0.70..873753.60 rows=15581254 width=31) (actual time=0.093..2712.836 rows=100 loops=1)
         Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
         Heap Fetches: 4
         Buffers: shared hit=97259
 Planning:
   Buffers: shared hit=104
 Planning Time: 0.204 ms
 Execution Time: 2712.873 ms
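
For scale, assuming the default 8 kB block size, those 97259 shared hits
amount to roughly 760 MB of index pages visited to return 100 rows:

-- back-of-the-envelope, assuming the default 8 kB block size
select 97259 * 8192 / (1024.0 * 1024.0) as approx_mb_touched;
-- ≈ 760 MB of buffers touched for 100 rows returned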


Some other information about the table:

  1.  Row estimate for entity is 1.2 billion rows
  2.  data_model_id = 123 is the 15th most common value of data_model_id, with 10.8 million records
  3.  primary_key is a nearly unique column


select attname, null_frac, avg_width, n_distinct
from pg_stats
where tablename = 'entity' and attname in ('data_model_id', 'primary_key');
    attname    | null_frac | avg_width |  n_distinct
---------------+-----------+-----------+-------------
 data_model_id |         0 |         8 |        1143
 primary_key   |         0 |        23 | -0.27303192
(2 rows)
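
For reference, a negative n_distinct in pg_stats is a fraction of the
total row count, so against the ~1.2 billion row estimate above,
primary_key works out to roughly 330 million distinct values:

-- |n_distinct| * row estimate = estimated distinct primary_key values
select 0.27303192 * 1.2e9 as est_distinct_primary_keys;
-- ≈ 328 million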


From: Greg Sabino Mullane 
Sent: Friday, February 7, 2025 9:43 AM
To: Jon Emord 
Cc: [email protected] 

Subject: Re: Poor performance with row wise comparisons

On Fri, Feb 7, 2025 at 2:05 AM Jon Emord wrote:
> but with limit 101, the extra shared hits return

Can you show the explain analyze for the limit 101 case?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


