Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Stephen Samuel (Sam)
Hi folks.

I have a table with 4.5m rows per partition (16 partitions) (I know, very
small, probably didn't need to be partitioned).

The table has two columns, a bigint and b text.
There is a unique index on (a,b)
The query is:

SELECT b
FROM table
WHERE a = 
  AND b IN ()


The visibility map is almost exclusively true.
This table gets few updates.

The planner says index only scan, but is filtering on b.

Index Only Scan using pkey on table  (cost=0.46..29.09 rows=1
width=19) (actual time=0.033..0.053 rows=10 loops=1)
  Index Cond: (a = 662028765)
"  Filter: (b = ANY
('{634579987:662028765,561730945:662028765,50183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))"
  Rows Removed by Filter: 1
  Heap Fetches: 11
Planning Time: 0.095 ms
Execution Time: 0.070 ms

My question is, why isn't it using the index for column b? Is this
expected? And why is it doing heap lookups for every row,.

Performance is still good, but I am curious.

Thanks in advance!


Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Peter Geoghegan
On Sun, Aug 18, 2024 at 9:56 PM Stephen Samuel (Sam)  wrote:
> My question is, why isn't it using the index for column b? Is this expected? 
> And why is it doing heap lookups for every row,.

This has been fixed for Postgres 17:

https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans

-- 
Peter Geoghegan




Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Stephen Samuel (Sam)
Oh as simple as upgrading!
Ok great, appreciate the quick reply. Will have to wait for AWS to support
17 :)


On Sun, 18 Aug 2024 at 20:59, Peter Geoghegan  wrote:

> On Sun, Aug 18, 2024 at 9:56 PM Stephen Samuel (Sam) 
> wrote:
> > My question is, why isn't it using the index for column b? Is this
> expected? And why is it doing heap lookups for every row,.
>
> This has been fixed for Postgres 17:
>
> https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans
>
> --
> Peter Geoghegan
>


Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Peter Geoghegan
On Sun, Aug 18, 2024 at 10:01 PM Stephen Samuel (Sam)  wrote:
> Oh as simple as upgrading!
> Ok great, appreciate the quick reply. Will have to wait for AWS to support 17 
> :)

It is possible to use index quals for both a and b on earlier
versions, with certain restrictions. You might try setting
random_page_cost to a much lower value, to see if that allows the
planner to use such a plan with your real query.

In my experience it's very unlikely that the planner will do that,
though, even when coaxed. At least when there are this many IN()
constants. So you probably really will need to upgrade to 17.

-- 
Peter Geoghegan




Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Stephen Samuel (Sam)
Performance is pretty good anyway, and I'm only running 5 r7.large readers
on this service, I was just looking at the query planner and it surprised
me.


On Sun, 18 Aug 2024 at 21:08, Peter Geoghegan  wrote:

> On Sun, Aug 18, 2024 at 10:01 PM Stephen Samuel (Sam) 
> wrote:
> > Oh as simple as upgrading!
> > Ok great, appreciate the quick reply. Will have to wait for AWS to
> support 17 :)
>
> It is possible to use index quals for both a and b on earlier
> versions, with certain restrictions. You might try setting
> random_page_cost to a much lower value, to see if that allows the
> planner to use such a plan with your real query.
>
> In my experience it's very unlikely that the planner will do that,
> though, even when coaxed. At least when there are this many IN()
> constants. So you probably really will need to upgrade to 17.
>
> --
> Peter Geoghegan
>


Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Tom Lane
"Stephen Samuel (Sam)"  writes:
> There is a unique index on (a,b)
> The query is:

> SELECT b
> FROM table
> WHERE a = 
>   AND b IN ()

> The planner says index only scan, but is filtering on b.

> Index Only Scan using pkey on table  (cost=0.46..29.09 rows=1
> width=19) (actual time=0.033..0.053 rows=10 loops=1)
>   Index Cond: (a = 662028765)
> "  Filter: (b = ANY
> ('{634579987:662028765,561730945:662028765,50183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))"
>   Rows Removed by Filter: 1
>   Heap Fetches: 11
> Planning Time: 0.095 ms
> Execution Time: 0.070 ms

I think it's a good bet that this query would be *slower* if
it were done the other way.  The filter condition is eliminating
only one of the 11 rows matching "a = 662028765".  If we did what
you think you want, we'd initiate ten separate index descents
to find the other ten rows.

Whether the planner is costing this out accurately enough to
realize that, or whether it's just accidentally falling into
the right plan, I'm not sure; you've not provided nearly
enough details for anyone to guess what the other cost estimate
was.

> And why is it doing heap lookups for every row,.

Yeah, that part is a weakness I've wanted to fix for a long
time: it could do the filter condition by fetching b from the
index, but it doesn't notice that and has to go to the heap
to get b.  (If the other plan does win, it'd likely be because
of that problem and not because the index scanning strategy
per se is better.)

regards, tom lane




Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Peter Geoghegan
On Sun, Aug 18, 2024 at 10:50 PM Tom Lane  wrote:
> I think it's a good bet that this query would be *slower* if
> it were done the other way.  The filter condition is eliminating
> only one of the 11 rows matching "a = 662028765".  If we did what
> you think you want, we'd initiate ten separate index descents
> to find the other ten rows.

True - on versions prior to Postgres 17.

On 17 the number of index descents will be minimal.  If there are less
than a few hundred index tuples with the value a = , then
there'll only be one descent.

> Yeah, that part is a weakness I've wanted to fix for a long
> time: it could do the filter condition by fetching b from the
> index, but it doesn't notice that and has to go to the heap
> to get b.

It was fixed? At least on 17.

-- 
Peter Geoghegan




Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Aug 18, 2024 at 10:50 PM Tom Lane  wrote:
>> Yeah, that part is a weakness I've wanted to fix for a long
>> time: it could do the filter condition by fetching b from the
>> index, but it doesn't notice that and has to go to the heap
>> to get b.

> It was fixed? At least on 17.

Oh, sorry, I was thinking of a related problem that doesn't apply
here: matching indexes on expressions to fragments of a filter
condition.  However, the fact that the OP's EXPLAIN shows heap
fetches from a supposedly all-visible table suggests that his
IN isn't getting optimized that way.  I wonder why --- it seems
to work for me, even in fairly old versions.  Taking a parallel
example from the regression database, even v12 can do

regression=# explain analyze select tenthous from tenk1 where thousand=99 and 
tenthous in (1,4,7,9,11,55,66,88,99,77,8876,9876);
   QUERY PLAN   
 
-
 Index Only Scan using tenk1_thous_tenthous on tenk1  (cost=0.29..4.61 rows=1 
width=4) (actual time=0.016..0.018 rows=1 loops=1)
   Index Cond: (thousand = 99)
   Filter: (tenthous = ANY ('{1,4,7,9,11,55,66,88,99,77,8876,9876}'::integer[]))
   Rows Removed by Filter: 9
   Heap Fetches: 0
 Planning Time: 0.298 ms
 Execution Time: 0.036 ms
(7 rows)

No heap fetches, so it must have done the filter from the index.
Why not in the original case?

regards, tom lane




Re: Trying to understand why a query is filtering when there is a composite index

2024-08-18 Thread Stephen Samuel (Sam)
select all_visible, count(*)
from pg_visibility('table')
group by all_visible

false,1614
true,30575

The table is partitioned if that matters (but same results if I run the
queries directly on the partition).

On Sun, 18 Aug 2024 at 23:06, Tom Lane  wrote:

> Peter Geoghegan  writes:
> > On Sun, Aug 18, 2024 at 10:50 PM Tom Lane  wrote:
> >> Yeah, that part is a weakness I've wanted to fix for a long
> >> time: it could do the filter condition by fetching b from the
> >> index, but it doesn't notice that and has to go to the heap
> >> to get b.
>
> > It was fixed? At least on 17.
>
> Oh, sorry, I was thinking of a related problem that doesn't apply
> here: matching indexes on expressions to fragments of a filter
> condition.  However, the fact that the OP's EXPLAIN shows heap
> fetches from a supposedly all-visible table suggests that his
> IN isn't getting optimized that way.  I wonder why --- it seems
> to work for me, even in fairly old versions.  Taking a parallel
> example from the regression database, even v12 can do
>
> regression=# explain analyze select tenthous from tenk1 where thousand=99
> and tenthous in (1,4,7,9,11,55,66,88,99,77,8876,9876);
>QUERY PLAN
>
>
> -
>  Index Only Scan using tenk1_thous_tenthous on tenk1  (cost=0.29..4.61
> rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1)
>Index Cond: (thousand = 99)
>Filter: (tenthous = ANY
> ('{1,4,7,9,11,55,66,88,99,77,8876,9876}'::integer[]))
>Rows Removed by Filter: 9
>Heap Fetches: 0
>  Planning Time: 0.298 ms
>  Execution Time: 0.036 ms
> (7 rows)
>
> No heap fetches, so it must have done the filter from the index.
> Why not in the original case?
>
> regards, tom lane
>