Postgres index usage

2024-08-07 Thread Dirschel, Steve
I've found multiple postings out there saying you can query pg_stat_all_indexes 
and look at idx_scan to know if an index has been used by queries.  I want to 
be 100% sure I can rely on that table/column to know if an index has never been 
used.

I queried that table for a specific index and idx_scan is 0.  I queried 
pg_statio_all_indexes and can see idx_blks_read and idx_blks_hit have numbers 
in there.  If the index is not being used then what it causing idx_blks_read 
and idx_blks_hit to increase over time?  I'm wondering if those increase due to 
DML on the table.  Could anyone please confirm I can rely on 
pg_stat_all_index.idx_scan to know if queries are using an index and the 
increases over time in idx_blks_read and idx_blks_hit in pg_statio_all_indexes 
would be from DML (or possibly vacuum or other things)?

Thanks in advance.
This e-mail is for the sole use of the intended recipient and contains 
information that may be privileged and/or confidential. If you are not an 
intended recipient, please notify the sender by return e-mail and delete this 
e-mail and any attachments. Certain required legal entity disclosures can be 
accessed on our website: 
https://www.thomsonreuters.com/en/resources/disclosures.html


RE: Postgres index usage

2024-08-07 Thread Dirschel, Steve
Didn't mention-  this is Aurora Postgres version 14.6 if that matters for my 
question.  Thanks

From: Dirschel, Steve 
Sent: Wednesday, August 7, 2024 12:06 PM
To: [email protected]
Subject: Postgres index usage

I've found multiple postings out there saying you can query pg_stat_all_indexes 
and look at idx_scan to know if an index has been used by queries.  I want to 
be 100% sure I can rely on that table/column to know if an index has never been 
used.

I queried that table for a specific index and idx_scan is 0.  I queried 
pg_statio_all_indexes and can see idx_blks_read and idx_blks_hit have numbers 
in there.  If the index is not being used then what it causing idx_blks_read 
and idx_blks_hit to increase over time?  I'm wondering if those increase due to 
DML on the table.  Could anyone please confirm I can rely on 
pg_stat_all_index.idx_scan to know if queries are using an index and the 
increases over time in idx_blks_read and idx_blks_hit in pg_statio_all_indexes 
would be from DML (or possibly vacuum or other things)?

Thanks in advance.
This e-mail is for the sole use of the intended recipient and contains 
information that may be privileged and/or confidential. If you are not an 
intended recipient, please notify the sender by return e-mail and delete this 
e-mail and any attachments. Certain required legal entity disclosures can be 
accessed on our website: 
https://www.thomsonreuters.com/en/resources/disclosures.html


Re: Postgres index usage

2024-08-07 Thread Tom Lane
"Dirschel, Steve"  writes:
> I queried that table for a specific index and idx_scan is 0.  I
> queried pg_statio_all_indexes and can see idx_blks_read and
> idx_blks_hit have numbers in there.  If the index is not being used
> then what it causing idx_blks_read and idx_blks_hit to increase over
> time?  I'm wondering if those increase due to DML on the table.

Yes, I think that's the case: index updates will cause the per-block
counters to advance, but only an index search will increment idx_scan.

I'd recommend testing this theory for yourself in an idle database,
though.  It's not impossible that Aurora works differently from
community PG.

Another thing to keep in mind is that in versions before PG 15,
the statistics subsystem is (by design) unreliable and might sometimes
miss events under load.  This effect isn't big enough to invalidate
a conclusion that an index with idx_scan = 0 isn't being used, but
it's something to keep in mind when running small tests that are
only expected to record a few events.

regards, tom lane




Re: Postgres index usage

2024-08-07 Thread Greg Sabino Mullane
On Wed, Aug 7, 2024 at 1:06 PM Dirschel, Steve <
[email protected]> wrote:

> I’ve found multiple postings out there saying you can query
> pg_stat_all_indexes and look at idx_scan to know if an index has been used
> by queries.  I want to be 100% sure I can rely on that table/column to know
> if an index has never been used.
>

Also make sure you check pg_stat_all_indexes on your replicas as well. Each
has their own independent idx_scan counters. So while your primary is not
using a particular index, one or more of your replicas might be.

Cheers,
Greg


Partition pruning with array-contains check and current_setting function

2024-08-07 Thread Marcelo Zabani
Hello all. I am trying to make postgres 16 prune partition for queries with
`WHERE tenant_id=ANY(current_setting('my.tenant_id')::integer[])`, but I
haven't been able to make it work, and naturally it impacts performance so
I thought this list would be appropriate.

Here's the SQL I tried (but feel free to skip to the end as I'm sure all
this stuff is obvious to you!):








*CREATE TABLE tbl (id SERIAL NOT NULL, tenant_id INT NOT NULL, some_col
INT, PRIMARY KEY (tenant_id, id)) PARTITION BY HASH (tenant_id);CREATE
TABLE tbl1 PARTITION OF tbl FOR VALUES WITH (MODULUS 2, REMAINDER 0);CREATE
TABLE tbl2 PARTITION OF tbl FOR VALUES WITH (MODULUS 2, REMAINDER 1);INSERT
INTO tbl (tenant_id, some_col) SELECT 1, * FROM
generate_series(1,1);INSERT INTO tbl (tenant_id, some_col) SELECT 3, *
FROM generate_series(1,1);*

Partition pruning works as expected for this query (still not an
array-contains check):
*EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE tenant_id=1;*

When reading from a setting it also prunes partitions correctly:

*SET my.tenant_id=1;EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE
tenant_id=current_setting('my.tenant_id')::integer;*

It still does partition pruning if we use a scalar subquery. I can see the
(never executed) scans in the plan.
*EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE tenant_id=(SELECT
current_setting('my.tenant_id')::integer);*

But how about an array-contains check? Still prunes, which is nice.
*EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE
tenant_id=ANY('{1}'::integer[]);*

However, it doesn't prune if the array is in a setting:

*SET my.tenant_id='{1}';EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE
tenant_id=ANY(current_setting('my.tenant_id')::integer[]);*

I actually expected that when in a setting, none of the previous queries
would've done partition pruning because I thought `current_setting` is not
a stable function. But some of them did, which surprised me.

So I thought maybe if I put it in a scalar query it will give me an
InitPlan node, but it looks like method resolution for =ANY won't let me
try this:
*EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE tenant_id=ANY((SELECT
current_setting('my.tenant_id')::integer[]));*
*ERROR:  operator does not exist: integer = integer[]*

I tried using UNNEST, but that adds a Hash Semi Join to the plan which also
doesn't do partition pruning.
*EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE tenant_id=ANY((SELECT
UNNEST(current_setting('my.tenant_id')::integer[])));*

My question is if there's a way to do partition pruning based on
array-contains operator if the array is in a setting. The use-case is to
make Row Level Security policies do partition pruning "automatically" in a
setting where users can be in more than one tenant.
It feels like this would work if there were a non-overloaded operator that
takes in an array and a single element and tests for array-contains,
because then I could use that operator with a scalar subquery and get an
InitPlan node. But I'm new to all of this, so apologies if I'm getting it
all wrong!

Thanks in advance,
Marcelo.