Re: Postgres 15 SELECT query doesn't use index under RLS

2023-10-18 Thread Alexander Okulovich

Hi Oscar,

Thank you for the suggestion.

Unfortunately, I didn't mention that on prod we performed the upgrade 
from Postgres 12 to 15 using replication to another instance with 
pglogical, so I assume that the index was filled from scratch by 
Postgres 15.


We upgraded stage instance by changing Postgres version only, so 
potentially could run into the index issue there. I've tried to execute 
REINDEX CONCURRENTLY, but the performance issue hasn't gone. The problem 
is probably somewhere else. However, I do not exclude that we'll perform 
REINDEX on prod.


Kind regards,

Alexander

On 13.10.2023 11:44, Oscar van Baten wrote:


Hi Alexander,

I think this is caused by the de-duplication of B-tree index entries 
which was added to postgres in version 13

https://www.postgresql.org/docs/release/13.0/

"
More efficiently store duplicates in B-tree indexes (Anastasia 
Lubennikova, Peter Geoghegan)
This allows efficient B-tree indexing of low-cardinality columns by 
storing duplicate keys only once. Users upgrading with pg_upgrade will 
need to use REINDEX to make an existing index use this feature.

"

When we upgraded from 12->13 we had a similar issue. We had to rebuild 
the indexes and it was fixed..



regards,
Oscar


Op do 12 okt 2023 om 18:41 schreef Alexander Okulovich 
:


Hello everyone!


Recently, we upgraded the AWS RDS instance from Postgres 12.14 to
15.4 and noticed extremely high disk consumption on the following
query execution:

select (exists (select 1 as "one" from "public"."indexed_commit"
where "public"."indexed_commit"."repo_id" in
(964992,964994,964999, ...);

For some reason, the query planner starts using Seq Scan instead
of the index on the "repo_id" column when requesting under user
limited with RLS. On prod, it happens when there are more than 316
IDs in the IN part of the query, on stage - 3. If we execute the
request from Superuser, the planner always uses the "repo_id" index.

Luckily, we can easily reproduce this on our stage database (which
is smaller). If we add a multicolumn "repo_id, tenant_id" index,
the planner uses it (Index Only Scan) with any IN params count
under RLS.

Could you please clarify if this is a Postgres bug or not? Should
we include the "tenant_id" column in all our indexes to make them
work under RLS?


  Postgres version / Operating system+version


PostgreSQL 15.4 on aarch64-unknown-linux-gnu, compiled by gcc
(GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit


  Full Table and Index Schema

\d indexed_commit
    Table "public.indexed_commit"
    Column |    Type | Collation |
Nullable | Default

---+-+---+--+-
 id    | bigint  |   | not null |
 commit_hash   | character varying(40)   |   | not null |
 parent_hash   | text    | |  |
 created_ts    | timestamp without time zone |   | not null |
 repo_id   | bigint  |   | not null |
 lines_added   | bigint  | |  |
 lines_removed | bigint  | |  |
 tenant_id | uuid    |   | not null |
 author_id | uuid    |   | not null |
Indexes:
    "indexed-commit-repo-idx" btree (repo_id)
    "indexed_commit_commit_hash_repo_id_key" UNIQUE CONSTRAINT,
btree (commit_hash, repo_id) REPLICA IDENTITY
    "indexed_commit_repo_id_without_loc_idx" btree (repo_id) WHERE
lines_added IS NULL OR lines_removed IS NULL
Policies:
    POLICY "commit_isolation_policy"
  USING ((tenant_id =
(current_setting('app.current_tenant_id'::text))::uuid))


  Table Metadata

SELECT relname, relpages, reltuples, relallvisible, relkind,
relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM
pg_class WHERE relname='indexed_commit';
    relname | relpages |  reltuples   | relallvisible |
relkind | relnatts | relhassubclass | reloptions | pg_table_size

+--+--+---+-+--++-+---
 indexed_commit | 18170522 | 7.451964e+08 |  18104744 |
r   |    9 | f  |

{autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=20,autovacuum_analyze_threshold=10}
|  148903337984


  EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN

Production queries:

316 ids under RLS limited user


392 ids under RLS limited user 

Re: Postgres 15 SELECT query doesn't use index under RLS

2023-10-18 Thread Alexander Okulovich

Hi Tom,


If in fact you were getting decent performance from an indexscan plan
before, the only explanation I can think of is that the repo_ids you
are querying for are correlated with the tenant_id, so that the RLS
filter doesn't eliminate very many rows from the index result.  The
planner wouldn't realize that by default, but if you create extended
statistics on repo_id and tenant_id then it might do better.  Still,
you probably want the extra index.


Do you have any idea how to measure that correlation?


You haven't shown any evidence suggesting that.

My suggestion is based on following backward reasoning.

We used the product with the default settings. The requests are simple. 
We didn't change the hardware (actually, we use even more performant 
hardware because of that issue) and DDL. I've checked the request on old 
and new databases. Requests that rely on this index execute more than 10 
times longer. Planner indeed used Index Scan before, but now it doesn't.


So, from my perspective, the only reason we experience that is database 
logic change. I think we could probably try to reproduce the issue on 
different Postgres versions and find the specific version that causes this.



Adding tenant_id is going to bloat your indexes quite a bit,
so I wouldn't do that except in cases where you've demonstrated
it's important.


Any recommendations from the Postgres team on how to use the indexes 
under RLS would help a lot here, but I didn't find them.


Kind regards,

Alexander

On 13.10.2023 22:26, Tom Lane wrote:

Alexander Okulovich  writes:

Recently, we upgraded the AWS RDS instance from Postgres 12.14 to 15.4
and noticed extremely high disk consumption on the following query
execution:
select (exists (select 1 as "one" from "public"."indexed_commit" where
"public"."indexed_commit"."repo_id" in (964992,964994,964999, ...);
For some reason, the query planner starts using Seq Scan instead of the
index on the "repo_id" column when requesting under user limited with
RLS. On prod, it happens when there are more than 316 IDs in the IN part
of the query, on stage - 3. If we execute the request from Superuser,
the planner always uses the "repo_id" index.

The superuser bypasses the RLS policy.  When that's enforced, the
query can no longer use an index-only scan (because it needs to fetch
tenant_id too).  Moreover, it may be that only a small fraction of the
rows fetched via the index will satisfy the RLS condition.  So the
estimated cost of an indexscan query could be high enough to persuade
the planner that a seqscan is a better idea.


Luckily, we can easily reproduce this on our stage database (which is
smaller). If we add a multicolumn "repo_id, tenant_id" index, the
planner uses it (Index Only Scan) with any IN params count under RLS.

Yeah, that would be the obvious way to ameliorate both problems.

If in fact you were getting decent performance from an indexscan plan
before, the only explanation I can think of is that the repo_ids you
are querying for are correlated with the tenant_id, so that the RLS
filter doesn't eliminate very many rows from the index result.  The
planner wouldn't realize that by default, but if you create extended
statistics on repo_id and tenant_id then it might do better.  Still,
you probably want the extra index.


Could you please clarify if this is a Postgres bug or not?

You haven't shown any evidence suggesting that.


Should we
include the "tenant_id" column in all our indexes to make them work
under RLS?

Adding tenant_id is going to bloat your indexes quite a bit,
so I wouldn't do that except in cases where you've demonstrated
it's important.

regards, tom lane





Re: Postgres 15 SELECT query doesn't use index under RLS

2023-10-18 Thread Tom Lane
Alexander Okulovich  writes:
> We used the product with the default settings. The requests are simple. 
> We didn't change the hardware (actually, we use even more performant 
> hardware because of that issue) and DDL. I've checked the request on old 
> and new databases. Requests that rely on this index execute more than 10 
> times longer. Planner indeed used Index Scan before, but now it doesn't.

> So, from my perspective, the only reason we experience that is database 
> logic change.

[ shrug... ]  Maybe, but it's still not clear if it's a bug, or an
intentional change, or just a cost estimate that was on the hairy
edge before and your luck ran out.

If you could provide a self-contained test case that performs 10x worse
under v15 than v12, we'd surely take a look at it.  But with the
information you've given so far, little is possible beyond speculation.

regards, tom lane