Hello everyone!
Recently, we upgraded our AWS RDS instance from Postgres 12.14 to 15.4
and noticed extremely high disk consumption during execution of the
following query:
select (exists (select 1 as "one" from "public"."indexed_commit" where
"public"."indexed_commit"."repo_id" in (964992, 964994, 964999, ...)));
For some reason, the query planner starts using a Seq Scan instead of the
index on the "repo_id" column when the query is run as a user restricted
by RLS. On prod this happens when there are more than 316 IDs in the IN
list; on stage, more than 3. If we execute the same query as a superuser,
the planner always uses the "repo_id" index.
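A minimal way to reproduce the comparison (the role name and tenant UUID
below are placeholders, not our real values):
-- As the RLS-limited role (hypothetical role name "app_user"):
SET ROLE app_user;
SET app.current_tenant_id = '00000000-0000-0000-0000-000000000000';
EXPLAIN (ANALYZE, BUFFERS)
SELECT (EXISTS (SELECT 1 FROM public.indexed_commit
                WHERE repo_id IN (964992, 964994, 964999)));
-- switches to a Seq Scan once the IN list is long enough

-- Same statement as superuser, where RLS is not applied:
RESET ROLE;
EXPLAIN (ANALYZE, BUFFERS)
SELECT (EXISTS (SELECT 1 FROM public.indexed_commit
                WHERE repo_id IN (964992, 964994, 964999)));
-- always uses the "repo_id" index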
Luckily, we can easily reproduce this on our stage database (which is
smaller). If we add a multicolumn (repo_id, tenant_id) index, the planner
uses it (Index Only Scan) under RLS with any number of IDs in the IN list.
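For reference, the workaround index is a plain multicolumn btree along
these lines (the index name here is made up):
CREATE INDEX indexed_commit_repo_id_tenant_id_idx
    ON public.indexed_commit (repo_id, tenant_id);
With both repo_id and tenant_id in the index, the RLS check on tenant_id
can presumably be evaluated from the index alone, which would explain the
Index Only Scan.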
Could you please clarify whether this is a Postgres bug? Should we
include the "tenant_id" column in all our indexes to make them usable
under RLS?
Postgres version / Operating system+version
PostgreSQL 15.4 on aarch64-unknown-linux-gnu, compiled by gcc (GCC)
7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit
Full Table and Index Schema
\d indexed_commit
Table "public.indexed_commit"
    Column     |            Type             | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------
 id            | bigint                      |           | not null |
 commit_hash   | character varying(40)       |           | not null |
 parent_hash   | text                        |           |          |
 created_ts    | timestamp without time zone |           | not null |
 repo_id       | bigint                      |           | not null |
 lines_added   | bigint                      |           |          |
 lines_removed | bigint                      |           |          |
 tenant_id     | uuid                        |           | not null |
 author_id     | uuid                        |           | not null |
Indexes:
"indexed-commit-repo-idx" btree (repo_id)
"indexed_commit_commit_hash_repo_id_key" UNIQUE CONSTRAINT, btree
(commit_hash, repo_id) REPLICA IDENTITY
"indexed_commit_repo_id_without_loc_idx" btree (repo_id) WHERE
lines_added IS NULL OR lines_removed IS NULL
Policies:
POLICY "commit_isolation_policy"
USING ((tenant_id =
(current_setting('app.current_tenant_id'::text))::uuid))
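For completeness, DDL along these lines would produce that policy (a
sketch; the exact statements may differ):
ALTER TABLE public.indexed_commit ENABLE ROW LEVEL SECURITY;

CREATE POLICY commit_isolation_policy ON public.indexed_commit
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);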
Table Metadata
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
relname='indexed_commit';
-[ RECORD 1 ]--+-------------------------------------------------------------
relname        | indexed_commit
relpages       | 18170522
reltuples      | 7.451964e+08
relallvisible  | 18104744
relkind        | r
relnatts       | 9
relhassubclass | f
reloptions     | {autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=20,autovacuum_analyze_threshold=10}
pg_table_size  | 148903337984
EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
Production queries:
316 IDs under the RLS-limited user
392 IDs under the RLS-limited user
392 IDs under superuser
History
It became slow after the upgrade to 15.4. We never had any issues before.
Hardware
AWS DB class db.t4g.large + GP3 400GB disk
Maintenance Setup
Are you running autovacuum? Yes
If so, with what settings?
autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=20,autovacuum_analyze_threshold=10
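These are per-table storage parameters (they also appear in the reloptions
column above); something like the following sets them:
ALTER TABLE public.indexed_commit SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_vacuum_threshold     = 20,
    autovacuum_analyze_threshold    = 10
);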
SELECT * FROM pg_stat_user_tables WHERE relname='indexed_commit';
-[ RECORD 1 ]-------+-----------------------
relid               | 24662
schemaname          | public
relname             | indexed_commit
seq_scan            | 2485
seq_tup_read        | 49215378424
idx_scan            | 374533865
idx_tup_fetch       | 4050928807
n_tup_ins           | 764089750
n_tup_upd           | 2191615
n_tup_del           | 18500311
n_tup_hot_upd       | 0
n_live_tup          | 745241398
n_dead_tup          | 383
n_mod_since_analyze | 46018
n_ins_since_vacuum  | 45343
last_vacuum         |
last_autovacuum     | 2023-10-11 23:51:29.17