GIN JSONB path index is not always used

2023-10-17 Thread Tomasz Szymański
Hello! We have an issue with the database planner choosing a really expensive 
sequential scan instead of an index scan in some cases.
I'm reaching out in the hope of getting some idea of what we're dealing with / 
what the actual issue could be.
The table affected is the users table with a field called "private_metadata" 
which is JSONB type.
There are currently two GIN indexes on that table - one plain GIN and a second 
using GIN jsonb_path_ops.
The operator used in the problematic query is containment (@>), so 
jsonb_path_ops should be preferred according to the docs: 
https://www.postgresql.org/docs/11/gin-builtin-opclasses.html.
However, the database sometimes decides NOT to use the index and performs a 
sequential scan instead.
I have observed that it happens in intervals - e.g. for 10 hours everything is 
fine, then something snaps and there are sequential scans for a few hours, 
then back to index scans, and so on.
More context:
- Database version: 11.18
- The table schema with indexes
CREATE TABLE account_user (
    id integer NOT NULL,
    private_metadata jsonb
);
CREATE INDEX user_p_meta_idx ON account_user USING gin (private_metadata);
CREATE INDEX user_p_meta_jsonb_path_idx ON account_user USING gin 
(private_metadata jsonb_path_ops);
- The query that is performed
SELECT "account_user"."id", "account_user"."private_metadata"
FROM "account_user"
WHERE "account_user"."private_metadata" @> '{"somekey": "somevalue"}'
LIMIT 21;
- Plan when it uses an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 1091.29,
    "Total Cost": 1165.26,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Bitmap Heap Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 1091.29,
        "Total Cost": 17129.12,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Recheck Cond": "(private_metadata @> ?::jsonb)",
        "Plans": [
          {
            "Node Type": "Bitmap Index Scan",
            "Parent Relationship": "Outer",
            "Parallel Aware": false,
            "Index Name": "user_p_meta_jsonb_path_idx",
            "Startup Cost": 0,
            "Total Cost": 1090.15,
            "Plan Rows": 4553,
            "Plan Width": 0,
            "Index Cond": "(private_metadata @> ?::jsonb)"
          }
        ]
      }
    ]
  }
}
- Plan when it doesn't use an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 0,
    "Total Cost": 1184.3,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Seq Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 0,
        "Total Cost": 256768.27,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Filter": "(private_metadata @> ?::jsonb)"
      }
    ]
  }
}
- There are ~4.5M rows in the table
- We currently have a script that is heavily inserting rows 24/7, about 130k 
rows/day (~1.5 rows/s)
- It seems the index maybe can't keep up(?) because of this heavy insertion
SELECT * FROM pgstatginindex('user_p_meta_jsonb_path_idx');
 version | pending_pages | pending_tuples
---------+---------------+----------------
       2 |            98 |          28807
(1 row)
Might it be the case that the pending list clogs up and the index cannot be 
used efficiently when reading?
- Last autovacuum for some reason happened 4 days ago
select * from pg_stat_user_tables where relname = 'account_user';
-[ RECORD 1 ]-------+------------------------------
relid               | 74937
schemaname          | public
relname             | account_user
seq_scan            | 66578
seq_tup_read        | 99542628744
idx_scan            | 9342647
idx_tup_fetch       | 105527685
n_tup_ins           | 518901
n_tup_upd           | 684607
n_tup_del           | 25
n_tup_hot_upd       | 82803
n_live_tup          | 4591412
n_dead_tup          | 370828
n_mod_since_analyze | 152968
last_vacuum         |
last_autovacuum     | 2023-10-13 07:35:29.11448+00
last_analyze        |
last_autoanalyze    | 2023-10-13 07:44:31.90437+00
vacuum_count        | 0
autovacuum_count    | 2
analyze_count       | 0
- The plans above come from a production system; I failed to reproduce the 
problem locally. The only two cases in which I managed to get a seq scan were:
a) when there are very few rows in the table
b) when I run the filter @> '{}', in which case I suppose Postgres deduces "you 
want everything" so an additional index lookup is not necessary (I checked, and 
this filter is impossible to produce from our code).
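For reference, one way to compare the two plans side by side while debugging 
(a session-local sketch only, not something to run in production code) is to 
temporarily discourage sequential scans and re-run EXPLAIN:

```sql
BEGIN;
-- Applies only to this transaction; the planner will avoid seq scans
-- unless there is no other way to execute the query.
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, private_metadata
FROM account_user
WHERE private_metadata @> '{"somekey": "somevalue"}'
LIMIT 21;
ROLLBACK;
```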

Any feedback is appreciated! What would be the possible reason the planner 
chooses seq scan in that case?



Re: GIN JSONB path index is not always used

2023-10-17 Thread Laurenz Albe
On Tue, 2023-10-17 at 15:48 +0200, Tomasz Szymański wrote:
> Hello! We have an issue with database planner choosing really expensive 
> sequence scan instead of an index scan in some cases.

To analyze that, we'd need the output from EXPLAIN (ANALYZE, BUFFERS) SELECT ...
Plain text format please, no JSON.
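That is, something along these lines (plain text is the default output format, 
so no FORMAT option is needed; the query is taken from your message):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "account_user"."id", "account_user"."private_metadata"
FROM "account_user"
WHERE "account_user"."private_metadata" @> '{"somekey": "somevalue"}'
LIMIT 21;
```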

Yours,
Laurenz Albe




Re: GIN JSONB path index is not always used

2023-10-17 Thread Jeff Janes
On Tue, Oct 17, 2023 at 10:09 AM Tomasz Szymański  wrote:

> - Database version: 11.18

That is pretty old.  It is 3 bug-fix releases out of date even for its
major version, and the major version itself is just about to reach EOL and
is missing relevant improvements.

> - Plan when it uses an index
> "Total Cost": 1165.26,
> - Plan when it doesn't use an index
> "Total Cost": 1184.3,

The JSON format for plans is pretty non-ideal for human inspection;
especially so once you include ANALYZE and BUFFERS, which you should do.
Please use the plain text format instead.  But I can see that the plans are
very similar in cost, so it wouldn't take much to shift between them.
Should we assume that not using the index is much slower (otherwise, why
would you be asking the question?)?



> - It seems maybe the index can't keep up(?) because of this heavy insertion
> SELECT * FROM pgstatginindex('user_p_meta_jsonb_path_idx');
>  version | pending_pages | pending_tuples
> ---------+---------------+----------------
>        2 |            98 |          28807
> (1 row)
> Might it be the case that the pending list clogs up and the index cannot
> be used efficiently when reading?
>

Definitely possible.  The planner does take those numbers into account when
planning.  The easiest thing would be to just turn off fastupdate for those
indexes.  That might make the INSERTs somewhat slower (it is hard to
predict how much and you haven't complained about the performance of the
INSERTs anyway) but should make the SELECTs more predictable and generally
faster.  I habitually turn fastupdate off and then turn it back on only if
I have an identifiable cause to do so.
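Concretely, that would look something like this (index names taken from your 
message; gin_clean_pending_list flushes what has already accumulated):

```sql
-- Disable the pending list for future inserts on both GIN indexes.
ALTER INDEX user_p_meta_jsonb_path_idx SET (fastupdate = off);
ALTER INDEX user_p_meta_idx SET (fastupdate = off);

-- Optionally flush the entries already sitting in the pending lists;
-- returns the number of pending-list pages removed.
SELECT gin_clean_pending_list('user_p_meta_jsonb_path_idx'::regclass);
SELECT gin_clean_pending_list('user_p_meta_idx'::regclass);
```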

If you don't want to turn fastupdate off, you could instead change the
table's autovac parameters to be more aggressive (particularly
n_ins_since_vacuum, except that that doesn't exist until v13), or have a
cron job call gin_clean_pending_list periodically.
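For example, the per-table autovacuum knobs can be tightened like this (the 
numbers are purely illustrative, not a recommendation):

```sql
-- Vacuum after ~5% churn instead of the default 20%,
-- and analyze after ~2% instead of 10%.
ALTER TABLE account_user SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```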


> - Last autovacuum for some reason happened 4 days ago
>
> n_live_tup  | 4591412
> n_dead_tup  | 370828
>

Based on those numbers and default parameters, there is no reason for it to
be running any sooner.  That reflects only 8% turnover while the default
factor is 20%.
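The arithmetic, using the default autovacuum_vacuum_scale_factor = 0.2 and 
autovacuum_vacuum_threshold = 50:

    vacuum threshold = 50 + 0.2 * 4591412  ≈  918332 dead tuples
    n_dead_tup       = 370828  (370828 / 4591412 ≈ 8% of live tuples)

so dead tuples would have to more than double before autovacuum kicks in.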

Cheers,

Jeff