Hello all,
We’re seeing intermittently very poor performance of a query: occasionally a poor query plan is chosen. We’re using Postgres 16.9.
One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of the number of rows to be returned from a text[] column queried with ‘&&’.
After playing around with a simple recreate (details below), it seems the outcome of ANALYZE depends on the number of rows in the table: the `most_common_elems` statistic is [null] when the table has 15,873 or more rows at the time it is analyzed, while with fewer rows the statistic is collected correctly.
Is there any good explanation for this behaviour? Preferably we’d like some way to get proper `most_common_elems` statistics collected in our production database, in the hope that this steers the planner towards always choosing a good plan.
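One knob that looks relevant is the per-column statistics target, which (as far as we understand) also determines how many element frequencies ANALYZE keeps for an array column. A minimal sketch of what we could try (the target value 1000 is just an arbitrary example):

ALTER TABLE public.test ALTER COLUMN tags SET STATISTICS 1000;
ANALYZE public.test;

It’s unclear to us whether that is the right fix, or whether it would just paper over whatever is going on here.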
In our production system there are ~150,000 rows in a table with a `text[]` column, where each row has an array containing a single ~19-character string that is unique within the table. The full query joins against a couple more tables, and there is a GIN index on the text[] column. If necessary, I can get into details of the real system, but I hope the simple recreate will be sufficient to understand the problem:
CREATE TABLE IF NOT EXISTS public.test(
    id SERIAL PRIMARY KEY,
    tags text[]
);

INSERT INTO public.test (tags)
SELECT ARRAY[TO_CHAR(n,'fm00000000')]
FROM (SELECT generate_series(1,15_873) AS n);

ANALYZE public.test;

SELECT * FROM pg_stat_user_tables WHERE relname = 'test';

EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000002'];
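For reference, the `most_common_elems` / `most_common_elem_freqs` values quoted in the results below can be inspected through the pg_stats view, e.g.:

SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE tablename = 'test' AND attname = 'tags';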
Results
-------
table with  15_000 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.666667e-05)
table with  15_872 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.300403e-05)
table with  15_873 rows has [null] most_common_elems after ANALYZE
table with 100_000 rows has [null] most_common_elems after ANALYZE
Query plans estimate 1 row when `most_common_elems` statistics are available, or fall back to the hardcoded default of 1/200 of the estimated table size when `most_common_elems` is null. Here 79 rows are estimated (15,873 × 0.005 ≈ 79), because the table contained 15,873 rows and the stats weren’t available.
Query plan
----------
Seq Scan on public.test  (cost=0.00..463.41 rows=79 width=37) (actual time=9.934..17.190 rows=1 loops=1)
  Output: id, tags
  Filter: (test.tags && '{00000002}'::text[])
  Rows Removed by Filter: 15872
  Buffers: shared hit=268
Planning:
  Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms
Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"
Regards,
Mark Frost
IBM