Slow "not in array" operation
I have a large table with millions of rows. Each row has an array field
"tags". I also have the proper GIN index on tags.
Counting the rows that have a given tag is fast (~7s):
SELECT COUNT(*) FROM "subscriptions" WHERE (tags @> ARRAY['t1']::varchar[]);
However, counting the rows that don't have that tag is extremely slow (~70s):
SELECT COUNT(*) FROM "subscriptions" WHERE NOT (tags @> ARRAY['t1']::varchar[]);
I have also tried other variants, but with the same result (~70s):
SELECT COUNT(*) FROM "subscriptions" WHERE NOT ('t1' = ANY (tags));
How can I make this "not in array" operation fast?
Any help would be appreciated, thank you!
Marco Colli
PostgreSQL 11 on Ubuntu 18.04 LTS
Re: Slow "not in array" operation
What's the plan for the slow one? What's the time to just count all rows?
Re: Slow "not in array" operation
To be honest, I have simplified the question above. In order to show you
the plan, I must show you the actual query, which is this:
=== QUERY ===
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @>
ARRAY['en']::varchar[]);
=== QUERY PLAN ===
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2152593.04..2152593.05 rows=1 width=8) (actual time=70555.561..70555.561 rows=1 loops=1)
   ->  Gather  (cost=2152592.31..2152593.02 rows=7 width=8) (actual time=70540.641..70702.365 rows=8 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Partial Aggregate  (cost=2151592.31..2151592.32 rows=1 width=8) (actual time=70537.376..70537.377 rows=1 loops=8)
               ->  Parallel Seq Scan on subscriptions  (cost=0.00..2149490.49 rows=840731 width=0) (actual time=0.742..70479.359 rows=611828 loops=8)
                     Filter: ((trashed_at IS NULL) AND (NOT (tags @> '{en}'::character varying[])) AND (project_id = 123))
                     Rows Removed by Filter: 4572769
 Planning Time: 1.304 ms
 Execution Time: 70702.463 ms
(10 rows)
=== INDEXES ===
Indexes:
    "subscriptions_pkey" PRIMARY KEY, btree (id)
    "index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
    "index_subscriptions_on_project_id_and_tags" gin (project_id, tags) WHERE trashed_at IS NULL
    "index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
=== NOTES ===
Running the query without the last filter on tags takes only 500ms.
Unfortunately I cannot make strict assumptions on data or tags: for example
I also have to count subscriptions in a project that don't have tag A and
don't have tag B, etc. This means that I cannot simply calculate the total
and then make a subtraction.
On Tue, Nov 12, 2019 at 7:40 PM Michael Lewis wrote:
> What's the plan for the slow one? What's the time to just count all rows?
Re: Slow "not in array" operation
It is very interesting to me that the optimizer chose a parallel sequential
scan rather than an index scan on either of your indexes that start with
project_id and also reference trashed_at.
1) Are you running on SSD type storage? Has random_page_cost been lowered to
1-1.5 or so (close to 1 assumes good cache hits)?
2) It seems you have increased parallel workers. Have you also changed the
startup or other cost configs related to how inclined the system is to use
sequential scans?
3) If you disable sequential scan, what does the plan look like for this
query? (SET ENABLE_SEQSCAN TO OFF;)
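As a reference point, one way to inspect the planner-cost settings in question in a single query (these are all standard PostgreSQL GUC names) would be:

```sql
-- Cost/resource parameters that influence the seq scan vs index scan choice
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('random_page_cost', 'seq_page_cost', 'cpu_tuple_cost',
               'effective_cache_size', 'work_mem',
               'parallel_setup_cost', 'parallel_tuple_cost');
```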
Re: Slow "not in array" operation
On Tue, Nov 12, 2019 at 12:20:10PM -0700, Michael Lewis wrote:
> It is very interesting to me that the optimizer chose a parallel sequential
> scan rather than an index scan on either of your indexes that start
> with project_id that also reference trashed_at.
Maybe because of low correlation on any of those columns?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
       tablename, attname, inherited, null_frac, n_distinct,
       array_length(most_common_vals,1) n_mcv,
       array_length(histogram_bounds,1) n_hist, correlation
FROM pg_stats
WHERE tablename='subscriptions' AND attname IN ('project_id','tags')
ORDER BY 1 DESC;
Maybe clustering the table on project_id (and ANALYZEing afterwards) would
help, but the clustering would need to be redone periodically, since
PostgreSQL does not maintain the cluster order as rows are inserted and
updated.
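A minimal sketch of that clustering approach, assuming the existing btree on (project_id, created_at) is an acceptable clustering key (CLUSTER requires a btree-like index and cannot use the GIN index; it also takes an exclusive lock on the table for the duration):

```sql
-- Rewrite the table physically in index order
CLUSTER subscriptions USING index_subscriptions_on_project_id_and_created_at;
-- Refresh statistics so the planner sees the new physical correlation
ANALYZE subscriptions;
```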
Michael previously suggested partitioning which, if done on project_id,
would then no longer need to be specially CLUSTERed.
Is the plan for the fast query the same as in August?
https://www.postgresql.org/message-id/CAFvCgN4UijKTYiOF61Tyd%2BgHvF_oqnMabatS9%2BDcX%2B_PK2SHRw%40mail.gmail.com
Justin
Re: Slow "not in array" operation
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
hyperthreads (16 cores) and SSD.
SHOW random_page_cost; => 2
2) What config names should I check exactly? I used some suggestions from
the online PGTune, when I first configured the db some months ago:
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
3) Here's the query plan that I get after disabling the seq scan:
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2183938.89..2183938.90 rows=1 width=8) (actual time=94972.253..94972.254 rows=1 loops=1)
   ->  Gather  (cost=2183938.16..2183938.87 rows=7 width=8) (actual time=94952.895..95132.626 rows=8 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Partial Aggregate  (cost=2182938.16..2182938.17 rows=1 width=8) (actual time=94950.958..94950.958 rows=1 loops=8)
               ->  Parallel Bitmap Heap Scan on subscriptions  (cost=50294.50..2180801.47 rows=854677 width=0) (actual time=1831.342..94895.208 rows=611828 loops=8)
                     Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
                     Rows Removed by Index Recheck: 2217924
                     Filter: (NOT (tags @> '{en}'::character varying[]))
                     Rows Removed by Filter: 288545
                     Heap Blocks: exact=120301 lossy=134269
                     ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_tags  (cost=0.00..48798.81 rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)
                           Index Cond: (project_id = 123)
 Planning Time: 1.273 ms
 Execution Time: 95132.766 ms
(15 rows)
On Tue, Nov 12, 2019 at 8:20 PM Michael Lewis wrote:
> It is very interesting to me that the optimizer chose a parallel
> sequential scan rather than an index scan on either of your indexes that
> start with project_id that also reference trashed_at.
>
> 1) Are you running on SSD type storage? Has random_page_cost been lowered
> to 1-1.5 or so (close to 1 assumes good cache hits)?
> 2) It seems you have increased parallel workers. Have you also changed the
> startup or other cost configs related to how inclined the system is to use
> sequential scans?
> 3) If you disable sequential scan, what does the plan look like for this
> query? (SET ENABLE_SEQSCAN TO OFF;)
Re: Slow "not in array" operation
Odd index choice by the optimizer, given what is available. The bitmap being
lossy means more work_mem is needed, if I remember properly.
It is interesting that skipping the WHERE condition on the array takes only
half a second. Is the array being TOASTed, or is it small and stored inline
in the main table?
What is the result for this count query? Is it roughly 4 million?
On Tue, Nov 12, 2019, 1:06 PM Marco Colli wrote:
> 1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
> hyperthreads (16 cores) and SSD.
> SHOW random_page_cost; => 2
>
> 2) What config names should I check exactly? I used some suggestions from
> the online PGTune, when I first configured the db some months ago:
> max_worker_processes = 16
> max_parallel_workers_per_gather = 8
> max_parallel_workers = 16
>
> 3) Here's the query plan that I get after disabling the seq scan:
> [query plan and earlier quoted text snipped]
Re: Slow "not in array" operation
Marco Colli writes:
> 3) Here's the query plan that I get after disabling the seq scan:
> Finalize Aggregate  (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> time=94972.253..94972.254 rows=1 loops=1)

So, this is slower than the seqscan, which means the planner made the
right choice.

You seem to be imagining that there's some way the index could be used
with the NOT clause, but there isn't. Indexable clauses are of the form

    indexed_column indexable_operator constant

and there's no provision for a NOT in that. If we had a "not contained
in" array operator, the NOT could be folded to be of this syntactic form,
but it's highly unlikely that any index operator class would consider such
an operator to be a supported indexable operator. It doesn't lend itself
to searching an index.

So the planner is doing the best it can, which in this case is a
full-table scan.

A conceivable solution, if the tags array is a lot smaller than the table
as a whole and the table is fairly static, is that you could make a btree
index on the tags array and let the planner fall back to an index-only
scan that is just using the index as a cheaper source of the array data.
(This doesn't work for your existing GIN index because GIN can't
reconstruct the original arrays on-demand.) I suspect though that this
wouldn't win much, even if you disregard the maintenance costs for the
extra index. The really fundamental problem here is that a large fraction
of the table satisfies the NOT-in condition, and no index is going to beat
a seqscan by all that much when that's true. Indexes are good at
retrieving small portions of tables.

    regards, tom lane
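A hedged sketch of that suggestion (the index name is made up here; project_id is added so the scan can stay index-only for this query, and the partial-index predicate mirrors the existing GIN index to cover the trashed_at filter; index-only scans also need a reasonably current visibility map):

```sql
-- btree over the array, usable as a cheap source of the array data
CREATE INDEX subscriptions_project_tags_btree
    ON subscriptions USING btree (project_id, tags)
    WHERE trashed_at IS NULL;

-- keep the visibility map current so index-only scans are possible
VACUUM ANALYZE subscriptions;
```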
Re: Slow "not in array" operation
I am not a PostgreSQL expert, however I think that the following algorithm
should be possible and fast:
1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
HAVE the tag (~7s)
3. calculate [bitmap 1] - [bitmap 2] to find the subscriptions of the
project that DON'T HAVE the tag

On Tue, Nov 12, 2019 at 9:50 PM Tom Lane wrote:
> [quoted text snipped]
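For counting, that subtraction idea can be expressed directly in SQL (a sketch based on the query earlier in the thread; whether the planner actually uses the partial GIN index for the second subquery would need to be checked with EXPLAIN ANALYZE):

```sql
-- count(not trashed) - count(not trashed AND tagged)
-- = count(not trashed AND not tagged)
SELECT
  (SELECT COUNT(*) FROM subscriptions
    WHERE project_id = 123 AND trashed_at IS NULL)
  -
  (SELECT COUNT(*) FROM subscriptions
    WHERE project_id = 123 AND trashed_at IS NULL
      AND tags @> ARRAY['en']::varchar[]) AS count_without_tag;
```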
Re: Slow "not in array" operation
> 3) Here's the query plan that I get after disabling the seq scan:
> [query plan snipped]
What was the plan for the one that took 500ms? I don't see how it is
possible that this one is 180 times slower than that one. Maybe a hot
cache versus cold cache? Also, it seems weird to me that "trashed_at IS
NULL" shows up in the recheck but not in the original Index Cond.
Increasing work_mem can also help, but since the Bitmap Index Scan itself
took half the time there is only so much it can do.
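To test whether the lossy bitmap is what hurts, work_mem can be raised for a single session before re-running the query (256MB is an arbitrary test value for illustration, not a recommendation):

```sql
-- session-local override; the server default is typically only 4MB
SET work_mem = '256MB';
-- re-run EXPLAIN ANALYZE and check whether "lossy=..." disappears
-- from the "Heap Blocks" line of the Bitmap Heap Scan
```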
Cheers,
Jeff
