Bad query plan when you add many OR conditions

2020-01-09 Thread Marco Colli
Hello!

I have a query on a large table that is very fast (0s):
https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-fast_query-txt

Basically the query matches the rows that have a tag1 OR tag2 OR tag3 OR
tag4 OR tag5...

However if you increase the number of OR at some point PostgreSQL makes the
bad decision to change its query plan! And the new plan makes the query
terribly slow:
https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-slow_query-txt

Instead of this (which is fast):
  Bitmap Index Scan on index_subscriptions_on_project_id_and_tags
It starts using this (which is slow):
  Parallel Index Scan using index_subscriptions_on_project_id_and_created_at
The choice seems quite stupid since it doesn't have the tags on the new
index... and indeed the query takes about 1 minute instead of a few
milliseconds. Here's a list of the available indexes:
https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-_indexes-txt

How can I encourage PostgreSQL to use the Bitmap Index Scan even when there
are many OR conditions? I have tried with VACUUM ANALYZE subscriptions but
it doesn't help.

Note: the query is generated dynamically by customers of a SaaS, so I don't
have full control on it


Thank you very much for any advice!
Marco Colli


Re: Bad query plan when you add many OR conditions

2020-01-09 Thread Justin Pryzby
On Fri, Jan 10, 2020 at 02:11:14AM +0100, Marco Colli wrote:
> I have a query on a large table that is very fast (0s):
> https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-fast_query-txt

ORDER BY + LIMIT is a query which sometimes has issues, you can probably find
more by searching.  The planner thinks it'll hit the LIMIT pretty soon and only
run a fraction of the index scan - but then it turns out to be wrong.

You might have poor statistics on project_id and/or tags.  This *might* help:
ALTER TABLE subscriptions ALTER project_id SET STATISTICS 2000; ANALYZE 
subscriptions;

But I'm guessing there's correlation between the two, which the planner doesn't
know.  If you're running at least v10, I'm guessing it would help to CREATE
STATISTICS on those columns (and analyze).

See one similar problem here (not involving LIMIT).
https://www.postgresql.org/message-id/flat/CABFxtPedz4zL%2BaPWut4%2B%3Dum4av1aAXr6OVRfRB_6K7mJKMbEcw%40mail.gmail.com