Bad query plan when you add many OR conditions
Hello! I have a query on a large table that is very fast (0s): https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-fast_query-txt Basically the query matches the rows that have a tag1 OR tag2 OR tag3 OR tag4 OR tag5... However if you increase the number of OR at some point PostgreSQL makes the bad decision to change its query plan! And the new plan makes the query terribly slow: https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-slow_query-txt Instead of this (which is fast): Bitmap Index Scan on index_subscriptions_on_project_id_and_tags It starts using this (which is slow): Parallel Index Scan using index_subscriptions_on_project_id_and_created_at The choice seems quite stupid since it doesn't have the tags on the new index... and indeed the query takes about 1 minute instead of a few milliseconds. Here's a list of the available indexes: https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-_indexes-txt How can I encourage PostgreSQL to use the Bitmap Index Scan even when there are many OR conditions? I have tried with VACUUM ANALYZE subscriptions but it doesn't help. Note: the query is generated dynamically by customers of a SaaS, so I don't have full control on it Thank you very much for any advice! Marco Colli
Re: Bad query plan when you add many OR conditions
On Fri, Jan 10, 2020 at 02:11:14AM +0100, Marco Colli wrote: > I have a query on a large table that is very fast (0s): > https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-fast_query-txt ORDER BY + LIMIT is a query which sometimes has issues, you can probably find more by searching. The planner thinks it'll hit the LIMIT pretty soon and only run a fraction of the index scan - but then it turns out to be wrong. You might have poor statistics on project_id and/or tags. This *might* help: ALTER TABLE subscriptions ALTER project_id SET STATISTICS 2000; ANALYZE subscriptions; But I'm guessing there's correlation between the two, which the planner doesn't know. If you're running at least v10, I'm guessing it would help to CREATE STATISTICS on those columns (and analyze). See one similar problem here (not involving LIMIT). https://www.postgresql.org/message-id/flat/CABFxtPedz4zL%2BaPWut4%2B%3Dum4av1aAXr6OVRfRB_6K7mJKMbEcw%40mail.gmail.com
