In addition to Toke's suggestions (and those in the linked article), here are a few more ideas:

If single-term, bare queries are slow, it might be worth checking the configuration and performance of your queryResultCache. (I realize this doesn't directly address the slowness of the queries themselves, but it might nonetheless help in practice.)

If multi-term queries that include these terms are slow, maybe check your mm config to make sure it's not more inclusive than necessary for your use case (a more permissive mm means scoring over something closer to the union of the clauses' docSets).

If multi-term queries get faster when you disable pf, you could try disabling pf on the main query and instead invoking an implicit phrase search (pseudo-pf) using ReRankQParser.

If you're able to share your configs (built queries, indexing/fieldType config (positions, payloads?), etc.), that might enable more specific advice.

I'm assuming the query times you posted are for queries that isolate the performance of the main query only (i.e., no other components, like facets, etc.)?

Michael
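To make the "disable main-query pf, re-rank with a phrase" idea concrete, here is a minimal sketch in Python that only builds request parameters; it does not talk to Solr. The field name `keywords`, the function `build_rerank_params`, and the values `reRankDocs=1000` / `reRankWeight=3` are hypothetical choices for illustration. The `rq={!rerank reRankQuery=$rqq ...}` syntax follows the Solr query re-ranking docs, where only the top `reRankDocs` results of the main query are re-scored by the phrase query, so the expensive positional matching is not run over the whole result set.

```python
from urllib.parse import urlencode


def build_rerank_params(user_query: str, field: str = "keywords",
                        rerank_docs: int = 1000,
                        rerank_weight: float = 3.0) -> dict:
    """Hypothetical helper: main query without pf, phrase boost via rerank.

    'keywords' is an assumed field name; adjust to your schema.
    """
    # Drop embedded double quotes so the re-rank phrase stays well-formed
    terms = user_query.replace('"', " ").split()
    params = {
        "defType": "edismax",
        "q": " ".join(terms),
        "qf": field,
        # Note: no "pf" here, so the main query does no phrase scoring
    }
    # A phrase only makes sense for multi-term queries
    if len(terms) > 1:
        params["rq"] = ("{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%g}"
                        % (rerank_docs, rerank_weight))
        # Implicit phrase search ("pseudo-pf") applied only to the top N docs
        params["rqq"] = '%s:"%s"' % (field, " ".join(terms))
    return params


if __name__ == "__main__":
    print(urlencode(build_rerank_params("photography background")))
```

The trade-off is that documents outside the top `reRankDocs` never receive the phrase boost, so that value bounds both the cost and the reach of the pseudo-pf.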
On Mon, Apr 8, 2019 at 3:28 AM Ash Ramesh <ash...@canva.com> wrote:
> Hi Toke,
>
> Thanks for the prompt reply. I'm glad to hear that this is a common
> problem. In regards to stop words, I've been thinking about trying that
> out. In our business case, most of these terms are keywords related to
> stock photography, therefore it's natural for 'photography' or 'background'
> to appear commonly in a document's keyword list. It seems unlikely we can
> use the common grams solution for our business case.
>
> Regards,
>
> Ash
>
> On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen <t...@kb.dk> wrote:
>
> > On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > > We have a corpus of 50+ million documents in our collection. I've
> > > noticed that some queries with specific keywords tend to be extremely
> > > slow, e.g. q='photography' or q='background'. After digging into the
> > > raw documents, I could see that these two terms appear in more
> > > than 90% of all documents, which means Solr has to score each of
> > > those documents.
> >
> > That is known behaviour, which can be remedied somewhat. Stop words are
> > a common approach, but your samples do not seem to fit well with
> > that. Instead you can look at Common Grams, where your high-frequency
> > words get concatenated with surrounding words. This only works with
> > phrases though. There's a nice article at
> >
> > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> >
> > - Toke Eskildsen, Royal Danish Library
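Toke's point that Common Grams "only works with phrases" is easiest to see on token lists. Below is a simplified Python sketch of the idea, not Lucene's actual implementation (in Solr it is configured via CommonGramsFilterFactory at index time and CommonGramsQueryFilterFactory at query time, which also track positions and offsets that this sketch ignores). The function names and the common-word set are hypothetical. At index time every unigram is kept and a joined bigram is added wherever either token of an adjacent pair is common; at query time bigrams replace the unigrams they cover, so the rare bigram is looked up instead of the very frequent single term.

```python
def common_grams(tokens, common_words, sep="_"):
    """Index-side sketch: all unigrams, plus a bigram whenever either
    token of an adjacent pair is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out


def common_grams_query(tokens, common_words, sep="_"):
    """Query-side sketch: emit bigrams, and keep a unigram only when
    no bigram covers it."""
    n = len(tokens)

    def is_gram(i):
        # True if (tokens[i], tokens[i + 1]) should be joined
        return i + 1 < n and (tokens[i] in common_words
                              or tokens[i + 1] in common_words)

    out = []
    for i in range(n):
        if is_gram(i):
            out.append(tokens[i] + sep + tokens[i + 1])
        elif not (i > 0 and is_gram(i - 1)):
            out.append(tokens[i])
    return out


if __name__ == "__main__":
    common = {"photography", "background"}  # hypothetical common-word list
    print(common_grams(["stock", "photography", "background"], common))
    print(common_grams_query(["stock", "photography", "background"], common))
    # A single-term query has no neighbour to pair with, so the
    # high-frequency unigram is still what gets searched.
    print(common_grams_query(["photography"], common))
```

The last call shows why this does not help Ash's bare single-term queries such as q='photography': there is nothing to concatenate the common word with.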