Another way to make queries faster, if you can, is to identify a subset of documents that are generally relevant to your users (the most recent ones, the most browsed, etc.), index those documents into a separate, smaller collection, and query that small collection first, falling back to the full one if the small one doesn't return enough documents. (Caveat: the small collection could affect the ranking, because all term statistics will be different.)
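A rough sketch of that fallback, in Python, in case it helps. The collection names (`hot_docs`, `all_docs`), the base URL and the helper names are made up for illustration; the only Solr-specific part is the standard `/select` handler and its `response.docs` JSON shape. Treat it as a starting point under those assumptions, not as a tested setup.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# A search function takes (collection, query, rows) and returns matching docs.
SearchFn = Callable[[str, str, int], List[Dict]]

# Assumption: a local Solr instance; adjust to your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def solr_search(collection: str, query: str, rows: int) -> List[Dict]:
    """Query the /select handler of one collection and return its docs."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    url = f"{SOLR_BASE}/{collection}/select?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def tiered_search(
    query: str,
    rows: int = 10,
    search: SearchFn = solr_search,
    small: str = "hot_docs",  # hypothetical name for the small collection
    full: str = "all_docs",   # hypothetical name for the full collection
) -> List[Dict]:
    """Try the small collection first; fall back to the full one if it
    returns fewer than `rows` documents."""
    hits = search(small, query, rows)
    if len(hits) >= rows:
        return hits
    # Not enough results: rerun the same query against the full collection.
    return search(full, query, rows)
```

Note that each result page comes entirely from one collection, so scores computed with different term statistics are never mixed within a single result list.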
Cheers,
Diego

On Mon, Apr 8, 2019, 15:59 Michael Gibney <mich...@michaelgibney.net> wrote:

> In addition to Toke's suggestions (and those in the linked article), some
> more ideas:
>
> If single-term, bare queries are slow, it might be productive to check
> config/performance of your queryResultCache (I realize this doesn't
> directly address the concern of slow queries, but might nonetheless be
> helpful in practice).
>
> If multi-term queries that include these terms are slow, maybe check your
> mm config to make sure it's not more inclusive than necessary for your use
> case (scoring over union of docSets/clauses). If multi-term queries get
> faster by disabling pf, you could try disabling main-query pf, and invoke
> implicit phrase search (pseudo-pf) using ReRankQParser?
>
> If you're able to share your configs (built queries, indexing/fieldType
> config (positions, payloads?), etc.), that might enable more specific
> advice.
>
> I'm assuming the query-times posted are for queries that isolate the
> performance of main query only (i.e., no other components, like facets,
> etc.)?
>
> Michael
>
> On Mon, Apr 8, 2019 at 3:28 AM Ash Ramesh <ash...@canva.com> wrote:
>
> > Hi Toke,
> >
> > Thanks for the prompt reply. I'm glad to hear that this is a common
> > problem. In regards to stop words, I've been thinking about trying that
> > out. In our business case, most of these terms are keywords related to
> > stock photography, therefore it's natural for 'photography' or
> > 'background' to appear commonly in a document's keyword list. It seems
> > unlikely we can use the common grams solution with our business case.
> >
> > Regards,
> >
> > Ash
> >
> > On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen <t...@kb.dk> wrote:
> >
> > > On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > > > We have a corpus of 50+ million documents in our collection. I've
> > > > noticed that some queries with specific keywords tend to be extremely
> > > > slow.
> > > > E.g. the q='photography' or q='background'. After digging into the
> > > > raw documents, I could see that these two terms appear in greater
> > > > than 90% of all documents, which means solr has to score each of
> > > > those documents.
> > >
> > > That is known behaviour, which can be remedied somewhat. Stop words are
> > > a common approach, but your samples do not seem to fit well with
> > > that. Instead you can look at Common Grams, where your high-frequency
> > > words get concatenated with surrounding words. This only works with
> > > phrases though. There's a nice article at
> > >
> > > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> > >
> > > - Toke Eskildsen, Royal Danish Library