Re: Performance problems with extremely common terms in collection (Solr 7.4)

Michael Gibney Mon, 08 Apr 2019 07:59:31 -0700

In addition to Toke's suggestions (and those in the linked article), some
more ideas:

If single-term, bare queries are slow, it might be productive to check the
config/performance of your queryResultCache (I realize this doesn't
directly address the concern of slow queries, but it might nonetheless be
helpful in practice).

If multi-term queries that include these terms are slow, maybe check your
mm config to make sure it's not more inclusive than necessary for your use
case (scoring over the union of docSets/clauses). If multi-term queries get
faster when you disable pf, you could try disabling main-query pf and
invoking implicit phrase search (pseudo-pf) using ReRankQParser.
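
A rough sketch of the pf/rerank idea, in case it helps to see the request
parameters spelled out. It only builds the parameter set; it does not talk to
Solr. The field name "keywords", the strict mm of 100%, and the
reRankDocs/reRankWeight values are placeholders I made up for illustration,
not values from your setup, so treat it as a starting point under those
assumptions rather than a tuned config:

```python
from urllib.parse import urlencode


def build_rerank_params(user_query, field="keywords", rerank_docs=200, rerank_weight=2):
    """Build edismax request params with pf omitted and a phrase rerank query.

    `field`, `rerank_docs` and `rerank_weight` are hypothetical defaults.
    """
    # Escape backslashes and double quotes so the phrase query stays well-formed.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,
        # A strict mm avoids scoring the union of all clauses (assumed value).
        "mm": "100%",
        # No "pf" here: phrase matching is applied only to the top N docs.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} reRankWeight={rerank_weight}}}",
        "rqq": f'{field}:"{escaped}"',
    }


if __name__ == "__main__":
    params = build_rerank_params("stock photography background")
    print(urlencode(params))
```

The point of the design is that the phrase clause is evaluated only against
the top reRankDocs hits of the main query, rather than against every document
that matches a very common term.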

If you're able to share your configs (built queries, indexing/fieldType
config (positions, payloads?), etc.), that might enable more specific
advice.

I'm assuming the query times posted are for queries that isolate the
performance of the main query only (i.e., no other components, such as
facets)?

Michael


On Mon, Apr 8, 2019 at 3:28 AM Ash Ramesh <ash...@canva.com> wrote:

> Hi Toke,
>
> Thanks for the prompt reply. I'm glad to hear that this is a common
> problem. With regard to stop words, I've been thinking about trying that
> out. In our business case, most of these terms are keywords related to
> stock photography, so it's natural for 'photography' or 'background'
> to appear commonly in a document's keyword list. It seems unlikely that we
> can use the common grams solution with our business case.
>
> Regards,
>
> Ash
>
> On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen <t...@kb.dk> wrote:
>
> > On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > > We have a corpus of 50+ million documents in our collection. I've
> > > noticed that some queries with specific keywords tend to be extremely
> > > slow.
> > > E.g. q='photography' or q='background'. After digging into the
> > > raw documents, I could see that these two terms appear in greater
> > > than 90% of all documents, which means Solr has to score each of
> > > those documents.
> >
> > That is known behaviour, which can be remedied somewhat. Stop words are
> > a common approach, but your samples do not seem to fit well with
> > that. Instead you can look at Common Grams, where your high-frequency
> > words get concatenated with surrounding words. This only works with
> > phrases though. There's a nice article at
> >
> > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> >
> > - Toke Eskildsen, Royal Danish Library
