Re: Performance problems with extremely common terms in collection (Solr 7.4)

Diego Ceccarelli Tue, 09 Apr 2019 07:46:18 -0700

Another way to make queries faster, if you can, is to identify a subset of
documents that are generally relevant for users (most recent, most browsed,
etc.), index those documents into a separate collection, and then query the
small collection first, falling back to the full one if the small one doesn't
return enough documents. (Caveat: the small collection could affect the
ranking, because all term statistics will be different.)
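
A rough sketch of that fallback logic, in case it helps. Everything named
here is an assumption for illustration: the collection names "recent_docs"
and "all_docs" are made up, and `search` stands in for whatever client call
you already use to query a collection and get back a list of documents.

```python
from typing import Callable, Dict, List

# Hypothetical signature: search(collection, query, rows) -> matching docs.
SearchFn = Callable[[str, str, int], List[Dict]]


def tiered_search(
    search: SearchFn,
    query: str,
    rows: int = 10,
    small: str = "recent_docs",  # assumed name of the small collection
    full: str = "all_docs",      # assumed name of the full collection
) -> List[Dict]:
    """Query the small collection first; fall back to the full one
    if the small one does not return enough documents."""
    hits = search(small, query, rows)
    if len(hits) >= rows:
        return hits
    # Not enough results: rerun the query against the full collection.
    # Scores from the two collections are not comparable (different term
    # statistics), so this replaces the results instead of merging them.
    return search(full, query, rows)
```

The fallback path pays for two queries, so this only wins if most requests
are satisfied by the small collection.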


Cheers,
Diego

On Mon, Apr 8, 2019, 15:59 Michael Gibney <mich...@michaelgibney.net> wrote:

> In addition to Toke's suggestions (and those in the linked article), some
> more ideas:
> If single-term, bare queries are slow, it might be productive to check
> config/performance of your queryResultCache (I realize this doesn't
> directly address the concern of slow queries, but might nonetheless be
> helpful in practice).
> If multi-term queries that include these terms are slow, maybe check your
> mm config to make sure it's not more inclusive than necessary for your use
> case (scoring over union of docSets/clauses). If multi-term queries get
> faster by disabling pf, you could try disabling main-query pf, and invoke
> implicit phrase search (pseudo-pf) using ReRankQParser?
> If you're able to share your configs (built queries, indexing/fieldType
> config (positions, payloads?), etc.), that might enable more specific
> advice.
> I'm assuming the query-times posted are for queries that isolate the
> performance of main query only (i.e., no other components, like facets,
> etc.)?
> Michael
>
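
For the pseudo-pf idea above, the request parameters might look roughly like
the sketch below: edismax with no pf, plus a rerank query that applies the
phrase match to the top N documents only. The field name "keywords" and the
reRankDocs/reRankWeight values are assumptions, not taken from anyone's
actual config, and `rerank_params` is just a hypothetical helper.

```python
from urllib.parse import urlencode


def rerank_params(user_query, field="keywords", rerank_docs=200, rerank_weight=3.0):
    """Build Solr request params: main query without pf, phrase match
    applied only to the top `rerank_docs` hits via the rerank parser."""
    # Drop embedded double quotes so the phrase query stays well-formed.
    phrase = " ".join(user_query.replace('"', " ").split())
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,  # assumed field name
        # No "pf" here: phrase scoring is moved into the rerank query.
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
        % (rerank_docs, rerank_weight),
        "rqq": '%s:"%s"' % (field, phrase),
    }


if __name__ == "__main__":
    # Query string to append to the collection's /select URL.
    print(urlencode(rerank_params("blue background photography")))
```

The main query still has to score every document matching the common term,
so this only helps if the phrase part was the expensive piece.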
> On Mon, Apr 8, 2019 at 3:28 AM Ash Ramesh <ash...@canva.com> wrote:
>
> > Hi Toke,
> >
> > Thanks for the prompt reply. I'm glad to hear that this is a common
> > problem. Regarding stop words, I've been thinking about trying that
> > out. In our business case, most of these terms are keywords related to
> > stock photography, so it's natural for 'photography' or 'background'
> > to appear commonly in a document's keyword list. It seems unlikely we can
> > use the common grams solution with our business case.
> >
> > Regards,
> >
> > Ash
> >
> > On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen <t...@kb.dk> wrote:
> >
> > > On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > > > We have a corpus of 50+ million documents in our collection. I've
> > > > noticed that some queries with specific keywords tend to be extremely
> > > > slow.
> > > > E.g. q='photography' or q='background'. After digging into the
> > > > raw documents, I could see that these two terms appear in more
> > > > than 90% of all documents, which means Solr has to score each of
> > > > those documents.
> > >
> > > That is known behaviour, which can be remedied somewhat. Stop words are
> > > a common approach, but your samples do not seem to fit well with
> > > that. Instead you can look at Common Grams, where your high-frequency
> > > words get concatenated with surrounding words. This only works with
> > > phrases though. There's a nice article at
> > >
> > > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> > >
> > > - Toke Eskildsen, Royal Danish Library
> > >
> > >
> > >
> >
>
