Re: CommonTerms & slow queries

Michael Gibney Fri, 29 Mar 2019 11:26:56 -0700

You might take a look at CommonGramsFilter (
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-CommonGramsFilter),
especially if you're either not using pf, or if ps=0. An absolute setting
of mm=2 strikes me as unusual (though quite possibly appropriate for your
use case). mm=2 would force scoring of all docs for which >=2 terms match,
which for any query containing the words "a" and "the" for example, could
easily be the majority of the index.
Another thought, re: single-core: sharding would allow you to effectively
parallelize query processing to a certain extent, which I expect might
speed things up for your use case.


On Fri, Mar 29, 2019 at 1:13 PM Erie Data Systems <eriedata...@gmail.com>
wrote:

> Michael,
>
>
> select/?&rows=12&qf=title+description&q=once+upon+a+time+in+the+west&fl=*&hl=true&hl.field=desc&hl.fragsize=250&hl.maxAnalyzedChars=200000&ps=1&qs=1&df=title&mm=2&defType=edismax&debugQuery=off&indent=on&wt=json&debug=true
>     "rawquerystring":"once upon a time in the west",
>     "querystring":"once upon a time in the west",
>     "parsedquery":"+(DisjunctionMaxQuery((description:once | title:once))
> DisjunctionMaxQuery((description:upon | title:upon))
> DisjunctionMaxQuery((description:a | title:a))
> DisjunctionMaxQuery((description:time | title:time))
> DisjunctionMaxQuery((description:in | title:in))
> DisjunctionMaxQuery((description:the | title:the))
> DisjunctionMaxQuery((description:west | title:west)))~2",
>     "parsedquery_toString":"+(((description:once | title:once)
> (description:upon | title:upon) (description:a | title:a) (description:time
> | title:time) (description:in | title:in) (description:the | title:the)
> (description:west | title:west))~2)"
>
> Removing pf cuts time almost half but its still 5+sec
>
> Thank you for your help, more than happy to include more output..
> -Craig
>
>
> On Fri, Mar 29, 2019 at 12:24 PM Michael Gibney <mich...@michaelgibney.net
> >
> wrote:
>
> > Can you post the query that's actually built for some of these inputs
> > ("parsedquery" or "parsedquery_toString" output included for requests
> with
> > "debug=query" parameter)? What is performance like if you turn off pf
> > (i.e., no implicit phrase searching)?
> > Michael
> >
> > On Fri, Mar 29, 2019 at 11:53 AM Erie Data Systems <
> eriedata...@gmail.com>
> > wrote:
> >
> > > Using Solr 8.0.0, single instance, single core, 50m records (38gb
> index)
> > > on one SSD, 96gb ram, 16 cores CPU
> > >
> > > Most queries run very very fast <1 sec however we have noticed queries
> > > containing "common" words are quite slow sometimes 10+sec , currently
> > using
> > > edismax with 2 text_general fields,. qf, and pf, qs=0,ps=0
> > >
> > > I came across these which describe the issue.
> > >
> > >
> >
> https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> > >
> > >
> > >
> >
> https://lucene.apache.org/core/5_5_3/queries/org/apache/lucene/queries/CommonTermsQuery.html
> > >
> > > Test queries with issues :
> > > 1. things to do in seattle with eric
> > > 2. year of the cat
> > > 3. time of my life
> > > 4. when will i be loved
> > > 5. once upon a time in the west
> > >
> > > Stopwords are not an option as in the case of #2, if of and the are
> > removed
> > > it essentially destroys relevance.  Is there a common suggested
> solution
> > to
> > > what would seem to be a common issue besides adding stopwords.
> > >
> > > Thank you.
> > > Craig Stadler
> > >
> >
>

Re: CommonTerms & slow queries

Reply via email to