You might take a look at CommonGramsFilter (https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-CommonGramsFilter), especially if you're either not using pf, or if ps=0.

An absolute setting of mm=2 strikes me as unusual (though quite possibly appropriate for your use case). mm=2 would force scoring of all docs for which >=2 terms match, which for any query containing the words "a" and "the" for example, could easily be the majority of the index.

Another thought, re: single-core: sharding would allow you to effectively parallelize query processing to a certain extent, which I expect might speed things up for your use case.
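
To make the CommonGramsFilter suggestion a bit more concrete, here is a rough Python sketch of the idea as I understand it: every token is kept, and an extra bigram token is emitted wherever either neighbor is a common word. This is not Lucene's code. The function name, the "_" separator, and the word list are assumptions for illustration only.

```python
from typing import List, Set


def common_grams(tokens: List[str], common: Set[str], sep: str = "_") -> List[str]:
    """Illustrative only: keep every unigram and add a bigram for each
    adjacent pair in which at least one token is a common word."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append(tok + sep + nxt)
    return out


# Hypothetical common-word list; a real one would come from index statistics.
COMMON = {"a", "in", "of", "the"}

if __name__ == "__main__":
    print(common_grams("year of the cat".split(), COMMON))
```

The point is that a bigram such as "of_the" or "the_cat" is far rarer in the index than "of" or "the" alone, so a query that can be matched on those bigrams has much shorter postings lists to walk.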
On Fri, Mar 29, 2019 at 1:13 PM Erie Data Systems <eriedata...@gmail.com> wrote:

> Michael,
>
> select/?&rows=12&qf=title+description&q=once+upon+a+time+in+the+west&fl=*&hl=true&hl.field=desc&hl.fragsize=250&hl.maxAnalyzedChars=200000&ps=1&qs=1&df=title&mm=2&defType=edismax&debugQuery=off&indent=on&wt=json&debug=true
>
> "rawquerystring":"once upon a time in the west",
> "querystring":"once upon a time in the west",
> "parsedquery":"+(DisjunctionMaxQuery((description:once | title:once))
> DisjunctionMaxQuery((description:upon | title:upon))
> DisjunctionMaxQuery((description:a | title:a))
> DisjunctionMaxQuery((description:time | title:time))
> DisjunctionMaxQuery((description:in | title:in))
> DisjunctionMaxQuery((description:the | title:the))
> DisjunctionMaxQuery((description:west | title:west)))~2",
> "parsedquery_toString":"+(((description:once | title:once)
> (description:upon | title:upon) (description:a | title:a)
> (description:time | title:time) (description:in | title:in)
> (description:the | title:the) (description:west | title:west))~2)"
>
> Removing pf cuts time almost half but its still 5+sec
>
> Thank you for your help, more than happy to include more output..
> -Craig
>
> On Fri, Mar 29, 2019 at 12:24 PM Michael Gibney <mich...@michaelgibney.net>
> wrote:
>
> > Can you post the query that's actually built for some of these inputs
> > ("parsedquery" or "parsedquery_toString" output included for requests with
> > "debug=query" parameter)? What is performance like if you turn off pf
> > (i.e., no implicit phrase searching)?
> > Michael
> >
> > On Fri, Mar 29, 2019 at 11:53 AM Erie Data Systems <eriedata...@gmail.com>
> > wrote:
> >
> > > Using Solr 8.0.0, single instance, single core, 50m records (38gb index)
> > > on one SSD, 96gb ram, 16 cores CPU
> > >
> > > Most queries run very very fast <1 sec however we have noticed queries
> > > containing "common" words are quite slow sometimes 10+sec , currently using
> > > edismax with 2 text_general fields,. qf, and pf, qs=0,ps=0
> > >
> > > I came across these which describe the issue.
> > >
> > > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> > >
> > > https://lucene.apache.org/core/5_5_3/queries/org/apache/lucene/queries/CommonTermsQuery.html
> > >
> > > Test queries with issues :
> > > 1. things to do in seattle with eric
> > > 2. year of the cat
> > > 3. time of my life
> > > 4. when will i be loved
> > > 5. once upon a time in the west
> > >
> > > Stopwords are not an option as in the case of #2, if of and the are removed
> > > it essentially destroys relevance. Is there a common suggested solution to
> > > what would seem to be a common issue besides adding stopwords.
> > >
> > > Thank you.
> > > Craig Stadler