CommonTerms & slow queries
Using Solr 8.0.0, single instance, single core, 50m records (38 GB index) on one SSD, 96 GB RAM, 16-core CPU.

Most queries run very fast (<1 sec); however, we have noticed that queries containing "common" words are quite slow, sometimes 10+ sec. We are currently using edismax with 2 text_general fields in qf and pf, with qs=0 and ps=0.

I came across these, which describe the issue:

https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
https://lucene.apache.org/core/5_5_3/queries/org/apache/lucene/queries/CommonTermsQuery.html

Test queries with issues:

1. things to do in seattle with eric
2. year of the cat
3. time of my life
4. when will i be loved
5. once upon a time in the west

Stopwords are not an option: in the case of #2, removing "of" and "the" essentially destroys relevance. Is there a commonly suggested solution to what would seem to be a common issue, besides adding stopwords?

Thank you.
Craig Stadler
Re: CommonTerms & slow queries
Michael,

select/?&rows=12&qf=title+description&q=once+upon+a+time+in+the+west&fl=*&hl=true&hl.field=desc&hl.fragsize=250&hl.maxAnalyzedChars=20&ps=1&qs=1&df=title&mm=2&defType=edismax&debugQuery=off&indent=on&wt=json&debug=true

"rawquerystring":"once upon a time in the west",
"querystring":"once upon a time in the west",
"parsedquery":"+(DisjunctionMaxQuery((description:once | title:once)) DisjunctionMaxQuery((description:upon | title:upon)) DisjunctionMaxQuery((description:a | title:a)) DisjunctionMaxQuery((description:time | title:time)) DisjunctionMaxQuery((description:in | title:in)) DisjunctionMaxQuery((description:the | title:the)) DisjunctionMaxQuery((description:west | title:west)))~2",
"parsedquery_toString":"+(((description:once | title:once) (description:upon | title:upon) (description:a | title:a) (description:time | title:time) (description:in | title:in) (description:the | title:the) (description:west | title:west))~2)"

Removing pf cuts the time almost in half, but it's still 5+ sec.

Thank you for your help; more than happy to include more output.

-Craig

On Fri, Mar 29, 2019 at 12:24 PM Michael Gibney wrote:
> Can you post the query that's actually built for some of these inputs
> ("parsedquery" or "parsedquery_toString" output included for requests with
> "debug=query" parameter)? What is performance like if you turn off pf
> (i.e., no implicit phrase searching)?
> Michael
>
> On Fri, Mar 29, 2019 at 11:53 AM Erie Data Systems wrote:
> > Using Solr 8.0.0, single instance, single core, 50m records (38gb index)
> > on one SSD, 96gb ram, 16 cores CPU
> >
> > Most queries run very very fast <1 sec however we have noticed queries
> > containing "common" words are quite slow sometimes 10+sec , currently using
> > edismax with 2 text_general fields,. qf, and pf, qs=0,ps=0
> >
> > I came across these which describe the issue.
> >
> > https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
> >
> > https://lucene.apache.org/core/5_5_3/queries/org/apache/lucene/queries/CommonTermsQuery.html
> >
> > Test queries with issues :
> > 1. things to do in seattle with eric
> > 2. year of the cat
> > 3. time of my life
> > 4. when will i be loved
> > 5. once upon a time in the west
> >
> > Stopwords are not an option as in the case of #2, if of and the are removed
> > it essentially destroys relevance. Is there a common suggested solution to
> > what would seem to be a common issue besides adding stopwords.
> >
> > Thank you.
> > Craig Stadler
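For readers following the debug output: the trailing ~2 in the parsed query corresponds to mm=2, meaning a document qualifies when at least two of the seven per-term clauses match. The toy sketch below (plain Python over an in-memory list, not Solr; the sample documents are made up for illustration) shows why that lets documents containing only common words become candidates, which is presumably part of why such queries touch so many postings.

```python
# Toy illustration only: a tiny in-memory "index", not Solr or Lucene.
QUERY_TERMS = "once upon a time in the west".split()


def matches_mm(doc_text, terms, mm):
    """Return True if the document contains at least `mm` distinct query terms."""
    doc_tokens = set(doc_text.lower().split())
    matched = sum(1 for term in set(terms) if term in doc_tokens)
    return matched >= mm


# Hypothetical sample documents.
docs = [
    "once upon a time in the west",  # the intended hit
    "a walk in the park",            # only common words match
    "seattle weather report",        # nothing matches
]

# With mm=2, any two matching terms are enough, including "a", "in", "the".
candidates = [d for d in docs if matches_mm(d, QUERY_TERMS, mm=2)]
```

Here the second document becomes a candidate purely through "a", "in" and "the".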
Re: CommonTerms & slow queries
All great advice, thanks Michael. Have an excellent weekend! Testing the common grams.
-Craig
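As a rough picture of what the common grams approach does at index time, here is a minimal Python sketch. It is not Solr's CommonGramsFilter itself; the function name and the common-word set are assumptions for illustration. The idea is that each common word is also indexed as a bigram with its neighbour, so a phrase like "year of the cat" can be matched through rarer terms such as the_cat instead of the huge posting lists for "of" and "the".

```python
def common_grams(tokens, common_words):
    """Emit every token, plus a 'left_right' bigram whenever either
    token of an adjacent pair is a common word (hypothetical helper)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


# Assumed common-word list; a real one would come from index statistics.
COMMON = {"of", "the", "a", "in"}

grams = common_grams("year of the cat".split(), COMMON)
```

For "year of the cat" this yields the unigrams plus year_of, of_the and the_cat.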
Interesting Grouping/Facet issue
Solr 8.0.0. I have a HASHTAG string field that I am trying to facet on to get the most popular hashtags (top 100) across many sources. (The SITE field is a string.)

/select?facet.field=hashtag&facet=on&rows=0&q=%2Bhashtag:*%20%2BDT:[" . date('Y-m-d') . "T00:00:00Z+TO+" . date('Y-m-d') . "T23:59:59Z]&facet.limit=100&facet.mincount=1&facet.method=fc

It works, but not the way I feel it should. For example, if one site has 1000 rows on today's date and they all have a HASHTAG in common, that HASHTAG automatically rises to the top simply because one SITE has 1000 pages with the same HASHTAG.

Is there a way to get a more even distribution of top HASHTAGs for a given date, i.e. faceting by a grouping, distinct, or filter of some sort? I'm more interested in knowing whether a HASHTAG is used frequently among SITEs, not just on one.

Hope this makes sense. Any recommendations welcome.

Thank you in advance,
-Craig
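One possible direction, sketched here rather than offered as a confirmed fix, is Solr's JSON Facet API: a terms facet on the hashtag field with a nested unique() aggregation on SITE, sorted by that aggregation, so hashtags are ranked by how many distinct sites use them instead of by raw row count. The field names hashtag, SITE and DT come from the request above; the function name and the facet labels "hashtags" and "sites" are made up for illustration, and the body is only built here, not sent.

```python
import json
from datetime import date


def build_hashtag_facet_request(day):
    """Build a JSON request body ranking hashtags by distinct SITE count.

    Assumes the JSON Request/Facet API fits this schema; untested against
    the poster's index.
    """
    day_str = day.isoformat()
    return {
        "query": f"+hashtag:* +DT:[{day_str}T00:00:00Z TO {day_str}T23:59:59Z]",
        "limit": 0,  # no documents needed, only facet counts
        "facet": {
            "hashtags": {
                "type": "terms",
                "field": "hashtag",
                "limit": 100,
                "mincount": 1,
                "sort": "sites desc",                # order by the nested stat
                "facet": {"sites": "unique(SITE)"},  # distinct sites per hashtag
            }
        },
    }


body = json.dumps(build_hashtag_facet_request(date(2019, 4, 1)), indent=2)
```

Each returned bucket would then carry both the row count and a sites value, so a hashtag used 1000 times by one site would rank below one used by many sites.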
Issue with max documents on single instance
Solr 8.0.0 (single server, single instance, single core), CentOS 6 x86_64.

Error: number of documents in the index cannot exceed 2147483519

I've read about the max number of documents, which means I need to go with SolrCloud. My question is this: can I implement a "clustered" environment on a single server so I can take advantage of the segmented data? I have a TON (96 GB) of RAM and plenty of SSD disk space available.

Thanks,
-Craig
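For context on the number in that error: it equals the largest signed 32-bit integer minus 128, which is the per-index document cap in Lucene, so it applies to each core (or shard) separately. The small sketch below just does that arithmetic and estimates the minimum number of shards for a given document count; the helper name and the 3 billion figure are hypothetical, and real sizing would leave headroom.

```python
INT_MAX = 2**31 - 1              # 2147483647, largest signed 32-bit integer
LUCENE_MAX_DOCS = INT_MAX - 128  # 2147483519, the value in the error message


def min_shards(total_docs, per_core_limit=LUCENE_MAX_DOCS):
    """Smallest shard count that keeps every shard at or under the limit,
    assuming documents are spread evenly (hypothetical helper)."""
    return -(-total_docs // per_core_limit)  # integer ceiling division


shards_for_3_billion = min_shards(3_000_000_000)
```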
Odd error with Solr 8 log / ingestion
Hello everyone,

I recently set up Solr 8 in SolrCloud mode. Previously I was using standalone mode and was able to easily push 10,000 records per HTTP call with autocommit. Ingestion occurs when server A pushes a payload over HTTPS to server B (SolrCloud) on the LAN.

However, since converting to SolrCloud (1 node, 3 shards, 1 replica), I am seeing the following error:

ConcurrentUpdateHttp2SolrClient Error consuming and closing http response stream.

I'm wondering what the possible causes could be; I'm not seeing much documentation online specific to Solr.

Thanks in advance for any assistance,
Craig
Moving to solrcloud from single instance
I am starting the planning stages of moving from a single instance of Solr 8 to a SolrCloud implementation.

Currently I have a 148 GB index on a single dedicated server with 96 GB RAM, 16 cores at 2.4 GHz each, and SSD disk. The search is fast, but the index is larger than physical memory, which to my understanding is not a good thing.

I have a lot of experience with single instance but none with SolrCloud. I have 3 machines (other than my main one) with the exact same hardware, so essentially 96 GB * 3, which should be plenty.

My issue is that I'm not sure where to go to learn how to set this up: how many shards, how many replicas, etc. I would rather hire somebody, or find something (a detailed video or document), to guide me through the process and the decisions along the way. For example, I think a shard is a piece of the index, but I don't even know how to decide how many replicas to use or what they are.

Thanks everyone.
-Craig
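As a back-of-the-envelope aid for the sizing question: in SolrCloud terms a shard is a slice of the index and a replica is a copy of a shard, so the index data held per machine is roughly index size times copies divided by machines. The sketch below does only that arithmetic with the figures from this message (148 GB index, 3 machines, 96 GB RAM each); the function name is hypothetical, and it assumes data is spread evenly and ignores JVM heap and OS overhead.

```python
def index_gb_per_node(index_gb, nodes, copies_per_shard=1):
    """Approximate index data per node, assuming an even spread
    (hypothetical helper; ignores heap and OS overhead)."""
    return index_gb * copies_per_shard / nodes


RAM_GB = 96

one_copy = index_gb_per_node(148, nodes=3)                        # about 49.3 GB
two_copies = index_gb_per_node(148, nodes=3, copies_per_shard=2)  # about 98.7 GB

fits_one_copy = one_copy < RAM_GB
fits_two_copies = two_copies < RAM_GB
```

Under these assumptions a single copy of each shard fits comfortably within each machine's RAM, while two copies of every shard would slightly exceed it.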