protected phrases - possible?
Hi, The way our collection is set up, searches for "breast cancer" are returning results for ovarian cancer, or anything that contains either "breast" or "cancer". The reason is that we are searching across multiple fields. Even though I have set an "mm" value so that if there are fewer than 3 terms, ALL terms must match, Solr considers it all matched even though "breast" was in the title and "cancer" is in the description. Is there a way to protect certain phrases so that they will not be tokenized? I tried using CommonGramsFilterFactory, but having "breast cancer" in the word list did not seem to do anything. I'm guessing it's because the field is tokenized first, so nothing would match that phrase. If I put "breast" and "cancer" as separate entries in the word list, I end up with too many unnecessary shingles, and "breast" and "cancer" are still two of the final terms. I have a feeling CommonGramsFilterFactory is not the right way to handle this. What are other options? Is it better to put all fields in one field, apply mm, and add a proximity boost? Thanks! Jing
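For reference, here is a minimal sketch of the "mm plus proximity boost" option mentioned at the end of the question, written as a small Python helper that only builds request parameters. It assumes the edismax query parser and its `qf`, `mm`, `pf` and `ps` parameters; the field names, the boost of 10 and the `mm` value passed in are placeholders, not settings from the actual collection. Note that `pf` only boosts documents where the terms occur together in one field, it does not exclude the others. Requiring the phrase would mean sending it as a quoted phrase query, which the second helper illustrates.

```python
def phrase_aware_params(user_query, fields, mm, phrase_boost=10, slop=0):
    """Build edismax parameters that boost documents containing the
    query terms as a phrase within a single field.

    fields, mm, phrase_boost and slop are illustrative placeholders.
    """
    qf = " ".join(fields)                                  # fields to search
    pf = " ".join(f"{f}^{phrase_boost}" for f in fields)   # phrase-boost fields
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": qf,
        "mm": mm,
        "pf": pf,
        "ps": str(slop),   # phrase slop applied to the pf queries
    }


def as_exact_phrase(user_query):
    """Wrap the query in double quotes so it is parsed as a phrase query."""
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


params = phrase_aware_params("breast cancer", ["title", "description"], mm="100%")
phrase_q = as_exact_phrase("breast cancer")
```

With a quoted phrase, each field is expected to be checked for the whole phrase on its own, so "breast" in the title plus "cancer" in the description should no longer count as a match; that is worth verifying against the real schema with debugQuery=true.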
Conditional Filter Queries
Hi, I want to filter my search results by different date fields based on content type. In other words: if contentType is A, filter out results that are older than 1 year; if contentType is B, filter out results that are older than 2 years; otherwise, date does not matter. Is that possible with fq parameters? Would it be something like fq=(contentType:"A" AND startDate:[NOW-1YEAR TO NOW]) OR (contentType:"B" AND startDate:[NOW-2YEAR TO NOW]) OR !contentType: ("A" or "B") Is there a better way to do this? Thanks, Jing
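A single fq along these lines should work, with two adjustments to the string as written above: the boolean operators need to be uppercase (a lowercase "or" is treated as a search term), and a purely negative clause nested inside a boolean expression generally needs something to subtract from, usually `*:*`. The following is a minimal sketch that assembles such a filter in Python; the helper name and its arguments are hypothetical, and the field names are taken from the question.

```python
def build_conditional_fq(rules, type_field="contentType", date_field="startDate"):
    """Build one fq string from a {content_type: date_math_window} mapping.

    Each listed content type is restricted to its own date window;
    documents of any other content type pass through unfiltered.
    """
    clauses = []
    for content_type, window in rules.items():
        clauses.append(
            f'({type_field}:"{content_type}" AND '
            f'{date_field}:[NOW-{window} TO NOW])'
        )
    # Catch-all: everything that is not one of the listed content types.
    # The leading *:* gives the negative clause a set to subtract from.
    listed = " OR ".join(f'"{t}"' for t in rules)
    clauses.append(f"(*:* -{type_field}:({listed}))")
    return " OR ".join(clauses)


fq = build_conditional_fq({"A": "1YEAR", "B": "2YEAR"})
print(fq)
```

This prints:

(contentType:"A" AND startDate:[NOW-1YEAR TO NOW]) OR (contentType:"B" AND startDate:[NOW-2YEAR TO NOW]) OR (*:* -contentType:("A" OR "B"))

Rounding the date math (for example NOW/DAY-1YEAR) may make the filter more cache-friendly, since plain NOW changes with every request.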
Inconsistent relevancy score between browser refreshes
I am seeing different relevancy scores for the same documents, between browser refreshes. Any ideas why? The query is the same, index is the same - why would score change? Example: First request returns: Stroke Anticoagulation and Prophylaxis 3.463463 Hemorrhagic Stroke 3.463463 Vertebrobasilar Stroke 3.460521 Second request: Vertebrobasilar Stroke 3.460521 Hemorrhagic Stroke 3.4484053 Stroke Anticoagulation and Prophylaxis 3.4484053 Third request: Stroke Anticoagulation and Prophylaxis 3.463463 Hemorrhagic Stroke 3.463463 Vertebrobasilar Stroke 3.402718 Jing
RE: Inconsistent relevancy score between browser refreshes
1) It is a SolrCloud setup on 4 servers, 4 shards, replication factor of 2.
2) There is no indexing going on.
3) No, I did not optimize.
4) Did not optimize between refreshes.

Thanks,
Jing

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 10, 2014 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

More info please.

1> Are there replicas involved?
2> Is there any indexing going on?
3> If more than one node, did you optimize?
4> Did you optimize between refreshes?

Best,
Erick

On Wed, Sep 10, 2014 at 12:28 PM, Tao, Jing wrote:
> I am seeing different relevancy scores for the same documents, between
> browser refreshes. Any ideas why? The query is the same, index is the same
> - why would score change?
>
> Example:
> First request returns:
> Stroke Anticoagulation and Prophylaxis 3.463463
> Hemorrhagic Stroke 3.463463
> Vertebrobasilar Stroke 3.460521
>
> Second request:
> Vertebrobasilar Stroke 3.460521
> Hemorrhagic Stroke 3.4484053
> Stroke Anticoagulation and Prophylaxis 3.4484053
>
> Third request:
> Stroke Anticoagulation and Prophylaxis 3.463463
> Hemorrhagic Stroke 3.463463
> Vertebrobasilar Stroke 3.402718
>
> Jing
RE: Inconsistent relevancy score between browser refreshes
Thanks Erick! After optimization, the scores don't change anymore. Now the only time the order may change between browser refreshes is when the score is exactly the same between documents. But that's understandable.

First Request:
refcenter_323409 Vertebrobasilar Stroke 3.4728973
refcenter_1916662 Hemorrhagic Stroke 3.4613633
refcenter_1160021 Stroke Anticoagulation and Prophylaxis 3.4613633

Second Request:
refcenter_323409 Vertebrobasilar Stroke 3.4728973
refcenter_1160021 Stroke Anticoagulation and Prophylaxis 3.4613633
refcenter_1916662 Hemorrhagic Stroke 3.4613633

Jing

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, September 11, 2014 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

If you look at your shards individually, I'll bet that you'll find a slight difference in the number of deleted docs. You can add &distrib=false to the query and it will be entirely served on a single node.

Indexes will be slightly different on various shards due to differing merges, and my theory is that you're seeing a very slight difference in score due to differing tf/idf statistics. Unfortunately, the stats include deleted docs, which are purged on segment merge. Since the merges may be at different times on the various shards, you may have very slightly different scores.

A simple test would be to optimize. That should purge all the deleted docs' data.

Best,
Erick

On Wed, Sep 10, 2014 at 1:19 PM, Tao, Jing wrote:
> 1) It is a SolrCloud setup on 4 servers, 4 shards, replication factor of 2.
> 2) There is no indexing going on.
> 3) No, I did not optimize.
> 4) Did not optimize between refreshes.
>
> Thanks,
> Jing
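To follow up on the &distrib=false suggestion, a minimal sketch of how the same query could be sent to each core separately so the scores can be compared side by side. It only builds the URLs; the host and core names in the usage line are made up for illustration, and the `id asc` tie-breaker assumes the unique key field is called `id`. Adding such a tie-breaker to the sort is also one way to keep the order stable when two documents have exactly the same score.

```python
from urllib.parse import urlencode


def per_core_query_urls(core_urls, query, fields="id,title,score",
                        sort="score desc, id asc"):
    """Return one /select URL per core, each answered by that core alone.

    distrib=false keeps the request from fanning out to other shards,
    so every URL reflects a single core's own index statistics.
    """
    params = {
        "q": query,
        "fl": fields,
        "sort": sort,          # secondary sort makes tied scores deterministic
        "distrib": "false",
    }
    query_string = urlencode(params)
    return [f"{base.rstrip('/')}/select?{query_string}" for base in core_urls]


# Hypothetical core URLs, two replicas of the same shard.
urls = per_core_query_urls(
    ["http://host1:8983/solr/collection1_shard1_replica1",
     "http://host2:8983/solr/collection1_shard1_replica2/"],
    "stroke",
)
for url in urls:
    print(url)
```

If two replicas of the same shard return different scores for the same document, differing deleted-document counts are the likely cause, as described above.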
spellchecker returns correctlySpelled=true if one term in phrase is correctly spelled
Hi, It seems that when I do a phrase search, Solr's spellchecker returns correctlySpelled=true if at least one term in the phrase is correctly spelled. For example: If I search for "soriasis treatment", Solr returns over 8000 search results for "treatment", correctlySpelled: true, and a spelling suggestion of "psoriasis" for "soriasis". If I search for "soriasis treatmnt", Solr returns 0 results, correctlySpelled: false, and spelling suggestions for both "soriasis" and "treatmnt". Does this mean that if I want to display a "Did You Mean" for "soriasis treatment", I need to 1) check whether the spellchecker returned any suggestions for any of the terms, and 2) compare the number of hits for each collation with the numFound for the original query? Another spellchecker question I have is how I can configure Solr to suggest "heart attack" if someone searches for "heart attach". Technically, there are no misspellings, but "heart attach" as a phrase does not make sense. Thanks, Jing
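The two-step check described above could be sketched on the client side roughly as follows. This is only an illustration of that logic, not Solr's own behaviour: the function name and the simplified inputs (a dict of per-term suggestions and a list of (collation, hits) pairs) are hypothetical and would have to be filled from the actual spellcheck response, and the rule "offer a collation only if it finds more hits than the original query" is an assumption.

```python
def did_you_mean(num_found, suggestions, collations):
    """Pick a collation to show as "Did You Mean", or None.

    num_found   -- numFound of the original query
    suggestions -- {misspelled_term: [suggested terms]} from the spellchecker
    collations  -- [(collation_query, hits)] for the rewritten queries
    """
    # Step 1: ignore correctlySpelled and look at the suggestions themselves.
    if not suggestions:
        return None
    # Step 2: keep only collations that would return more than the original.
    better = [(query, hits) for query, hits in collations if hits > num_found]
    if not better:
        return None
    # Offer the collation with the most hits.
    return max(better, key=lambda pair: pair[1])[0]


# Illustrative numbers only.
choice = did_you_mean(
    num_found=0,
    suggestions={"soriasis": ["psoriasis"], "treatmnt": ["treatment"]},
    collations=[("psoriasis treatment", 120), ("psoriasis treatments", 15)],
)
print(choice)
```

With these made-up numbers the sketch picks "psoriasis treatment". Whether the hit comparison is the right rule depends on how mm affects the original query's numFound, so it should be checked against real responses.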