protected phrases - possible?

2015-03-30 Thread Tao, Jing
Hi,

The way our collection is set up, searches for "breast cancer" return 
results for ovarian cancer, or anything that contains either "breast" or 
"cancer".  The reason is that we are searching across multiple fields.  Even 
though I have set an "mm" value so that with fewer than 3 terms ALL terms must 
match, SOLR considers everything matched even when "breast" is in the title and 
"cancer" is in the description.

Is there a way to protect certain phrases so that they will not be tokenized?  
I tried using CommonGramsFilterFactory, but having "breast cancer" in the word 
list did not seem to do anything.  I'm guessing it's because the field is 
tokenized first, so nothing would match that phrase.  If I put "breast" and 
"cancer" as separate entries in the word list, I end up with too many 
unnecessary shingles, and "breast" and "cancer" are still two of the final 
terms.

I have a feeling CommonGramsFilterFactory is not the right way to handle this.  
What are other options?  Is it better to put all fields in one field, apply mm, 
and proximity boost?
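
A minimal sketch of that last option, assuming an edismax request against a 
single catch-all field.  The field name `text_all`, the boost of 10, the slop 
of 2 and the 75% rule for longer queries are illustrative assumptions, not 
values from the actual schema; `mm`, `pf` and `ps` are the standard edismax 
parameters for minimum match, phrase boost and phrase slop.

```python
from urllib.parse import urlencode


def build_phrase_boost_params(user_query, field="text_all"):
    """Build edismax request parameters for one combined field.

    `field` is a hypothetical catch-all field (for example, filled by
    copyField); the boost and slop values are placeholders to tune.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,          # search only the combined field
        "mm": "2<75%",        # 1-2 terms: all must match; more: 75% (assumed)
        "pf": f"{field}^10",  # boost docs where the terms occur as a phrase
        "ps": "2",            # allow a small gap between the phrase terms
    }


params = build_phrase_boost_params("breast cancer")
print(urlencode(params))
```

Because every term now has to match inside the same field, a document with 
"breast" only in the title and "cancer" only in the description would still 
match, but only if both words end up in the combined field; the `pf` boost 
then ranks documents containing the actual phrase higher.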

Thanks!
Jing


Conditional Filter Queries

2015-04-16 Thread Tao, Jing
Hi,

I want to filter my search results by different date fields based on content 
type.
In other words: if contentType is A, filter out results that are older than 1 
year; if contentType is B, filter out results that are older than 2 years; 
otherwise, date does not matter.

Is that possible with fq parameters?
Would it be something like  fq=(contentType:"A" AND startDate:[NOW-1YEAR TO 
NOW]) OR (contentType:"B" AND startDate:[NOW-2YEAR TO NOW]) OR !contentType: 
("A" or "B")

Is there a better way to do this?
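
The filter proposed above can be sketched as a small helper that assembles the 
fq string.  Two details differ from the draft query and are assumptions worth 
verifying against the running Solr version: boolean operators are written in 
uppercase (`OR`), and the purely negative clause is anchored with `*:*`, since 
a negative clause nested inside a boolean expression generally matches nothing 
without a positive clause to subtract from.

```python
def build_conditional_date_fq(rules, type_field="contentType",
                              date_field="startDate"):
    """Build one fq string from {content_type: date_math_window}.

    Field names default to the ones used in the question; the windows are
    Solr date-math units such as "1YEAR".
    """
    clauses = []
    for content_type, window in rules.items():
        # Keep documents of this type only if they fall inside the window.
        clauses.append(
            f'({type_field}:"{content_type}" AND '
            f"{date_field}:[NOW-{window} TO NOW])"
        )
    # Every other content type passes regardless of date.
    listed = " OR ".join(f'"{t}"' for t in rules)
    clauses.append(f"(*:* -{type_field}:({listed}))")
    return " OR ".join(clauses)


fq = build_conditional_date_fq({"A": "1YEAR", "B": "2YEAR"})
print(fq)
```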

Thanks,
Jing


Inconsistent relevancy score between browser refreshes

2014-09-10 Thread Tao, Jing
I am seeing different relevancy scores for the same documents, between browser 
refreshes.  Any ideas why?  The query is the same, index is the same - why 
would score change?

Example:
First request returns:

Stroke Anticoagulation and Prophylaxis
3.463463


Hemorrhagic Stroke
3.463463


Vertebrobasilar Stroke
3.460521


Second request:

Vertebrobasilar Stroke
3.460521


Hemorrhagic Stroke
3.4484053


Stroke Anticoagulation and Prophylaxis
3.4484053


Third request:

Stroke Anticoagulation and Prophylaxis
3.463463


Hemorrhagic Stroke
3.463463


Vertebrobasilar Stroke
3.402718



Jing


RE: Inconsistent relevancy score between browser refreshes

2014-09-10 Thread Tao, Jing
1) It is a SolrCloud setup on 4 servers, 4 shards, replication factor of 2.
2) There is no indexing going on.
3) No, I did not optimize.
4) Did not optimize between refreshes.

Thanks,
Jing

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, September 10, 2014 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

More info please.

1> Are there replicas involved?
2> Is there any indexing going on?
3> If more than one node, did you optimize?
4> Did you optimize between refreshes?

Best,
Erick



RE: Inconsistent relevancy score between browser refreshes

2014-09-15 Thread Tao, Jing
Thanks Erick!  After optimization, the scores don't change anymore.  Now the 
only time order may change between browser refreshes is if the score is exactly 
the same between documents.  But that's understandable.  

First Request:

refcenter_323409
Vertebrobasilar Stroke
3.4728973


refcenter_1916662
Hemorrhagic Stroke
3.4613633


refcenter_1160021
Stroke Anticoagulation and Prophylaxis
3.4613633


Second Request:

refcenter_323409
Vertebrobasilar Stroke
3.4728973


refcenter_1160021
Stroke Anticoagulation and Prophylaxis
3.4613633


refcenter_1916662
Hemorrhagic Stroke
3.4613633
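
If a stable order is wanted even for tied scores, one option is a secondary 
sort on the unique key.  The sketch below shows the idea locally on the three 
documents above; in a Solr request the equivalent would be a sort parameter 
such as `sort=score desc,id asc`, assuming the unique key field is named `id` 
(the field name is not shown in this thread).

```python
def sort_with_tiebreak(docs):
    """Order by score descending, then by id ascending to break ties."""
    return sorted(docs, key=lambda d: (-d["score"], d["id"]))


docs = [
    {"id": "refcenter_1916662", "title": "Hemorrhagic Stroke",
     "score": 3.4613633},
    {"id": "refcenter_323409", "title": "Vertebrobasilar Stroke",
     "score": 3.4728973},
    {"id": "refcenter_1160021",
     "title": "Stroke Anticoagulation and Prophylaxis", "score": 3.4613633},
]

for doc in sort_with_tiebreak(docs):
    print(doc["id"], doc["score"])
```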


Jing

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, September 11, 2014 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

If you look at your shards individually, I'll bet that you'll find a slight 
difference in the number of deleted docs. You can add &distrib=false to the 
query and it will be entirely served on a single node.
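
A minimal sketch of that check: build the same query for each core with 
`distrib=false`, then compare the scores each one returns.  The core URLs 
passed in are hypothetical placeholders, and the `title,score` field list is 
an assumption based on the output shown earlier in the thread.

```python
from urllib.parse import urlencode


def per_core_query_urls(core_urls, query, fields="title,score"):
    """Return one non-distributed select URL per core.

    `core_urls` are base URLs of individual cores (hypothetical values);
    distrib=false keeps each request on the core it is sent to.
    """
    params = urlencode(
        {"q": query, "fl": fields, "distrib": "false", "wt": "json"}
    )
    return [f"{base.rstrip('/')}/select?{params}" for base in core_urls]


urls = per_core_query_urls(
    [
        "http://solr1:8983/solr/collection1_shard1_replica1",
        "http://solr2:8983/solr/collection1_shard1_replica2",
    ],
    "stroke",
)
for url in urls:
    print(url)
```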

Indexes will be slightly different on various shards due to differing merges, 
and my theory is that you're seeing a very slight difference in score due to 
differing tf/idf statistics. Unfortunately, the stats include deleted docs 
which are purged on segment merge. Since the merges may be at different times 
on the various shards, you may have very slightly different scores.
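
To illustrate the size of the effect, here is a small sketch assuming the 
classic Lucene TF-IDF similarity, where idf = 1 + ln(maxDoc / (docFreq + 1)) 
and maxDoc still counts deleted documents that have not been merged away.  
The document counts are made-up numbers, not measurements from this cluster.

```python
import math


def classic_idf(doc_freq, max_doc):
    """idf as used by the classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Same term on two replicas of one shard; replica B still carries
# 50 deleted documents that replica A has already merged away.
replica_a = classic_idf(doc_freq=120, max_doc=10_000)
replica_b = classic_idf(doc_freq=120, max_doc=10_050)

print(f"replica A idf: {replica_a:.6f}")
print(f"replica B idf: {replica_b:.6f}")
print(f"difference:    {replica_b - replica_a:.6f}")
```

The difference is small, but since the score depends on idf it is enough to 
shift scores in the second or third decimal place and reorder near-ties, 
depending on which replica answers the request.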

A simple test would be to optimize. That should purge all the deleted docs' 
data.

Best,
Erick



spellchecker returns correctlySpelled=true if one term in phrase is correctly spelled

2014-12-02 Thread Tao, Jing
Hi,

It seems that when I do a phrase search, SOLR's spellchecker returns 
correctlySpelled=true if at least one term in the phrase is correctly spelled.
For example:
If I search for "soriasis treatment", SOLR returns over 8000 search results for 
"treatment", correctlySpelled: true, and a spelling suggestion of "psoriasis" 
for "soriasis".
If I search for "soriasis treatmnt", SOLR returns 0 results, 
correctlySpelled: false, and spelling suggestions for both "soriasis" and 
"treatmnt".

Does this mean if I want to display a "Did You Mean" for "soriasis treatment", 
I need to

1)  Check if there are any suggestions returned by spellchecker for any of 
the terms, and

2)  Compare the number of hits for each collation with the numFound for 
original query?
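
The two checks above can be sketched as a small decision function.  The input 
shape is a simplification chosen for illustration (a list of dicts with 
`query` and `hits`), not the exact structure of the spellcheck response, and 
the hit counts in the example are invented.

```python
def pick_did_you_mean(num_found, collations):
    """Return the collation to offer as "Did You Mean", or None.

    num_found  -- numFound for the original query
    collations -- list of {"query": str, "hits": int}; an empty list means
                  the spellchecker returned no suggestions (check 1)
    """
    # Check 2: only offer a collation that finds more than the original.
    better = [c for c in collations if c["hits"] > num_found]
    if not better:
        return None
    return max(better, key=lambda c: c["hits"])["query"]


suggestion = pick_did_you_mean(
    num_found=8000,
    collations=[{"query": "psoriasis treatment", "hits": 9500}],
)
print(suggestion)
```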

Another spellchecker question I have is how can I configure SOLR to suggest 
"heart attack" if someone searches for "heart attach"?  Technically, there are 
no misspellings, but "heart attach" as a phrase does not make sense.
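
One avenue to explore is asking the spellchecker for alternatives even when 
every term exists in the index.  The sketch below only assembles request 
parameters; `spellcheck.alternativeTermCount` and 
`spellcheck.maxResultsForSuggest` are the parameters intended for this case, 
but the values are placeholders, and whether "heart attack" is actually 
suggested depends on the index contents and the spellchecker configuration.

```python
def build_spellcheck_params(user_query, max_results_for_suggest=5):
    """Request parameters asking for suggestions on in-index terms too.

    All numeric values are illustrative placeholders to tune.
    """
    return {
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.collateExtendedResults": "true",
        "spellcheck.maxCollationTries": "10",
        # Suggest alternatives for terms that do exist in the index.
        "spellcheck.alternativeTermCount": "5",
        # Skip suggestions when the query already returns more hits.
        "spellcheck.maxResultsForSuggest": str(max_results_for_suggest),
    }


params = build_spellcheck_params("heart attach")
for key, value in params.items():
    print(f"{key}={value}")
```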

Thanks,
Jing