Hi Jing,

You can boost phrase matches with the pf (phrase fields) parameter. If you 
don't like this solution, you can modify the search query on the client side, 
e.g. by surrounding certain phrases with quotes. This will force a phrase 
search without interfering with tokenisation.
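
A rough sketch of both options from the client side, in Python. The field 
names (title, description) come from your mail; the boost values, the mm 
value, the phrase list and the helper names are assumptions made up for 
illustration, not anything from your setup. Note that pf on its own only 
affects ranking, while the quotes actually restrict which documents match.

```python
import re
from urllib.parse import urlencode

# Hypothetical phrase list, maintained by the client application.
PROTECTED_PHRASES = ["breast cancer"]


def quote_protected_phrases(query, phrases=PROTECTED_PHRASES):
    """Wrap each known phrase in double quotes unless it is already quoted."""
    for phrase in phrases:
        # Skip occurrences that already have a quote directly before or after.
        pattern = r'(?<!")\b' + re.escape(phrase) + r'\b(?!")'
        query = re.sub(pattern, lambda m: '"' + m.group(0) + '"', query,
                       flags=re.IGNORECASE)
    return query


def build_edismax_params(user_query):
    """Build request parameters for an edismax query with a phrase boost."""
    return {
        "defType": "edismax",
        "q": quote_protected_phrases(user_query),
        "qf": "title description",       # fields searched term by term
        "pf": "title^10 description^5",  # boost when the terms occur as a phrase in one field
        "mm": "2<75%",                   # illustrative value only
    }


if __name__ == "__main__":
    # Prints the URL-encoded query string to append to the select handler URL.
    print(urlencode(build_edismax_params("breast cancer treatment")))
```

With the quotes added, each field is checked for the whole phrase, so 
"breast" in the title plus "cancer" in the description should no longer count 
as a match for that part of the query.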

Ahmet


On Monday, March 30, 2015 8:49 PM, "Tao, Jing" <j...@webmd.net> wrote:
Hi,

The way our collection is set up, searches for "breast cancer" are returning 
results for ovarian cancer, or anything that contains either "breast" or 
"cancer".  The reason is that we are searching across multiple fields.  Even 
though I have set an "mm" value so that if there are fewer than 3 terms, ALL 
terms must match, Solr considers it all matched even though "breast" is in the 
title and "cancer" is in the description.

Is there a way to protect certain phrases so that they will not be tokenized?  
I tried using CommonGramsFilterFactory, but having "breast cancer" in the word 
list did not seem to do anything.  I'm guessing it's because the field is 
tokenized first, so nothing would match that phrase.  If I put "breast" and 
"cancer" as separate entries in the word list, I end up with too many 
unnecessary shingles, and "breast" and "cancer" are still two of the final 
terms.

I have a feeling CommonGramsFilterFactory is not the right way to handle this.  
What are other options?  Is it better to put all fields in one field, apply mm, 
and proximity boost?

Thanks!
Jing 
