> No - there are various analyzers. StandardAnalyzer is geared toward
> searching bodies of text for interesting words - punctuation is
> ripped out. Other analyzers are more useful for "concrete" text. You
> may have to work at finding one that leaves punctuation in.
My problem is not with StandardAnalyzer per se, but with how "dismax"-style queries are handled by the query parser when the fields involved have different sets of ignored tokens or stop words.

Say you want to take the contents of a text box in your app and query a field in Solr. The user enters "A and B", so you map this to "f1:A AND f1:B". Now, if "B" is an ignored token in the "f1" field for whatever reason, the query boils down to "f1:A".

Now imagine you want to allow the user's text to match multiple fields - any term can match any field, but every term must match at least one field. So you map the user's query to "(f1:A OR f2:A) AND (f1:B OR f2:B)". But if f2 does not ignore "B", the query boils down to "(f1:A OR f2:A) AND (f2:B)". Documents that would have come back when you were matching only against the f1 field no longer come back.

This seems counter-intuitive. To be consistent, I would expect the query to be treated essentially as "(f1:A OR f2:A) AND (TRUE OR f2:B)" - that is, a term that is a stop word or ignored token for any of the fields would be ignored across the board.

So I guess what I'm asking is: is there a reason for the existing behavior, or is it just a fact of life of the query parser?

Thanks!
-Ken
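For concreteness, here is a small sketch (hypothetical field names and stop-word sets, not actual Solr or Lucene code) that simulates the collapsing behavior described above - OR across fields, AND across terms, with each field's analyzer independently dropping its own stop words:

```python
# Hypothetical simulation of the per-field stop-word collapsing described
# above. Field names and stop-word sets are assumptions for illustration.

# "b" is a stop word for f1 but not for f2.
STOP_WORDS = {"f1": {"b"}, "f2": set()}

def analyze(field, term):
    """Return the term if the field's analyzer keeps it, else None."""
    return None if term.lower() in STOP_WORDS[field] else term

def build_query(terms, fields):
    """AND across terms, OR across fields; each field drops its own stop words."""
    clauses = []
    for term in terms:
        disjuncts = [f"{f}:{t}" for f in fields
                     if (t := analyze(f, term)) is not None]
        if disjuncts:  # a term ignored by *every* field vanishes entirely
            clauses.append("(" + " OR ".join(disjuncts) + ")")
    return " AND ".join(clauses)

print(build_query(["A", "B"], ["f1"]))        # -> (f1:A)
print(build_query(["A", "B"], ["f1", "f2"]))  # -> (f1:A OR f2:A) AND (f2:B)
```

With f1 alone, "B" drops out of the query entirely; with f1 and f2 together, "B" survives as a required clause against f2 only, which is exactly why documents matching on f1 can disappear from the results.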