Re: Mixed field types and boolean searching

Lance Norskog Fri, 25 Sep 2009 19:13:47 -0700

The DisMax parser essentially creates a set of queries, one against
each field. Each query is analyzed with its own field's analyzer.


I think this is what you are talking about: "The" in a movie title is
different from "the" in the movie description. Would you expect "The
Sound Of Music" to fetch every movie in the database? So "the" is a
stopword in the description but not in the title.
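
To make that concrete, here is a rough sketch in Python. It is a toy
model, not Solr or Lucene code: the field names, the stopword lists and
the analyze/expand helpers are all made up for illustration. It only
shows how the same user term can survive analysis in one field and
vanish in another.

```python
# Toy model only: not Solr or Lucene code. Field names and stopword
# lists are assumptions made up for this illustration.
STOPWORDS = {
    "title": set(),                # keep every word in titles
    "description": {"the", "of"},  # drop common words in descriptions
}


def analyze(field, text):
    """Lowercase, split on whitespace, and drop the field's stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS[field]]


def expand(user_query, fields):
    """For each user term, list the per-field clauses that survive analysis."""
    expanded = []
    for term in user_query.split():
        clauses = [
            f"{field}:{tok}"
            for field in fields
            for tok in analyze(field, term)
        ]
        expanded.append(clauses)
    return expanded


if __name__ == "__main__":
    for clauses in expand("The Sound Of Music", ["title", "description"]):
        print(" | ".join(clauses))
```

In this sketch "the" and "of" end up with a title clause only, while
"sound" and "music" get a clause for both fields.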

Also, the DisMax parser has no OR. It has +, -, and a default of "at
least one of these, and more is better". The query "a b" means "a or b,
but both is better". "+a +b" means "a AND b". "+a b" means "must have
'a', but is better with 'b'".
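
A small sketch of those three cases, again as a toy in Python rather
than anything from Solr itself. The parse and score functions are
hypothetical, the "score" is just a count of matched terms, and it
assumes one matching optional term is enough. In the real parser the
mm (minimum should match) setting controls how many optional clauses
must match, and scoring is far more involved.

```python
# Toy model only: hypothetical helpers, not the DisMax implementation.
def parse(query):
    """Split a query into required (+), prohibited (-) and optional terms."""
    required, prohibited, optional = set(), set(), set()
    for tok in query.lower().split():
        if tok.startswith("+") and len(tok) > 1:
            required.add(tok[1:])
        elif tok.startswith("-") and len(tok) > 1:
            prohibited.add(tok[1:])
        else:
            optional.add(tok)
    return required, prohibited, optional


def score(query, doc_text):
    """Return None when the document does not match, else a count of matched terms."""
    words = set(doc_text.lower().split())
    required, prohibited, optional = parse(query)
    if not required <= words:
        return None  # a '+' term is missing
    if prohibited & words:
        return None  # a '-' term is present
    hits = len(optional & words)
    if not required and optional and hits == 0:
        return None  # nothing required, so at least one optional term must match
    return len(required) + hits  # more matches is better


if __name__ == "__main__":
    for q in ("a b", "+a +b", "+a b"):
        for doc in ("a", "b", "a b"):
            print(repr(q), "against", repr(doc), "->", score(q, doc))
```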

On Fri, Sep 25, 2009 at 7:04 AM, Ensdorf Ken <ensd...@zoominfo.com> wrote:
>> No- there are various analyzers. StandardAnalyzer is geared toward
>> searching bodies of text for interesting words -  punctuation is
>> ripped out. Other analyzers are more useful for "concrete" text. You
>> may have to work at finding one that leaves punctuation in.
>>
>
> My problem is not with the StandardAnalyzer per se, but more as to how 
> "dismax" style queries are handled by the query parser when the different 
> fields have different sets of ignored tokens or stop words.
>
> Say you want to use the contents of a text box in your app and query a field 
> in Solr.  The user enters "A and B", so you map this to "f1:A and f1:B".  
> Now, if "B" is an ignored token in the "f1" field for whatever reason, the 
> query boils down to "f1:A".
>
> Now imagine you want to allow the user's text to match multiple fields - as 
> in any term can match any field, but all terms must match at least 1 field.  
> So now you map the user's query to "(f1:A OR f2:A) AND (f1:B OR f2:B)".  But 
> if f2 does not ignore "B", the query boils down to "(f1:A OR f2:A) AND 
> (f2:B)".  Now documents that could come back when you were only matching 
> against the f1 field don't come back.
>
> This seems counter-intuitive - to be consistent, I would think the query 
> should essentially be treated as "(f1:A OR f2:A) AND (TRUE OR f2:B) " - and 
> thus a term that is a stop word or ignored token for any of the fields would 
> be ignored across the board.
>
> So I guess what I'm asking is if there is a reason for the existing behavior, 
> or is it just a fact-of-life of the query parser?  Thanks!
>
> -Ken
>



-- 
Lance Norskog
goks...@gmail.com
