The DisMax parser essentially creates a set of queries against different fields. Each of these queries is analyzed with the analyzer of its own field.
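As a rough illustration, here is a toy Python model of that per-field expansion. It is not Solr's code: the field names f1 and f2 come from the question quoted below, while the stop word sets and the helper names (analyze, expand, render) are made up for the sketch. It only shows why a term dropped by one field's analyzer still survives as a clause on another field.

```python
# Toy model only: this is NOT how Solr implements DisMax.
# The stop word sets and helper names are hypothetical.
STOP_WORDS = {
    "f1": {"b"},   # assume f1's analyzer drops "b"
    "f2": set(),   # assume f2's analyzer keeps every token
}

def analyze(field, term):
    """Return the analyzed token for this field, or None if it is dropped."""
    token = term.lower()
    return None if token in STOP_WORDS[field] else token

def expand(user_query, fields):
    """Build one group of per-field clauses for each user term."""
    groups = []
    for term in user_query.split():
        clauses = []
        for field in fields:
            token = analyze(field, term)
            if token is not None:
                clauses.append(f"{field}:{token}")
        if clauses:  # a term dropped by every field vanishes entirely
            groups.append(clauses)
    return groups

def render(groups):
    """Print the groups in the AND/OR notation used in the question."""
    return " AND ".join("(" + " OR ".join(g) + ")" for g in groups)

print(render(expand("A B", ["f1"])))        # (f1:a)
print(render(expand("A B", ["f1", "f2"])))  # (f1:a OR f2:a) AND (f2:b)
```

With only f1, "B" disappears from the query. Once f2 is added, "B" comes back as a required clause on f2 alone, which is the behavior described in the question.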
I think this is what you are talking about: "The" in a movie title is different from "the" in the movie description. Would you expect "The Sound Of Music" to fetch every movie in the database? So "the" is a stopword in the description but not in the title.

Also, the DisMax parser has no OR. It has +, -, and "at least one of, and more is better". The query "A B" means "A or B, but both is better". "+a +b" means "a AND b". "+a b" means "must have 'a', but is better with 'b'".

On Fri, Sep 25, 2009 at 7:04 AM, Ensdorf Ken <ensd...@zoominfo.com> wrote:
>> No- there are various analyzers. StandardAnalyzer is geared toward
>> searching bodies of text for interesting words - punctuation is
>> ripped out. Other analyzers are more useful for "concrete" text. You
>> may have to work at finding one that leaves punctuation in.
>>
>
> My problem is not with the StandardAnalyzer per se, but more as to how
> "dismax" style queries are handled by the query parser when the different
> fields have different sets of ignored tokens or stop words.
>
> Say you want to use the contents of a text box in your app and query a field
> in Solr. The user enters "A and B", so you map this to "f1:A and f1:B".
> Now, if "B" is an ignored token in the "f1" field for whatever reason, the
> query boils down to "f1:A".
>
> Now imagine you want to allow the user's text to match multiple fields - as
> in any term can match any field, but all terms must match at least 1 field.
> So now you map the user's query to "(f1:A OR f2:A) AND (f1:B OR f2:B)". But
> if f2 does not ignore "B", the query boils down to "(f1:A OR f2:A) AND
> (f2:B)". Now documents that could come back when you were only matching
> against the f1 field don't come back.
>
> This seems counter-intuitive - to be consistent, I would think the query
> should essentially be treated as "(f1:A OR f2:A) AND (TRUE OR f2:B) " - and
> thus a term that is a stop word or ignored token for any of the fields would
> be ignored across the board.
>
> So I guess what I'm asking is if there is a reason for the existing behavior,
> or is it just a fact-of-life of the query parser? Thanks!
>
> -Ken
>

--
Lance Norskog
goks...@gmail.com