> No - there are various analyzers. StandardAnalyzer is geared toward
> searching bodies of text for interesting words - punctuation is
> ripped out. Other analyzers are more useful for "concrete" text. You
> may have to work at finding one that leaves punctuation in.
My problem is not with StandardAnalyzer per se, but with how "dismax"-style queries are handled by the query parser when the fields involved have different sets of ignored tokens or stop words.

Say you want to take the contents of a text box in your app and query a field in Solr. The user enters "A and B", so you map this to "f1:A AND f1:B". Now, if "B" is an ignored token in the "f1" field for whatever reason, the query boils down to "f1:A".

Now imagine you want to allow the user's text to match multiple fields - any term can match any field, but every term must match at least one field. So you map the user's query to "(f1:A OR f2:A) AND (f1:B OR f2:B)". But if f2 does not ignore "B", the query boils down to "(f1:A OR f2:A) AND (f2:B)". Documents that would have come back when you were matching only against the f1 field no longer come back.

This seems counter-intuitive. To be consistent, I would expect the query to be treated essentially as "(f1:A OR f2:A) AND (TRUE OR f2:B)" - that is, a term that is a stop word or ignored token for any of the fields would be ignored across the board.

So I guess what I'm asking is: is there a reason for the existing behavior, or is it just a fact of life of the query parser?

Thanks!
-Ken
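For concreteness, here is a small sketch (hypothetical field names and stop-word sets, not actual Solr or Lucene code) that simulates the collapsing behavior described above - OR across fields, AND across terms, with each field's analyzer independently dropping its own stop words:

```python
# Hypothetical simulation of the per-field stop-word collapsing described
# above. Field names and stop-word sets are assumptions for illustration.

# "b" is a stop word for f1 but not for f2.
STOP_WORDS = {"f1": {"b"}, "f2": set()}

def analyze(field, term):
    """Return the term if the field's analyzer keeps it, else None."""
    return None if term.lower() in STOP_WORDS[field] else term

def build_query(terms, fields):
    """AND across terms, OR across fields; each field drops its own stop words."""
    clauses = []
    for term in terms:
        disjuncts = [f"{f}:{t}" for f in fields
                     if (t := analyze(f, term)) is not None]
        if disjuncts:  # a term ignored by *every* field vanishes entirely
            clauses.append("(" + " OR ".join(disjuncts) + ")")
    return " AND ".join(clauses)

print(build_query(["A", "B"], ["f1"]))        # -> (f1:A)
print(build_query(["A", "B"], ["f1", "f2"]))  # -> (f1:A OR f2:A) AND (f2:B)
```

With f1 alone, "B" drops out of the query entirely; with f1 and f2 together, "B" survives as a required clause against f2 only, which is exactly why documents matching on f1 can disappear from the results.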