: <analyzer type="query">
:   <tokenizer class="solr.StandardTokenizerFactory"/>
:   <filter class="solr.StandardFilterFactory"/>
:   <filter class="solr.LowerCaseFilterFactory" />
:   <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2"/>
: </analyzer>
: </fieldType>
I'm pretty sure what you are seeing here is a variation on the "stopwords" confusion people tend to have about dismax (and edismax).

Just like the lucene qparser, "whitespace" in the query string is significant. It is used to denote the individual clauses of the input, which are then *individually* passed to the analyzers for each field in the qf. If one of your qf fields produces no tokens for an individual clause, that field gets dropped out for that clause. That is what's happening here: your query analyzer is configured not to output unigrams, and since it only ever gets one clause (one word) at a time, unigrams are all it could produce.

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

(Note in particular the latter half, starting with "Where people tend to get tripped up, is in thinking about how Solr’s per-field analysis configuration...")

If you quoted some portion of the input, the entire quoted portion would be treated as a single clause and passed to your analyzer. Alternatively, if you used that field in the "pf" (where the entire input is treated as one phrase), you would also start to see some shingles, I believe.

-Hoss
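The per-clause behavior described above can be illustrated with a small toy model in plain Java. This is a sketch under simplifying assumptions, not Solr's actual query parser:

- The class and method names (`ShingleClauseDemo`, `splitClauses`, `analyze`, `perClause`) are hypothetical.
- StandardTokenizer is approximated by a whitespace split, and StandardFilter is omitted.
- Quote handling is reduced to a single regex.

The model shows that a shingle-only analyzer with `maxShingleSize="2"` yields nothing for one-word clauses, some output for a quoted two-word clause, and full shingles when it sees the whole input (as with "pf").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: approximates the query-time analyzer quoted above and the
// way (e)dismax hands each whitespace-separated clause to the qf analyzers.
class ShingleClauseDemo {

    // A quoted phrase counts as one clause; otherwise each whitespace-separated run does.
    private static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> splitClauses(String query) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return clauses;
    }

    // Stand-in for the analyzer: whitespace tokenize (approximating
    // StandardTokenizer), lowercase, then emit only 2-word shingles
    // (outputUnigrams="false", maxShingleSize="2").
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            shingles.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return shingles;
    }

    // qf-style: each clause is analyzed on its own; an empty list means the
    // field contributes nothing for that clause.
    static List<List<String>> perClause(String query) {
        List<List<String>> result = new ArrayList<>();
        for (String clause : splitClauses(query)) {
            result.add(analyze(clause));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: qf, one clause at a time: [[], [], []]
        System.out.println("qf, one clause at a time: " + perClause("New York pizza"));
        // prints: qf, quoted phrase: [[new york], []]
        System.out.println("qf, quoted phrase: " + perClause("\"New York\" pizza"));
        // prints: pf, whole input: [new york, york pizza]
        System.out.println("pf, whole input: " + analyze("New York pizza"));
    }
}
```

In this model every unquoted single-word clause comes back empty, which corresponds to the shingle field dropping out of the qf portion of the query. Quoting, or analyzing the whole input at once, gives the shingle filter two or more tokens to combine.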