:       <analyzer type="query">
:         <tokenizer class="solr.StandardTokenizerFactory"/>
:         <filter class="solr.StandardFilterFactory"/>
:               <filter class="solr.LowerCaseFilterFactory" />
:         <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2"/>
:       </analyzer> 
:      </fieldType>

i'm pretty sure what you are seeing here is a variation on the "stopwords" 
confusion people tend to have about dismax (and edismax)

just like the lucene qparser, "whitespace" in the query string is 
significant, and is used to denote the individual clauses of the input, 
which are then *individually* passed to the analysers for each field in 
the qf -- if one of your qf fields produces no tokens for an individual 
clause (in this case: because it is configured not to output unigrams, and 
a unigram is all that it can produce when it only gets one clause at a 
time) then it gets dropped out...
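
to make that concrete, here is a rough sketch in python. it is a toy model 
of the behaviour described above, not Solr's actual code: shingle_filter 
and analyze are made-up helpers that only approximate a whitespace split, 
a lowercase filter and a shingle filter with outputUnigrams="false" and 
maxShingleSize="2".

```python
def shingle_filter(tokens, max_shingle_size=2, output_unigrams=False):
    """Toy stand-in for a shingle filter (hypothetical helper, not Solr code)."""
    # Unigrams are only kept when explicitly requested.
    out = list(tokens) if output_unigrams else []
    # Emit every run of adjacent tokens of size 2..max_shingle_size.
    for size in range(2, max_shingle_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


def analyze(text):
    """Approximate the query analyzer: split, lowercase, then shingle."""
    tokens = text.lower().split()
    return shingle_filter(tokens)


query = "Quick Brown Fox"

# What the query parser does: split on whitespace first, then analyze each
# clause on its own. A single word can never form a 2-word shingle.
per_clause = [analyze(clause) for clause in query.split()]

# What the field would produce if it saw the whole string at once.
whole = analyze(query)

print(per_clause)  # [[], [], []]
print(whole)       # ['quick brown', 'brown fox']
```

every clause comes back empty, so the shingle field has nothing to 
contribute for any of them.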

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

(note in particular the latter half starting with "Where people tend to 
get tripped up, is in thinking about how Solr’s per-field analysis 
configuration...") 

if you quoted some portion of the input, then the entire quoted portion 
would be treated as a single clause and passed to your analyser.

alternately: if you used that field in the "pf" (where the entire input is 
treated as one phrase) you would also start to see some shingles i believe
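
as a sketch of what those two options might look like as request 
parameters (the field names "title" and "title_shingles" are invented for 
illustration, substitute your own; defType, q, qf and pf are the usual 
dismax parameters):

```python
from urllib.parse import urlencode

# Hypothetical field names, used only for illustration.
PLAIN_FIELD = "title"
SHINGLE_FIELD = "title_shingles"

# Option 1: quote the input so the whole phrase reaches the shingle
# field's analyzer as a single clause.
quoted = urlencode({
    "defType": "dismax",
    "q": '"quick brown fox"',
    "qf": SHINGLE_FIELD,
})

# Option 2: leave the input unquoted, match clauses against a plain field
# in qf, and list the shingle field in pf, where the entire input is
# treated as one phrase.
with_pf = urlencode({
    "defType": "dismax",
    "q": "quick brown fox",
    "qf": PLAIN_FIELD,
    "pf": SHINGLE_FIELD,
})

print(quoted)
print(with_pf)
```

adding debugQuery=true to either request should show whether the shingle 
field actually ends up in the parsed query.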


-Hoss
