Many token filters are used identically for both "index" and "query" analysis, but WordDelimiterFilter is a rare exception. At index time it can generate multiple tokens at the same position (the "catenate" options), any of which can be queried. At query time those "extra" terms can be problematic (except in some conditions), so the query-side WDF settings suppress them.
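
To make that concrete, here is a rough sketch using the two WDF filter lines from the schema quoted below. The sample inputs "wi-fi" and "wifi" are my own illustration, and the token lists in the comments show the output of the WDF step only, before the later filters in the chain run.

```xml
<!-- index analyzer: catenateWords="1" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<!-- indexed text "wi-fi" yields: wi, fi, wifi (the catenated form is stacked on the word parts) -->

<!-- query analyzer: catenateWords="0" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<!-- query "wi-fi" yields: wi, fi -->
<!-- query "wifi" yields: wifi -->
```

With these settings a document containing "wi-fi" should be findable by either spelling: "wi-fi" matches on the word parts, and "wifi" matches on the catenated token that only the index side produced.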

Another example is synonyms: generate extra terms at index time for greater precision of searches, but limit the query terms to exclude the "extra" terms.
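
A minimal sketch of that synonym arrangement might look like the following. The field type name "text_syn_at_index" is hypothetical and the filter chain is cut down for illustration. Note that the schema quoted below does the reverse and applies SynonymFilterFactory in the query analyzer.

```xml
<!-- hypothetical field type, not taken from the quoted schema -->
<fieldType name="text_syn_at_index" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true": synonyms listed in synonyms.txt are indexed alongside the original term -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no synonym filter here: the query term is looked up as typed -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the extra terms are already in the index, a query for any one of the synonyms can match without expanding the query itself.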

That's the reason for the occasional asymmetry between index-time and query-time analyzers.

-- Jack Krupansky

-----Original Message-----
From: johnmu...@aol.com
Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml


Hi,


Can someone help me understand the meaning of <analyzer type="index"> and <analyzer type="query"> in schema.xml, how they are used, and what I get back when the two are not the same?


For example, given:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


If I make the entire content of "index" the same as "query" (or the other way around), how will that impact my search? And why would I want those two blocks to differ?


Thanks!!!


-MJ
