Most token filters are configured identically for "index" and "query"
analysis, but WordDelimiterFilter (WDF) is a rare exception. At index time
it can generate multiple tokens at the same position (the "catenate"
options), any of which can be matched by a query. At query time those
"extra" terms can be problematic (except in some conditions), so the
query-time WDF settings (catenateWords="0", catenateNumbers="0") suppress them.
Synonyms are another example: generate the extra terms at index time so that
a search on any of the variants matches, but limit the query terms to exclude
the "extra" terms.
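
To make the WDF case concrete, here is a rough sketch of the two WDF filter
lines from a schema like the one quoted below, annotated with the tokens I
would expect for the input "Wi-Fi" once LowerCaseFilterFactory has run. The
input and the token lists are illustrative assumptions, not output captured
from a real index; the Analysis page in the Solr admin UI shows what your
version actually produces.

```xml
<!-- index analyzer: catenateWords="1" also emits the joined form -->
<!-- "Wi-Fi" is expected to yield: wi, fi, wifi -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>

<!-- query analyzer: catenateWords="0" emits only the parts -->
<!-- "Wi-Fi" is expected to yield: wi, fi -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
```

With that setup a query for "wifi" should match through the catenated index
term, and a query for "Wi-Fi" should match through the parts, without the
query itself carrying the extra joined term.
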
That's the reason for the occasional asymmetry between index-time and
query-time analyzers.
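
For chains that really are identical on both sides, a single <analyzer>
element with no type attribute covers both indexing and querying, so the
chain does not have to be written out twice. A minimal sketch, assuming a
hypothetical field type name "text_same" and a chain with no WDF or synonym
filter:

```xml
<fieldType name="text_same" class="solr.TextField" positionIncrementGap="100">
  <!-- no type attribute: used for both index and query analysis -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Splitting into type="index" and type="query" is only needed when at least
one filter has to behave differently on the two sides.
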
-- Jack Krupansky
-----Original Message-----
From: johnmu...@aol.com
Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml
Hi,
Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used, and what I get
back when the values are not the same?
For example, given:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
If I make the entire content of "index" the same as "query" (or the other
way around), how will that impact my search? And why would I want those two
blocks to differ?
Thanks!!!
-MJ