Re: Questions about schema.xml

Erick Erickson Thu, 08 Nov 2012 15:57:14 -0800

And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.
<analyzer>
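
As a rough sketch of what that might look like (the field type name "text_single" is made up for illustration, and the filter chain is just a trimmed-down guess at one that is safe to share between index and query time, not a recommendation for your data):

```xml
<!-- Hypothetical field type: the name "text_single" is illustrative only -->
<fieldType name="text_single" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this one chain is applied at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

If you later need the two sides to differ (for example the WordDelimiterFilter catenate options Jack describes), split it back into separate type="index" and type="query" analyzers.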



On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky <[email protected]> wrote:

> Many token filters are used identically for both "index" and "query"
> analysis, but WordDelimiterFilter is a rare exception. At index time it can
> generate multiple tokens at the same position (the "catenate" options), any
> of which can be queried. At query time these "extra" terms can be
> problematic (except in some conditions), so the query-time WDF settings
> suppress them.
>
> Another example is synonyms: generate extra terms at index time so that
> searches match more documents (better recall), but limit the query terms to
> exclude the "extra" terms.
>
> That's the reason for the occasional asymmetry between index-time and
> query-time analyzers.
>
> -- Jack Krupansky
>
> -----Original Message----- From: [email protected]
> Sent: Wednesday, November 07, 2012 7:13 PM
> To: [email protected]
> Subject: Questions about schema.xml
>
>
>
> Hi,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>
