Re: Questions about schema.xml

Jack Krupansky Thu, 08 Nov 2012 06:45:34 -0800

Many token filters will be used 100% identically for both "index" and "query" analysis, but WordDelimiterFilter is a rare exception. The issue is that at index time it has the ability to generate multiple tokens at the same position (the "catenate" options), any of which can be queried, but at query time it can be problematic to have these "extra" terms (except in some conditions), so the query-time WDF settings suppress generation of the extra terms.
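To make that concrete, here is a rough Python sketch of the idea. It is a toy model, not Lucene's actual WordDelimiterFilter: the function names `word_delimiter` and `matches` are made up for illustration, lowercasing is folded in for brevity, the catenated term is simply placed at the position of the last part, and matching ignores positions altogether.

```python
import re


def word_delimiter(token, catenate_words=False):
    """Toy word-delimiter step: split on non-alphanumerics, lowercase the parts.

    Returns (term, position) pairs. With catenate_words=True, the joined form
    is added as an extra term at the position of the last part (an assumption
    of this sketch, not a statement about Lucene's behavior).
    """
    parts = [p.lower() for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    terms = [(part, pos) for pos, part in enumerate(parts)]
    if catenate_words and len(parts) > 1:
        terms.append(("".join(parts), len(parts) - 1))
    return terms


def matches(indexed, query):
    """True if every query term text appears among the indexed term texts."""
    indexed_texts = {text for text, _ in indexed}
    return all(text in indexed_texts for text, _ in query)


# Index time: catenation on, so the document carries the extra joined term.
indexed = word_delimiter("Wi-Fi", catenate_words=True)
print(indexed)

# Query time: catenation off; both spellings still find the document.
print(matches(indexed, word_delimiter("wifi")))
print(matches(indexed, word_delimiter("wi-fi")))
```

With catenation turned off at index time as well, the "wifi" query in this sketch would no longer match, which is why the extra term is generated on the index side only.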

Another example is synonyms - generate extra terms at index time for greater precision of searches, but limit the query terms to exclude the "extra" terms.

That's the reason for the occasional asymmetry between index-time and query-time analyzers.


-- Jack Krupansky

-----Original Message-----
From: johnmu...@aol.com

Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml


Hi,

Can someone help me understand the meaning of <analyzer type="index"> and <analyzer type="query"> in schema.xml, how they are used, and what I get back when the two are not the same?



For example, given:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

If I make the entire content of "index" the same as "query" (or the other way around) how will that impact my search? And why would I want to not make those two blocks the same?



Thanks!!!

-MJ
