Re: Questions about schema.xml

johnmunir Thu, 08 Nov 2012 04:41:04 -0800

Thanks Prithu.

But why would I use different settings for the index and query?  I would think 
that if the settings are not the same for both, then search results for end 
users would be confusing, no?  To illustrate my point (this may be drastic), if 
I don't use "solr.LowerCaseFilterFactory" in one case, then many searches 
(mixed-case, for example) won't give me any hits.  A more realistic example: if 
I don't match the rules for "solr.WordDelimiterFilterFactory", again, I could 
miss hits.  If my understanding is correct, and there is value in using 
different rules for "query" and "index", I'd like to see a concrete example, a 
use case I can apply.

-- MJ

-----Original Message-----
From: Prithu Banerjee <prid...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Thu, Nov 8, 2012 12:34 am
Subject: Re: Questions about schema.xml

Those two values specify which analyzer is used at which stage. There are
two kinds: the index analyzer processes the input documents to build the
index, and the query analyzer processes your query. Typically the index and
query analyzers are the same, so that you search over exactly the tokens
you created while indexing. But you are free to provide a customized
analyzer for either one, according to your need.
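
As a rough illustration of why the two chains can differ, here is a toy
Python sketch (not Solr code, and not how Solr implements its filters). It
assumes a much simplified word-delimiter step: tokens are split on
non-alphanumeric characters, and the joined form is added only when
catenate_words is true, loosely mirroring catenateWords="1" at index time
and catenateWords="0" at query time in the fieldType quoted below. The
names analyze, split_parts and matches are made up for this example.

```python
import re


def split_parts(token):
    # Split on non-alphanumeric delimiters, e.g. "Wi-Fi" -> ["Wi", "Fi"].
    return [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]


def analyze(text, catenate_words):
    """Toy analyzer: whitespace tokenize, split on delimiters,
    optionally add the catenated form, then lowercase."""
    tokens = []
    for raw in text.split():
        parts = split_parts(raw)
        tokens.extend(parts)
        # Only the index side adds the joined form ("Wi-Fi" -> "WiFi").
        if catenate_words and len(parts) > 1:
            tokens.append("".join(parts))
    return [t.lower() for t in tokens]


# Index side: catenation on, so one document yields extra tokens.
index_tokens = set(analyze("Wi-Fi router", catenate_words=True))


def matches(query):
    # Query side: catenation off; every query token must be in the index.
    return all(t in index_tokens for t in analyze(query, catenate_words=False))


print(sorted(index_tokens))
print(matches("wifi"), matches("Wi-Fi"), matches("wireless"))
```

In this sketch the document is indexed under both the split parts and the
joined form, so a query for either "wifi" or "Wi-Fi" finds it without the
query side having to generate the extra tokens itself.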

-- 
best regards,
Prithu

On Thu, Nov 8, 2012 at 8:43 AM, <johnmu...@aol.com> wrote:

>
> Hi,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
>    <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>       <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>       <filter class="solr.PorterStemFilterFactory"/>
>    </analyzer>
>    <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>       <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>       <filter class="solr.PorterStemFilterFactory"/>
>    </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>
