Thank you, everyone, for your explanations.  So for WordDelimiterFilter, let me 
see if I got it right.


Given that the out-of-the-box setting for catenateWords is "0" for query but "1" 
for index, I don't see how this will give me any hits.  That is, if my 
document has "wi-fi", at index time it will be stored as "wifi".  Well, then at 
query time if I type "wi-fi" (without quotes) I will be searching for "wi fi" 
and thus won't get a hit.  No?


What about when I *do* quote my search, i.e., I search for "wi-fi" with quotes: 
what am I then sending to the searcher, "wi-fi", "wi fi" or "wifi"?  Again, this 
is using the default out-of-the-box settings per the above.


The same applies for catenateNumbers.


Btw, I'm looking at this link for the above values: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


--MJ





-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Thu, Nov 8, 2012 6:57 pm
Subject: Re: Questions about schema.xml


And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.
<analyzer>
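
A minimal sketch of what that could look like (the field type name
"text_simple" and the two components are only illustrative, not taken from
your schema):

<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

An analyzer declared without a type attribute is applied at both index and
query time.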


On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Many token filters will be used 100% identically for both "index" and
> "query" analysis, but WordDelimiterFilter is a rare exception. The issue is
> that at index time it has the ability to generate multiple tokens at the
> same position (the "catenate" options), any of which can be queried, but at
> query time it can be problematic to have these "extra" terms (except in
> some conditions), so the WDF settings suppress generation of the extra
> terms.
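>
> As a rough illustration of that asymmetry for the input "wi-fi" (the token
> lists in the comments are a reading of the documented options, not output
> captured from a running Solr, and they ignore the later lowercase and
> stemming filters):
>
>   <!-- index time: "wi-fi" yields wi, fi, plus the catenated extra term wifi -->
>   <filter class="solr.WordDelimiterFilterFactory"
>           generateWordParts="1" catenateWords="1"/>
>
>   <!-- query time: "wi-fi" yields only wi, fi -->
>   <filter class="solr.WordDelimiterFilterFactory"
>           generateWordParts="1" catenateWords="0"/>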
>
> Another example is synonyms - generate extra terms at index time for
> greater precision of searches, but limit the query terms to exclude the
> "extra" terms.
>
> That's the reason for the occasional asymmetry between index-time and
> query-time analyzers.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: johnmu...@aol.com
> Sent: Wednesday, November 07, 2012 7:13 PM
> To: solr-user@lucene.apache.org
> Subject: Questions about schema.xml
>
>
>
> Hi,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>

 
