Thank you everyone for your explanation. So for WordDelimiterFilter, let me see if I got it right.
Given that the out-of-the-box setting for catenateWords is "0" for query but "1" for index, I don't see how this will give me any hits. That is, if my document has "wi-fi", at index time it will be stored as "wifi". Well, then at query time if I type "wi-fi" (without quotes) I will be searching for "wi fi" and thus won't get a hit. No?

What about when I *do* quote my search, i.e., I search for "wi-fi" with quotes: what am I then sending to the searcher, "wi-fi", "wi fi" or "wifi"? Again, this is using the default out-of-the-box settings per the above. The same applies to catenateNumbers.

Btw, I'm looking at this link for the above values: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

--MJ

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Thu, Nov 8, 2012 6:57 pm
Subject: Re: Questions about schema.xml

And, in fact, you do NOT need to have two. If they are both identical, just specify one analysis chain with no qualifier, i.e. <analyzer>

On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Many token filters will be used 100% identically for both "index" and
> "query" analysis, but WordDelimiterFilter is a rare exception. The issue is
> that at index time it has the ability to generate multiple tokens at the
> same position (the "catenate" options), any of which can be queried, but at
> query time it can be problematic to have these "extra" terms (except in
> some conditions), so the WDF settings suppress generation of the extra
> terms.
>
> Another example is synonyms - generate extra terms at index time for
> greater precision of searches, but limit the query terms to exclude the
> "extra" terms.
>
> That's the reason for the occasional asymmetry between index-time and
> query-time analyzers.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: johnmu...@aol.com
> Sent: Wednesday, November 07, 2012 7:13 PM
> To: solr-user@lucene.apache.org
> Subject: Questions about schema.xml
>
> HI,
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
> For example, given:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search? And why would I want to not
> make those two blocks the same?
>
> Thanks!!!
>
> -MJ
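
For reference, here is a rough sketch of what Erick's suggestion (one analysis chain with no qualifier) might look like if the two blocks in the quoted example were collapsed into one. The field type name "text_single" is made up for illustration. Keeping the index-side WordDelimiterFilter settings and dropping the query-only SynonymFilterFactory are assumptions, not recommendations from the thread. With a single chain the same settings apply at both index and query time, so the query side would also generate the catenated terms that Jack describes as potentially problematic.

```xml
<!-- Hypothetical field type name; filters and attributes are copied from the quoted example. -->
<fieldType name="text_single" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- No type attribute: this one chain is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- Assumption: index-side settings kept, so catenated terms are produced at query time too. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether a single chain like this is appropriate depends on whether the asymmetry Jack describes matters for the field in question. The Analysis page in the Solr admin UI should show what each variant actually emits for an input such as "wi-fi".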