You should get familiar with the admin/analysis page; it's invaluable for understanding _exactly_ what your analysis chain does with various inputs.
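Solr's field analysis request handler (solr.FieldAnalysisRequestHandler) can also be asked for the index-vs-query comparison that the admin analysis page shows, so the check can be scripted. The following is a minimal Python sketch. It assumes the handler is registered at /analysis/field, as in the example solrconfig.xml, and that the host and core in base_url are placeholders. It only builds the request URL, which can then be opened in a browser or fetched with urllib.request.urlopen.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, field_type, field_value, query_value):
    """Build a request URL for Solr's field analysis handler.

    Assumption: the handler lives at /analysis/field and base_url points
    at the Solr core (e.g. http://localhost:8983/solr on a single-core setup).
    """
    params = {
        "analysis.fieldtype": field_type,    # use this fieldType's analyzers
        "analysis.fieldvalue": field_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "analysis.showmatch": "true",        # flag index tokens matching query tokens
        "wt": "json",
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


print(field_analysis_url("http://localhost:8983/solr", "text", "Wi-Fi router", "wi-fi"))
```

The response lists the tokens produced after each tokenizer and filter stage, for both the index-time and query-time chains.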
Best,
Erick

On Thu, Nov 8, 2012 at 9:49 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The default setting should index BOTH "wi fi" and "wifi". Query for
> "wi-fi", either with or without quotes will query for "wi fi".
> Incidentally, that is known as "autoGeneratePhraseQueries".
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: johnmu...@aol.com
> Sent: Thursday, November 08, 2012 6:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Questions about schema.xml
>
> Thank you everyone for your explanations. So for WordDelimiterFilter, let
> me see if I got it right.
>
> Given that the out-of-the-box setting for catenateWords is "0" for query
> but "1" for index, I don't see how this will give me any hits. That
> is, if my document has "wi-fi", at index time it will be stored as "wifi".
> Well, then at query time if I type "wi-fi" (without quotes) I will be
> searching for "wi fi" and thus won't get a hit, no?
>
> What about when I *do* quote my search, i.e. I search for "wi-fi" with
> quotes? Now what am I sending to the searcher: "wi-fi", "wi fi" or "wifi"?
> Again, this is using the default out-of-the-box settings per the above.
>
> The same applies to catenateNumbers.
>
> Btw, I'm looking at this link for the above values:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> --MJ
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Thu, Nov 8, 2012 6:57 pm
> Subject: Re: Questions about schema.xml
>
> And, in fact, you do NOT need to have two. If they are both identical, just
> specify one analysis chain with no qualifier, i.e.
> <analyzer>
>
> On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> Many token filters will be used 100% identically for both "index" and
>> "query" analysis, but WordDelimiterFilter is a rare exception. The issue
>> is that at index time it has the ability to generate multiple tokens at
>> the same position (the "catenate" options), any of which can be queried,
>> but at query time it can be problematic to have these "extra" terms
>> (except in some conditions), so the WDF settings suppress generation of
>> the extra terms.
>>
>> Another example is synonyms: generate extra terms at index time for
>> greater precision of searches, but limit the query terms to exclude the
>> "extra" terms.
>>
>> That's the reason for the occasional asymmetry between index-time and
>> query-time analyzers.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: johnmu...@aol.com
>> Sent: Wednesday, November 07, 2012 7:13 PM
>> To: solr-user@lucene.apache.org
>> Subject: Questions about schema.xml
>>
>> Hi,
>>
>> Can someone help me understand the meaning of <analyzer type="index"> and
>> <analyzer type="query"> in schema.xml, how they are used, and what I get
>> back when the values are not the same?
>>
>> For example, given:
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>>     autoGeneratePhraseQueries="true">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>         words="stopwords.txt" enablePositionIncrements="true" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.KeywordMarkerFilterFactory"
>>         protected="protwords.txt"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>         ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>         words="stopwords.txt" enablePositionIncrements="true" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.KeywordMarkerFilterFactory"
>>         protected="protwords.txt"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> If I make the entire content of "index" the same as "query" (or the other
>> way around), how will that impact my search? And why would I not want to
>> make those two blocks the same?
>>
>> Thanks!!!
>>
>> -MJ
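The index/query asymmetry discussed in this thread can be illustrated with a minimal Python sketch. It approximates WordDelimiterFilter using the catenate settings from the fieldType above. This is a simplified model, not Lucene's implementation. It assumes ASCII input and splits only on non-alphanumeric characters, lower-to-upper case changes and letter/digit boundaries. It appends catenated tokens after the parts instead of reproducing the real filter's token order and position increments. It also skips the stop, synonym, keyword-marker and stemming stages. The function names word_delimiter and analyze are hypothetical.

```python
import re
from itertools import groupby

# Subword pattern (simplifying assumption): a lowercase run with optional
# leading capitals ("Wi", "fi"), an all-caps run ("SD"), or a digit run ("500").
# Everything else (hyphens, dots, ...) acts as a delimiter and is dropped.
_PARTS = re.compile(r"[A-Z]*[a-z]+|[A-Z]+|[0-9]+")


def word_delimiter(token, generate_word_parts=True, generate_number_parts=True,
                   catenate_words=False, catenate_numbers=False, catenate_all=False):
    """Rough approximation of WordDelimiterFilter for a single token."""
    parts = _PARTS.findall(token)
    out = []
    # generateWordParts / generateNumberParts: emit the individual subwords.
    for part in parts:
        if part.isdigit():
            if generate_number_parts:
                out.append(part)
        elif generate_word_parts:
            out.append(part)
    # catenateWords / catenateNumbers: join each run of 2+ adjacent parts of one kind.
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        if len(run) > 1 and ((is_number and catenate_numbers)
                             or (not is_number and catenate_words)):
            out.append("".join(run))
    # catenateAll: join every part of the token.
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    return out


def analyze(text, index_time):
    """Whitespace tokenizer -> simplified WDF -> lowercase.

    index_time=True mimics catenateWords="1" catenateNumbers="1";
    index_time=False mimics the query analyzer's "0" settings.
    """
    tokens = []
    for tok in text.split():
        tokens.extend(word_delimiter(tok, catenate_words=index_time,
                                     catenate_numbers=index_time))
    return [t.lower() for t in tokens]


print(analyze("wi-fi", index_time=True))   # index side
print(analyze("wi-fi", index_time=False))  # query side
```

In this model, "wi-fi" is indexed as wi, fi and wifi, while the query analyzer yields only wi and fi. Both query tokens come from a single whitespace-delimited term, and the field sets autoGeneratePhraseQueries="true". The query parser therefore builds the phrase "wi fi", which matches the adjacent wi/fi tokens in the index whether or not the user typed quotes. The extra indexed wifi token is what lets a query for "wifi" find the same document. catenateNumbers behaves the same way for input such as 400-555-1212.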