Re: Questions about schema.xml

Jack Krupansky Thu, 08 Nov 2012 18:50:42 -0800

The default setting should index BOTH "wi fi" and "wifi". Query for "wi-fi", either with or without quotes, will query for "wi fi". Incidentally, that is known as "autoGeneratePhraseQueries".


-- Jack Krupansky
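
To make the reply above concrete, here is a trimmed sketch of the field type from the original question, annotated with the tokens one would expect for the input "wi-fi". The stop word, synonym, keyword marker, and stemming filters are left out for brevity, and the token lists in the comments are an illustration based on the explanation above, not captured output; the Analysis page in the Solr admin UI can confirm the actual behavior for a given schema.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index time, catenateWords="1":
         "wi-fi" is split into "wi" and "fi", and the catenated
         token "wifi" is indexed as well. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query time, catenateWords="0":
         "wi-fi" is split into "wi" and "fi" only. Because
         autoGeneratePhraseQueries is "true", the two parts are
         searched as the phrase "wi fi", which matches the
         "wi" and "fi" tokens written at index time. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under these assumptions, a document containing "wi-fi" is findable by the queries "wi-fi", "wi fi", and "wifi", since all three forms are present in the index.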

-----Original Message----- From: johnmu...@aol.com
Sent: Thursday, November 08, 2012 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions about schema.xml

Thank you everyone for your explanation. So for WordDelimiterFilter, let me see if I got it right.

Given that the out-of-the-box setting for catenateWords is "0" for query but is "1" for index, I don't see how this will give me any hits. That is, if my document has "wi-fi", at index time it will be stored as "wifi". Well, then at query time if I type "wi-fi" (without quotes) I will be searching for "wi fi" and thus won't get a hit. No?

What about when I *do* quote my search, i.e., I search for "wi-fi" with quotes: now what am I sending to the searcher, "wi-fi", "wi fi" or "wifi"? Again, this is using the default out-of-the-box setting per the above.



The same applies for catenateNumbers.

Btw, I'm looking at this link for the above values: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters



--MJ





-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Thu, Nov 8, 2012 6:57 pm
Subject: Re: Questions about schema.xml


And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.
<analyzer>
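
As a minimal sketch of that suggestion: a field type with a single unqualified <analyzer> element, which Solr then applies at both index time and query time. The field type name "text_simple" and the particular filters chosen here are hypothetical, picked only for illustration from the classes already used in the original question.

```xml
<!-- "text_simple" is a hypothetical name used for illustration. -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain is used for indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

This form only fits chains where every filter should behave identically on both sides; a filter such as WordDelimiterFilter with different catenate settings per side still needs the two typed analyzers.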

On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky <j...@basetechnology.com> wrote:

Many token filters will be used 100% identically for both "index" and
"query" analysis, but WordDelimiterFilter is a rare exception. The issue is
that at index time it has the ability to generate multiple tokens at the
same position (the "catenate" options), any of which can be queried, but at
query time it can be problematic to have these "extra" terms (except in
some conditions), so the WDF settings suppress generation of the extra
terms.

Another example is synonyms - generate extra terms at index time for
greater precision of searches, but limit the query terms to exclude the
"extra" terms.

That's the reason for the occasional asymmetry between index-time and
query-time analyzers.

-- Jack Krupansky

-----Original Message----- From: johnmu...@aol.com
Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml



Hi,


Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used and what do I get
back when the values are not the same?


For example, given:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
autoGeneratePhraseQueries="true">
  <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


If I make the entire content of "index" the same as "query" (or the other
way around) how will that impact my search?  And why would I want to not
make those two blocks the same?


Thanks!!!


-MJ
