I am trying to figure out how the synonym filter processes multi-word
inputs.  I have checked the analyzer in the GUI, with some confusing results.
The indexed field has "The North Face" as a value. The synonym file has:

morthface, morth face, noethface, noeth face, norhtface, norht face,
nortface, nort face, northfac, north fac, northfac3e, north fac3e,
northface, north face, northfae, north fae, northfaqce, north faqce,
northfave, north fave, northhace, north hace, nothface, noth face,
thenorhface, the norh face, thenorth, the north, thenorthandface, the north
and face, thenortheface, the northe face, thenorthfac, the north fac,
thenorthface, thenorthfacee, the north facee, thenothface, the noth face,
thenotrhface, the notrh face, thenrothface, the nroth face, tnf => The North
Face

The field type runs the WhitespaceTokenizer before the synonym filter.
What confuses me is that when the term "morth fac" is run, the system
somehow maps it to the correct term, even though that exact term is not
present in the file.

How is this happening?  Is the synonym process tokenizing as well?
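
The expectation behind this question can be sketched as follows. This is a
simplified model in Python, not Solr's implementation: it assumes each
comma-separated entry on the left of "=>" is split on whitespace and matched
greedily against the token stream. The function names and the shortened rule
list are hypothetical, for illustration only.

```python
# Simplified model of multi-token synonym matching (NOT Solr's actual code).
# Assumption: each left-hand entry is split on whitespace into a token tuple.

def build_rules(left_hand_sides, replacement):
    """Map each left-hand entry (as a tuple of lowercase tokens) to the replacement tokens."""
    target = replacement.split()
    return {tuple(lhs.lower().split()): target for lhs in left_hand_sides}

def apply_synonyms(tokens, rules):
    """Greedy longest-match replacement over a whitespace-tokenized stream."""
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule starts at this position: pass the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out

# A few entries taken from the synonym file above.
rules = build_rules(["morth face", "north fac", "north face", "tnf"], "The North Face")

print(apply_synonyms("north fac".split(), rules))  # an entry that is in the file
print(apply_synonyms("morth fac".split(), rules))  # the entry that is not
```

Under this model "north fac" is rewritten but "morth fac" passes through
unchanged, which is why the result shown in the analysis GUI is surprising.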

The field type definition in the schema is as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>


-Jeff
