I am trying to figure out how the synonym filter processes multi-word inputs. I have checked the analyzer in the GUI and got some confusing results. The indexed field has "The North Face" as a value. The synonym file has
morthface, morth face, noethface, noeth face, norhtface, norht face, nortface, nort face, northfac, north fac, northfac3e, north fac3e, northface, north face, northfae, north fae, northfaqce, north faqce, northfave, north fave, northhace, north hace, nothface, noth face, thenorhface, the norh face, thenorth, the north, thenorthandface, the north and face, thenortheface, the northe face, thenorthfac, the north fac, thenorthface, thenorthfacee, the north facee, thenothface, the noth face, thenotrhface, the notrh face, thenrothface, the nroth face, tnf => The North Face

The field type runs the WhitespaceTokenizer before the synonym filter. What confuses me is that when I run the term "morth fac", the system somehow maps it to the correct term even though "morth fac" is not present in the file. How is this happening? Is the synonym filter tokenizing as well?

The field type definition in the schema is as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-Jeff
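A naive model of multi-word synonym matching makes the question concrete. This is a minimal sketch under an assumption, not Solr's actual SynonymFilter code: each comma-separated left-hand side is split on whitespace into a token sequence, and the incoming token stream is matched exactly, whole token by whole token, trying longer variants first. `parse_rule` and `apply_rule` are hypothetical helper names, not Solr APIs, and the sketch ignores the WordDelimiter, stop, and Porter stemming stages in the chain above. Under this model "morth face" maps to the target but "morth fac" does not, which is exactly why the analyzer result is surprising.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified model, NOT Solr's SynonymFilter implementation.
# Assumption: each LHS variant is split on whitespace and must match the
# token stream exactly; no stemming, stopwords, or word-delimiter handling.


def parse_rule(line: str) -> Tuple[List[List[str]], List[str]]:
    """Split one 'a, b c => X Y' rule into LHS token sequences and RHS tokens."""
    lhs, rhs = line.split("=>")  # assumes exactly one '=>' in the rule
    variants = [v.split() for v in lhs.split(",") if v.strip()]
    # Lowercase everything, mimicking ignoreCase="true".
    return [[t.lower() for t in v] for v in variants], rhs.lower().split()


def apply_rule(tokens: List[str], variants: List[List[str]],
               replacement: List[str]) -> List[str]:
    """Replace each exact, whole-token match of any LHS variant with the RHS."""
    out: List[str] = []
    i = 0
    # Try longer variants first so a two-token variant beats a one-token one.
    ordered = sorted(variants, key=len, reverse=True)
    while i < len(tokens):
        match: Optional[List[str]] = next(
            (v for v in ordered if tokens[i:i + len(v)] == v), None)
        if match:
            out.extend(replacement)
            i += len(match)
        else:
            out.append(tokens[i])  # no rule starts here; pass token through
            i += 1
    return out


if __name__ == "__main__":
    variants, rhs = parse_rule(
        "morthface, morth face, northfac, north fac, tnf => The North Face")
    print(apply_rule(["morth", "face"], variants, rhs))  # ['the', 'north', 'face']
    print(apply_rule(["morth", "fac"], variants, rhs))   # ['morth', 'fac']
```

If the real filter behaved like this exact-match model, "morth fac" would pass through unchanged, so the mapping seen in the GUI has to come from somewhere else in the analysis chain or from how the filter matches.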