Just use the query analysis link with appropriate values. It will show how
each filter factory and analyzer breaks up the terms at each stage of
analysis. In particular, check the EnglishPorterFilterFactory stage.
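When reading that page, it may help to have a toy Python model of the stages in the quoted schema. It is not Solr's code. It assumes the synonym stage splits each comma-separated entry on whitespace into a token sequence, then replaces the longest matching run of input tokens with the right-hand side of an explicit "=>" rule. It also has these simplifications:

- WordDelimiterFilterFactory and EnglishPorterFilterFactory are left out.
- Token positions are ignored.
- The stop list passed in is a hypothetical stand-in for stopwords.txt.
- The names parse_rule, synonym_stage and analyze are made up for illustration.

```python
"""Toy model of part of the analysis chain in the quoted schema.

Assumptions (NOT Solr's implementation): tokens are plain strings with no
positions, WordDelimiterFilter and EnglishPorterFilter are skipped, both
sides of a rule are lowercased for simplicity, and the synonym stage does a
greedy longest match over whole token sequences.
"""


def parse_rule(line):
    """Split 'a, b c => X Y' into (list of LHS token tuples, RHS tokens).

    Only explicit '=>' mappings are handled in this sketch.
    """
    lhs, rhs = line.split("=>")
    variants = [tuple(v.lower().split()) for v in lhs.split(",") if v.strip()]
    return variants, [t.lower() for t in rhs.split()]


def synonym_stage(tokens, rules):
    """Replace the longest matching token sequence with its mapping."""
    table = {}
    for variants, replacement in rules:
        for variant in variants:
            table[variant] = replacement
    longest = max((len(k) for k in table), default=0)

    out, i = [], 0
    while i < len(tokens):
        # Try the widest window first, shrinking down to a single token.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + n])
            if window in table:
                out.extend(table[window])
                i += n
                break
        else:
            # No rule matched at this position: pass the token through.
            out.append(tokens[i])
            i += 1
    return out


def analyze(text, stopwords, rules):
    """Return (stage name, tokens) pairs, like rows on the analysis page."""
    stages = []
    tokens = text.split()                                 # whitespace tokenizer
    stages.append(("whitespace", tokens))
    tokens = [t.lower() for t in tokens]                  # lowercase filter
    stages.append(("lowercase", tokens))
    tokens = [t for t in tokens if t not in stopwords]    # stop filter
    stages.append(("stop", tokens))
    tokens = synonym_stage(tokens, rules)                 # toy synonym filter
    stages.append(("synonym", tokens))
    return stages


if __name__ == "__main__":
    rules = [parse_rule("morth face, north fac, tnf => The North Face")]
    for query in ["Morth Face", "morth fac"]:
        for name, toks in analyze(query, {"the"}, rules):
            print(f"{query!r:14} {name:10} {toks}")
```

In this toy model "morth fac" passes through the synonym stage unchanged, because no rule contains that exact token sequence. If the real analysis page shows a match, its stage-by-stage rows are where to look for the filter that produced it.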




Jeff Newburn wrote:
> 
> I am trying to figure out how the synonym filter processes multi-word
> inputs. I have checked the analyzer in the GUI and got some confusing
> results.
> The indexed field has "The North Face" as a value. The synonym file has:
> 
> morthface, morth face, noethface, noeth face, norhtface, norht face,
> nortface, nort face, northfac, north fac, northfac3e, north fac3e,
> northface, north face, northfae, north fae, northfaqce, north faqce,
> northfave, north fave, northhace, north hace, nothface, noth face,
> thenorhface, the norh face, thenorth, the north, thenorthandface,
> the north and face, thenortheface, the northe face, thenorthfac,
> the north fac, thenorthface, thenorthfacee, the north facee,
> thenothface, the noth face, thenotrhface, the notrh face,
> thenrothface, the nroth face, tnf => The North Face
> 
> The field type runs the WhitespaceTokenizer before the synonym filter.
> What confuses me is that when I analyze the term "morth fac", the
> system somehow maps it to the correct term even though that term is not
> present in the file.
> 
> How is this happening? Is the synonym filter tokenizing as well?
> 
> The field type definition in the schema is as follows:
>        <fieldType name="text" class="solr.TextField"
>                   positionIncrementGap="100">
>            <analyzer>
>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                <filter class="solr.WordDelimiterFilterFactory"
>                        generateWordParts="1" generateNumberParts="1"
>                        catenateWords="1" catenateNumbers="1"
>                        catenateAll="0" splitOnCaseChange="1"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.StopFilterFactory" ignoreCase="true"
>                        words="stopwords.txt"/>
>                <filter class="solr.SynonymFilterFactory"
>                        synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>                <filter class="solr.EnglishPorterFilterFactory"
>                        protected="protwords.txt"/>
>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>            </analyzer>
>        </fieldType>
> 
> 
> -Jeff
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Multi-word-Synonym-tp20586702p20602482.html
Sent from the Solr - User mailing list archive at Nabble.com.
