Re: Punctuation marks in documents prevent recognition of synonyms at indexing?

G.S.J. Lobbestael Sat, 26 Sep 2009 09:54:54 -0700

> > The wiki uses the example:
> > 
> >     <fieldtype name="syn" class="solr.TextField">
> >       <analyzer>
> >           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >           <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
> >                   ignoreCase="true" expand="false"/>
> >       </analyzer>
> >     </fieldtype>
> > 
> > With "dog, canine" in syn.txt and a document with "I have a
> > dog, Bob.", "dog" is not seen as a synonym. With a document
> > "I have a dog Bob" it is.
> 
> Why not use StandardTokenizerFactory, which removes punctuation?


Because then you lose the WordDelimiterFilterFactory functionality:

syn.txt has: ADC, HIV-dementie
A search for "ADC" then doesn't find a document containing "HIV-dementie".
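
To make the trade-off concrete, here is a rough Python sketch of the two behaviours. It is only a toy model, not Solr's code: the function names are made up for illustration, the regex tokenizer is an assumed stand-in for a tokenizer that strips punctuation, and the synonym table simply mimics expand="false" by mapping every term to the first one in its group.

```python
import re

# Toy synonym table modelled on the syn.txt entries from this thread.
# Mimics expand="false": every term in a group maps to the group's first term.
SYNONYMS = {
    "dog": "dog",
    "canine": "dog",
    "adc": "adc",
    "hiv-dementie": "adc",
}


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.lower().split()


def punctuation_stripping_tokens(text):
    """Assumed stand-in for a tokenizer that drops punctuation entirely."""
    return re.findall(r"[a-z0-9]+", text.lower())


def apply_synonyms(tokens):
    """Replace each token that exactly matches a synonym entry."""
    return [SYNONYMS.get(tok, tok) for tok in tokens]


def matches(query, document, tokenize):
    """True if any synonym-mapped query token occurs in the mapped document."""
    doc = set(apply_synonyms(tokenize(document)))
    return any(tok in doc for tok in apply_synonyms(tokenize(query)))


if __name__ == "__main__":
    for tokenize in (whitespace_tokens, punctuation_stripping_tokens):
        print(tokenize.__name__)
        print("  canine vs 'I have a dog, Bob.':",
              matches("canine", "I have a dog, Bob.", tokenize))
        print("  ADC vs 'Patient with HIV-dementie':",
              matches("ADC", "Patient with HIV-dementie", tokenize))
```

In this model the whitespace tokenizer leaves "dog," with its comma, so the "dog" entry never matches, while the punctuation-stripping tokenizer splits "HIV-dementie" in two, so that entry never matches either.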

regards
geert
