Synonyms only need to be expanded once. Generally, expand synonyms at index time
only, and leave the synonym filter out of the query analyzer.
Also, consider the StandardTokenizer. It is a bit smarter than the
WhitespaceTokenizer, and that can be useful.
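
A rough sketch of how that could look in the schema, in case it helps. The field type name "text_syn" and the file name "synonyms.txt" are placeholders I made up, and the ICU char filter assumes the ICU analysis module is installed. The index analyzer expands synonyms before stemming; the query analyzer is the same chain minus the synonym and flatten filters.

```xml
<!-- Sketch only: "text_syn" and "synonyms.txt" are placeholder names. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc_cf"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Synonyms are expanded once, at index time, before the stemmer runs. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- The index cannot store a token graph, so flatten it after synonyms. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc_cf"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- No synonym or flatten filter here: the expansion is already in the index. -->
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

To keep the whitespace tokenizer instead, swap solr.StandardTokenizerFactory for solr.WhitespaceTokenizerFactory in both analyzers. Either way, use the same tokenizer on both sides so query terms line up with indexed terms.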
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 5, 2020, at 9:08 PM, Jayadevan
>
> ICUNormalizer2CharFilterFactory name="nfkc_cf" (the default)
> WhitespaceTokenizerFactory
> SynonymGraphFilterFactory
> FlattenGraphFilterFactory
> KStemFilterFactory
> RemoveDuplicatesTokenFilterFactory
>
> One doubt related to this. Ideally, the same sequence should be followed
> for indexing and qu
> ICUNormalizer2CharFilterFactory name="nfkc_cf" (the default)
> WhitespaceTokenizerFactory
> SynonymGraphFilterFactory
> FlattenGraphFilterFactory
> KStemFilterFactory
> RemoveDuplicatesTokenFilterFactory
>
> Thanks a lot. Very useful insights.
> Regards,
> Jayadevan
Several problems.
1. Do not remove stopwords. That is a 1970s-era hack for saving disk space.
Want to search for “vitamin a”? Better not remove stopwords.
2. Put synonyms before the stemmer, especially the Porter stemmer, whose
output isn’t English words.
3. Use KStem instead of Porter. Port