Re: Order of applying tokens/filter

2020-10-06 Thread Walter Underwood
Synonyms only need to be done once. Generally, expand synonyms at index time only. Also, consider the StandardTokenizer. It is a bit smarter and that can be useful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 5, 2020, at 9:08 PM, Jayadevan

Re: Order of applying tokens/filter

2020-10-05 Thread Jayadevan Maymala
> ICUNormalizer2CharFilterFactory name=“nfkc_cf” (the default)
> WhitespaceTokenizerFactory
> SynonymGraphFilterFactory
> FlattenGraphFilterFactory
> KStemFilterFactory
> RemoveDuplicatesFilterFactory
>
One doubt related to this. Ideally, the same sequence should be followed for indexing and qu

Re: Order of applying tokens/filter

2020-10-05 Thread Jayadevan Maymala
> ICUNormalizer2CharFilterFactory name=“nfkc_cf” (the default)
> WhitespaceTokenizerFactory
> SynonymGraphFilterFactory
> FlattenGraphFilterFactory
> KStemFilterFactory
> RemoveDuplicatesFilterFactory
>
Thanks a lot. Very useful insights. Regards, Jayadevan
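
For readers who want to see the quoted chain as schema configuration, here is a rough sketch of how it might look as a Solr field type. It is not taken from the thread: the field type name `text_sketch` and the `synonyms.txt` file are placeholders, and the attribute values are assumptions. Synonyms appear only in the index-time analyzer, following the advice elsewhere in this thread to expand them once at index time. Note that Solr spells the last filter `RemoveDuplicatesTokenFilterFactory`, and that the ICU char filter needs the ICU analysis module on the classpath.

```xml
<!-- Sketch only: "text_sketch" and "synonyms.txt" are hypothetical names. -->
<fieldType name="text_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters run on the raw text, before tokenization. -->
    <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc_cf"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Synonyms come before the stemmer, so they match unstemmed words. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- Needed at index time after a graph filter. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain minus the synonym and flatten steps. -->
    <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc_cf"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

No stopword filter appears in either analyzer, in line with the recommendation in this thread not to remove stopwords.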

Re: Order of applying tokens/filter

2020-10-04 Thread Walter Underwood
Several problems.
1. Do not remove stopwords. That is a 1970s-era hack for saving disk space. Want to search for “vitamin a”? Better not remove stopwords.
2. Synonyms go before the stemmer, especially the Porter stemmer, where the output isn’t English words.
3. Use KStem instead of Porter. Port