On Fri, May 24, 2013 at 4:04 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The primary purpose of this filter is in conjunction with the
> KeywordRepeatFilterFactory and a stemmer, to remove the tokens that did not
> produce a stem from the original token, so the keyword duplicate is no
> longer needed. The goal is to index both the stemmed and unstemmed terms at
> the same position.
>
> Whether your app is using the filter for that purpose remains to be seen.
>
> Removing duplicates from the raw input token stream would impact the term
> frequency.
>
> -- Jack Krupansky
>

Thank you, Jack. I thought that the filter only removed tokens with both
identical position and identical text:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

Are stemmed terms considered the same text as the original word, such that
they will show as a dupe for the RemoveDuplicatesTokenFilterFactory? That
seems odd.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
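
For illustration only, here is a minimal Python sketch of the chain discussed above (keyword repeat, then a stemmer, then duplicate removal). It is a toy simulation, not the Lucene/Solr code: the Token type, the function names, and the suffix-stripping toy_stem are all assumptions made up for this example. It assumes the duplicate filter drops a token only when an earlier token at the same position has identical text, which is how the wiki page linked above describes it.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    pos_inc: int   # position increment; 0 means "same position as the previous token"
    keyword: bool  # True means a stemmer must leave this token unchanged


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    """Emit each token twice: a keyword-marked copy, then a stemmable copy at the same position."""
    out = []
    for t in tokens:
        out.append(Token(t.text, t.pos_inc, True))
        out.append(Token(t.text, 0, False))
    return out


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_filter(tokens: List[Token]) -> List[Token]:
    """Stem every token that is not marked as a keyword."""
    return [t if t.keyword else Token(toy_stem(t.text), t.pos_inc, False) for t in tokens]


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Drop a token only if the same text was already seen at the same position."""
    out = []
    seen = set()
    for t in tokens:
        if t.pos_inc > 0:
            seen = set()  # moved to a new position, so start over
        if t.text in seen:
            continue
        seen.add(t.text)
        out.append(t)
    return out


def analyze(words: List[str]) -> List[Tuple[str, int]]:
    """Run the three simulated filters and return (text, position increment) pairs."""
    tokens = [Token(w, 1, False) for w in words]
    chained = remove_duplicates(stem_filter(keyword_repeat(tokens)))
    return [(t.text, t.pos_inc) for t in chained]


print(analyze(["jumping", "fast"]))  # [('jumping', 1), ('jump', 0), ('fast', 1)]
print(analyze(["fast", "fast"]))     # [('fast', 1), ('fast', 1)]
```

In this sketch the comparison happens after stemming. "jumping" and its stem "jump" differ, so both are kept at one position. "fast" is left unchanged by the toy stemmer, so its two copies are identical at the same position and one is dropped. A word repeated at different positions in the input survives, so under these assumptions term frequency is untouched.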