Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

Dotan Cohen Sun, 26 May 2013 09:56:16 -0700

On Fri, May 24, 2013 at 4:04 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The primary purpose of this filter is in conjunction with the
> KeywordRepeatFilterFactory and a stemmer, to remove the tokens that did not
> produce a stem from the original token, so the keyword duplicate is no
> longer needed. The goal is to index both the stemmed and unstemmed terms at
> the same position.
>
> Whether your app is using the filter for that purpose remains to be seen.
>
> Removing duplicates from the raw input token stream would impact the term
> frequency.
>
> -- Jack Krupansky
>


Thank you Jack. I thought that the filter only removed tokens with
both identical position and identical text:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

Are stemmed terms considered the same text as the original word, such
that they will show as a dupe for the
RemoveDuplicatesTokenFilterFactory? That seems odd.
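
To make the question concrete, here is a toy Python sketch of the chain
Jack describes. It is not Solr code: toy_stem and the function names are
made up for illustration, and it assumes the repeat filter emits every
token twice at the same position, with one copy flagged as a keyword
that the stemmer leaves alone. Under those assumptions a duplicate only
appears when the stemmer returns the word unchanged.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    position: int
    keyword: bool  # True: the stemmer must leave this token untouched


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: drops one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def keyword_repeat(words: List[str]) -> List[Token]:
    """Emit each word twice at the same position: keyword copy, then plain copy."""
    out = []
    for pos, word in enumerate(words):
        out.append(Token(word, pos, True))
        out.append(Token(word, pos, False))
    return out


def stem(tokens: List[Token]) -> List[Token]:
    """Stem only the tokens that are not flagged as keywords."""
    return [t if t.keyword else t._replace(text=toy_stem(t.text)) for t in tokens]


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Drop a token only if the same text was already seen at the same position."""
    seen = set()
    out = []
    for t in tokens:
        key = (t.text, t.position)
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def analyze(words: List[str]) -> List[Tuple[str, int]]:
    """Run the three toy stages and return (text, position) pairs."""
    tokens = remove_duplicates(stem(keyword_repeat(words)))
    return [(t.text, t.position) for t in tokens]
```

In this sketch analyze(["cats", "sleep"]) keeps both "cats" and "cat" at
position 0, but only one "sleep" at position 1, because the toy stemmer
left "sleep" unchanged and the two copies became identical. Calling
analyze(["sleep", "sleep"]) keeps both tokens, since they sit at
different positions.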

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
