Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

Jack Krupansky Sun, 26 May 2013 10:16:34 -0700

The only comment I was trying to make here is about the relationship between the RemoveDuplicatesTokenFilterFactory and the KeywordRepeatFilterFactory.

No, stemmed terms are not considered the same text as the original word. By definition, they are a new value for the term text.
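
To make the interaction concrete, here is a minimal Python sketch of the three-stage chain (keyword repeat, stemmer, duplicate removal). It is a simplified model for illustration only, not Lucene's implementation: the names Token, toy_stem, keyword_repeat, stem_filter, remove_duplicates and analyze are all hypothetical, and toy_stem is a crude stand-in for a real stemmer. It assumes that duplicates are dropped only when both position and text match, as the wiki page describes.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    position: int
    keyword: bool = False  # True means "protected from stemming"


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a couple of suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    """Emit each token twice at the same position: keyword copy, then normal copy."""
    out = []
    for t in tokens:
        out.append(t._replace(keyword=True))
        out.append(t._replace(keyword=False))
    return out


def stem_filter(tokens: List[Token]) -> List[Token]:
    """Stem every token except those flagged as keywords."""
    return [t if t.keyword else t._replace(text=toy_stem(t.text)) for t in tokens]


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Drop a token only if the same text was already seen at the same position."""
    seen = set()
    out = []
    for t in tokens:
        key = (t.position, t.text)
        if key in seen:
            continue
        seen.add(key)
        out.append(t)
    return out


def analyze(text: str) -> List[Tuple[int, str]]:
    """Whitespace-tokenize, then run the three filters in order."""
    tokens = [Token(w, i) for i, w in enumerate(text.lower().split())]
    chain = remove_duplicates(stem_filter(keyword_repeat(tokens)))
    return [(t.position, t.text) for t in chain]


print(analyze("the walking cats"))
print(analyze("the the"))
```

In this model, "walking" and its stem "walk" have different text, so both survive at position 1. For "the", the stemmer changes nothing, so the stemmed copy is identical to the keyword copy and is removed. The second call shows that a word repeated at different positions is left alone.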


-- Jack Krupansky

-----Original Message-----
From: Dotan Cohen
Sent: Sunday, May 26, 2013 12:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

On Fri, May 24, 2013 at 4:04 PM, Jack Krupansky <j...@basetechnology.com> wrote:

The primary purpose of this filter is in conjunction with the
KeywordRepeatFilterFactory and a stemmer, to remove the tokens that did not
produce a stem from the original token, so the keyword duplicate is no
longer needed. The goal is to index both the stemmed and unstemmed terms at
the same position.

Whether your app is using the filter for that purpose remains to be seen.

Removing duplicates from the raw input token stream would impact the term
frequency.

-- Jack Krupansky


Thank you Jack. I thought that the filter only removed tokens with
both identical position and identical text:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

Are stemmed terms considered the same text as the original word, such
that they will show as a dupe for the
RemoveDuplicatesTokenFilterFactory? That seems odd.

--
Dotan Cohen

http://gibberish.co.il

http://what-is-what.com
