The primary purpose of this filter is to be used in conjunction with
KeywordRepeatFilterFactory and a stemmer. It removes the duplicate token
left behind when stemming did not change the original token, since the
keyword copy is then no longer needed. The goal is to index both the
stemmed and unstemmed terms at the same position.
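A rough simulation of that chain in plain Java may help. It does not use Lucene's actual classes, and the names below are illustrative. KeywordRepeat is modeled as emitting a keyword-flagged copy of each token plus a stacked copy at the same position (position increment 0). The "stemmer" is a toy that strips a trailing "s" from non-keyword tokens, standing in for a real stemmer such as PorterStemFilterFactory. The dedup step drops a token whose text already appeared at the current position, which is the behavior Lucene documents for RemoveDuplicatesTokenFilter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeywordRepeatDemo {
    // Simplified token: text, position increment, and a "keyword" (do not stem) flag
    record Token(String text, int posInc, boolean keyword) {}

    // Models KeywordRepeat: a keyword copy, then a stacked copy (posInc 0) the stemmer may change
    static List<Token> keywordRepeat(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Token(w, 1, true));
            out.add(new Token(w, 0, false));
        }
        return out;
    }

    // Toy stemmer (hypothetical stand-in): strips a trailing "s" from non-keyword tokens only
    static List<Token> toyStem(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            String text = t.text();
            if (!t.keyword() && text.length() > 3 && text.endsWith("s")) {
                text = text.substring(0, text.length() - 1);
            }
            out.add(new Token(text, t.posInc(), t.keyword()));
        }
        return out;
    }

    // Drops a token if the same text was already emitted at the current position
    static List<Token> removeDuplicates(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : in) {
            if (t.posInc() != 0) {
                seenAtPosition.clear(); // moved to a new position
            }
            if (seenAtPosition.add(t.text())) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> texts(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.text());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("dogs", "run", "dogs");
        List<Token> chain = removeDuplicates(toyStem(keywordRepeat(words)));
        System.out.println(texts(chain)); // [dogs, dog, run, dogs, dog]
    }
}
```

Here "run" is unchanged by the stemmer, so its stacked duplicate is dropped, while "dogs" keeps both its stemmed and unstemmed forms at one position. The second "dogs" sits at a different position and survives, so only same-position duplicates are removed.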
Whether your app is using the filter for that purpose remains to be seen.
Removing duplicates from the raw input token stream would impact the term
frequency.
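To make the term-frequency point concrete, here is a minimal sketch comparing per-term counts for a raw token list with the same list after a hypothetical dedup that keeps only the first occurrence of each term across the whole stream. The class and method names are illustrative, not Solr or Lucene APIs. Relevance scoring takes a term's frequency within the field into account, so after such a dedup a document that mentions "cat" twice would look the same as one that mentions it once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencyDemo {
    // Counts how often each term occurs in a field's token list (its term frequency)
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Hypothetical whole-stream dedup: keeps only the first occurrence of each term, in order
    static List<String> dropRepeatedTerms(List<String> tokens) {
        return new ArrayList<>(new LinkedHashSet<>(tokens));
    }

    public static void main(String[] args) {
        List<String> raw = List.of("the", "cat", "saw", "the", "other", "cat");
        System.out.println(termFrequencies(raw));                    // {cat=2, other=1, saw=1, the=2}
        System.out.println(termFrequencies(dropRepeatedTerms(raw))); // {cat=1, other=1, saw=1, the=1}
    }
}
```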
-- Jack Krupansky
-----Original Message-----
From: Dotan Cohen
Sent: Friday, May 24, 2013 3:03 AM
To: solr-user@lucene.apache.org
Subject: Why would one not use RemoveDuplicatesTokenFilterFactory?
I am looking through the schema of a Solr installation that I
inherited last year. The original dev, who is unavailable for comment,
has two types of text fields: one with
RemoveDuplicatesTokenFilterFactory and one without. These fields are
intended for full-text search.
Why would someone _not_ use RemoveDuplicatesTokenFilterFactory on a
field intended for full-text search? What are the drawbacks to using
it? This application is very, very write-heavy (hundreds of writes per
minute), if that matters. It was running on websolr.com at the time;
I've now moved it to Amazon Web Services.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com