Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

Jack Krupansky Fri, 24 May 2013 06:05:28 -0700

The primary purpose of this filter is in conjunction with the KeywordRepeatFilterFactory and a stemmer, to remove the tokens that did not produce a stem from the original token, so the keyword duplicate is no longer needed. The goal is to index both the stemmed and unstemmed terms at the same position.
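
To make that chain concrete, here is a rough Python sketch of what the three stages do to a token stream. It is only a simulation, not Lucene code: the function names are made up for illustration, and toy_stem is a hypothetical stand-in for a real stemmer that just strips a trailing "ing" or "s".

```python
def keyword_repeat(tokens):
    """Emit every token twice at the same position:
    a keyword copy (protected from stemming) and a stemmable copy."""
    stream = []
    for position, term in enumerate(tokens):
        stream.append((position, term, True))   # keyword copy
        stream.append((position, term, False))  # copy the stemmer may change
    return stream


def toy_stem(term):
    """Hypothetical stand-in for a real stemmer: strip a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def stem(stream):
    """Stem only the tokens that are not marked as keywords."""
    return [
        (position, term if is_keyword else toy_stem(term), is_keyword)
        for position, term, is_keyword in stream
    ]


def remove_duplicates(stream):
    """Drop a token when the same term was already seen at the same position."""
    seen = set()
    result = []
    for position, term, _ in stream:
        if (position, term) not in seen:
            seen.add((position, term))
            result.append((position, term))
    return result


def analyze(text):
    """Run the simulated chain: keyword repeat, then stem, then remove duplicates."""
    return remove_duplicates(stem(keyword_repeat(text.lower().split())))
```

With this sketch, analyze("jumping dogs bark") keeps both "jumping" and "jump" at position 0 and both "dogs" and "dog" at position 1, but only a single "bark" at position 2, because the toy stemmer left that token unchanged and the repeated copy was redundant.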


Whether your app is using the filter for that purpose remains to be seen.

Removing duplicates from the raw input token stream would impact the term frequency.
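
A small Python sketch of the term frequency point, under a loud assumption: dedupe_everywhere is a hypothetical helper that drops every repeated term anywhere in the stream, which is not how the Lucene filter behaves (it only drops a token that has the same text at the same position as one already seen). It is here only to show what would happen to the counts if duplicates were stripped from the raw input.

```python
from collections import Counter


def term_frequencies(tokens):
    """Count how many times each term occurs in the token stream."""
    return Counter(tokens)


def dedupe_everywhere(tokens):
    """Hypothetical: keep only the first occurrence of each term, in order."""
    return list(dict.fromkeys(tokens))


tokens = "to be or not to be".split()

raw_tf = term_frequencies(tokens)                          # counts on the raw stream
deduped_tf = term_frequencies(dedupe_everywhere(tokens))   # counts after stripping repeats
```

On the raw stream "to" and "be" each have a frequency of 2; after the hypothetical dedupe every term has a frequency of 1, so a document that repeats a term many times would score the same as one that mentions it once.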


-- Jack Krupansky

-----Original Message-----
From: Dotan Cohen
Sent: Friday, May 24, 2013 3:03 AM
To: solr-user@lucene.apache.org
Subject: Why would one not use RemoveDuplicatesTokenFilterFactory?

I am looking through the schema of a Solr installation that I
inherited last year. The original dev, who is unavailable for comment,
has two types of text fields: one with
RemoveDuplicatesTokenFilterFactory and one without. These fields are
intended for full-text search.

Why would someone _not_ use RemoveDuplicatesTokenFilterFactory on a
field intended for full-text search? What are the drawbacks to using
it? This application is very, very write heavy (hundreds of writes per
minute) if that matters. It was running on websolr.com at the time;
I've now moved it to Amazon Web Services.

Thanks.

--
Dotan Cohen

http://gibberish.co.il

http://what-is-what.com
