I don't think this handles near duplicates, which would require some of the methods mentioned recently on the Mahout list.
On Wed, Sep 23, 2009 at 2:59 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut <hbase.user.ni...@gmail.com> wrote:
>
>> Hi,
>> When we crawl news content, we face the problem of the same content being
>> repeated in many documents. We want to add a near-duplicate document
>> filter to detect such documents. Is there a way to do that in SOLR?
>>
>
> Look at http://wiki.apache.org/solr/Deduplication
>
> --
> Regards,
> Shalin Shekhar Mangar.
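For illustration only, here is a minimal sketch of one common near-duplicate test: word shingling plus Jaccard similarity. This is not necessarily one of the methods discussed on the Mahout list, and the function names, the shingle size `k=3`, and the `0.5` threshold are hypothetical choices, not anything from Solr or Mahout. Comparing every pair of documents this way is quadratic, so at crawl scale it is usually approximated with techniques such as MinHash.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (contiguous word k-grams) of text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short documents: treat the whole word sequence as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.5, k: int = 3) -> bool:
    """Hypothetical filter: flag a pair whose shingle sets overlap enough."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog today"
    b = "The quick brown fox jumps over the lazy dog yesterday"
    c = "Stock markets fell sharply on Monday"
    print(jaccard(shingles(a), shingles(b)))  # 7 shared of 9 distinct shingles
    print(is_near_duplicate(a, b), is_near_duplicate(a, c))
```

Two stories that differ in a single word share most of their shingles and are flagged, while unrelated text shares none. Unlike an exact hash of the whole document, this tolerates small edits.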