I don't think this handles near-duplicates, which would require some of
the methods mentioned recently on the Mahout list.
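
The Mahout thread isn't quoted here, so which methods it meant is an assumption. One standard technique of this kind is MinHash over word shingles. Below is a minimal Python sketch of it. An exact hash of the content only matches identical text. MinHash instead estimates the Jaccard overlap of the documents' shingle sets, so two news stories that differ by a word or two still score high. The function names, the shingle size k=3, and the 128 hash functions are illustrative choices only. They are not an API of Solr or Mahout.

```python
import hashlib
import re
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split a lowercased document into its set of overlapping k-word shingles."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed: int, s: str) -> int:
    """64-bit hash of s; varying the seed simulates independent hash functions."""
    digest = hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(sh: Set[str], num_hashes: int = 128) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    if not sh:
        return [0] * num_hashes
    return [min(_seeded_hash(seed, s) for s in sh) for seed in range(num_hashes)]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of agreeing signature slots: an estimate of the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog near the river bank today"
    b = "the quick brown fox jumps over the lazy dog near the river bank tonight"
    sa, sb = shingles(a), shingles(b)
    print("exact Jaccard:", jaccard(sa, sb))
    print("MinHash estimate:", estimated_similarity(minhash_signature(sa), minhash_signature(sb)))
```

Each document only needs to store its fixed-size signature, not its full text, so signatures can be computed once at index time and compared cheaply. Flagging a pair as a near-duplicate then comes down to picking a similarity threshold, which would have to be tuned on real crawled news.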

On Wed, Sep 23, 2009 at 2:59 AM, Shalin Shekhar Mangar
<shalinman...@gmail.com> wrote:
> On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut <hbase.user.ni...@gmail.com> wrote:
>
>> Hi,
>> When we crawl news content, we face the problem of the same content being
>> repeated in many documents. We want to add a near-duplicate document
>> filter to detect such documents. Is there a way to do that in SOLR?
>>
>
> Look at http://wiki.apache.org/solr/Deduplication
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
