-Original message-
> From: Furkan KAMACI
> Sent: Sunday 22nd September 2013 21:15
> To: solr-user@lucene.apache.org
> Subject: Re: Near Duplicate Document Detection at Solr
>
I also know that there is another mechanism in Solr:
http://wiki.apache.org/solr/Deduplication

I think I should add a custom signature, because that is the most usable
one for me:
http://wiki.apache.org/solr/TextProfileSignature

On the other hand, are there any limitations to deduplication at
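As a rough illustration of what a TextProfileSignature-style signature computes, the sketch below builds a quantized word-frequency profile of a document and hashes it with MD5, so documents whose frequent words have the same rounded counts get the same signature. It is a simplified Python re-implementation written from the general idea, not Solr's actual code; the function name `text_profile_signature`, the defaults `min_token_len=2` and `quant_rate=0.01`, the tie-breaking order and the empty-text fallback are all assumptions.

```python
import hashlib
import re
from collections import Counter


def text_profile_signature(text: str, min_token_len: int = 2,
                           quant_rate: float = 0.01) -> str:
    """Hypothetical sketch of a TextProfileSignature-style hash (not Solr's code)."""
    # Tokens are lowercase alphanumeric runs; tokens not longer than
    # min_token_len are ignored.
    tokens = [t for t in re.findall(r"[^\W_]+", text.lower())
              if len(t) > min_token_len]
    if not tokens:
        # Assumed fallback: hash the raw text when no usable tokens remain.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    counts = Counter(tokens)
    max_freq = max(counts.values())

    # Quantization step: a fraction of the top frequency, rounded half up,
    # but at least 2 when some token repeats (otherwise 1).
    quant = int(max_freq * quant_rate + 0.5)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1

    # Round each count down to a multiple of quant; drop tokens that fall to 0.
    profile = [(tok, (c // quant) * quant) for tok, c in counts.items()]
    profile = [(tok, c) for tok, c in profile if c >= quant]

    # Order by count (descending), then alphabetically, so the profile is stable.
    profile.sort(key=lambda item: (-item[1], item[0]))
    canonical = "\n".join(f"{tok} {c}" for tok, c in profile)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

For example, in this sketch "the the the cat sat on a mat" and "the the the dog sat on a mat" get the same signature, because `cat` and `dog` each occur once and are dropped by the quantization. The same effect means that, once any word repeats, every word occurring only once is discarded, so very short texts can collapse to the same profile; the approach is better suited to full web pages than to one-line snippets.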
I want to detect near-duplicate documents (web documents). I know that
there is an algorithm called Winnowing, and there is another technique used
by Google. However, I also know that Solr has a component called
MoreLikeThis. Google's page explains that *mirroring and plagiarism* are
easy to detec
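Winnowing (Schleimer, Wilkerson and Aiken) selects a subset of k-gram hashes as a document's fingerprint: in every window of `w` consecutive hashes it keeps the minimum (the rightmost one on ties), and two documents are then compared by the overlap of their fingerprint sets. The sketch below is a minimal Python version; the helper names `kgram_hashes`, `winnow` and `resemblance`, character 5-grams, a window of 4, MD5-derived 64-bit hashes and Jaccard similarity are illustrative assumptions, not values taken from Solr or from Google's technique.

```python
import hashlib
import re

# Hypothetical helpers; k=5 and w=4 are arbitrary illustrative defaults.


def kgram_hashes(text: str, k: int) -> list[int]:
    # Normalize: lowercase and keep only ASCII letters and digits.
    norm = re.sub(r"[^0-9a-z]", "", text.lower())
    # 64-bit hash of every character k-gram (an MD5 prefix, an arbitrary choice).
    return [int.from_bytes(hashlib.md5(norm[i:i + k].encode()).digest()[:8], "big")
            for i in range(len(norm) - k + 1)]


def winnow(hashes: list[int], w: int) -> set[int]:
    # Documents with fewer than w hashes: keep the single minimum.
    if len(hashes) < w:
        return {min(hashes)} if hashes else set()
    fingerprints = set()
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        # Rightmost occurrence of the minimum in this window.
        pos = start + max(i for i, h in enumerate(window) if h == min_val)
        if pos != last_pos:  # record each selected position only once
            fingerprints.add(hashes[pos])
            last_pos = pos
    return fingerprints


def resemblance(a: str, b: str, k: int = 5, w: int = 4) -> float:
    # Jaccard similarity of the two winnowed fingerprint sets.
    fa = winnow(kgram_hashes(a, k), w)
    fb = winnow(kgram_hashes(b, k), w)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

Unlike a single exact signature, `resemblance` returns a graded score: changing one word leaves the fingerprints from the untouched parts of the text shared, so the score falls somewhere between 0 and 1, and a threshold of your choosing would decide what counts as a near duplicate.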