Solr 1.4 (trunk) has similar functionality:

http://wiki.apache.org/solr/Deduplication
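
For the defacement use case, the underlying idea is to keep a signature of each page's text and compare it between two crawls. Here is a minimal sketch of that idea in plain Java. It is not the Nutch or Solr implementation: the class PageChangeDetector and its methods are hypothetical names, and it uses an exact MD5 hash kept in memory, so any change to the text, however small, counts as a change.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Nutch or Solr).
public class PageChangeDetector {

    // Last signature seen for each URL. Kept in memory here; a real setup
    // would presumably store it alongside the indexed document.
    private final Map<String, String> lastSignature = new HashMap<>();

    // Returns the MD5 digest of the text as a lowercase hex string.
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Records the signature of the current text and reports whether it
    // differs from the one recorded for the same URL on the previous call.
    // The first time a URL is seen, this returns false.
    public boolean hasChanged(String url, String text) {
        String current = md5Hex(text);
        String previous = lastSignature.put(url, current);
        return previous != null && !previous.equals(current);
    }

    public static void main(String[] args) {
        PageChangeDetector detector = new PageChangeDetector();
        String url = "http://example.com/";
        System.out.println(detector.hasChanged(url, "Welcome to our site"));
        System.out.println(detector.hasChanged(url, "Welcome to our site"));
        System.out.println(detector.hasChanged(url, "Defaced page text"));
    }
}
```

This only tells you that a page changed, not which text changed; for that you would still need to keep the previous text and diff it against the new one.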

On Fri, Apr 24, 2009 at 9:53 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Hi,
>
> Solr doesn't include such functionality.  But Nutch has:
>
> [o...@localhost src]$ ff \*Signature\*java
> ./test/org/apache/nutch/crawl/TestSignatureFactory.java
> ./java/org/apache/nutch/crawl/SignatureFactory.java
> ./java/org/apache/nutch/crawl/MD5Signature.java
> ./java/org/apache/nutch/crawl/Signature.java
> ./java/org/apache/nutch/crawl/TextProfileSignature.java
> ./java/org/apache/nutch/crawl/SignatureComparator.java
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: aidahaj <aida...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, April 24, 2009 12:13:47 PM
> > Subject: Solr index
> >
> >
> > Hi,
> > I'm using Nutch to crawl a list of web sites.
> > Solr is my index (Nutch-1.0 integration with Solr).
> > I'm working on detecting web site defacement (any change in the
> > text of a web page).
> > I want to know whether Solr can detect changes to the documents in
> > the index before committing, or provide a log file or something
> > similar (the text that has changed between two points in time).
> > I'm looking for your help. Thanks a lot.
> > --
> > View this message in context:
> > http://www.nabble.com/Solr-index-tp23219842p23219842.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.
