Solr 1.4 (trunk) has similar functionality: http://wiki.apache.org/solr/Deduplication
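Both the Nutch signature classes and Solr's deduplication work by computing a signature (a hash) of a document's content. Change detection for defacement can build on the same idea: store the signature of each page's text and compare it with the signature from the next crawl. Below is a minimal plain-Java sketch of that idea, in the spirit of Nutch's MD5Signature. It does not use the Nutch or Solr APIs, and the class `ContentSignature` and its methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: detect whether a page's extracted text changed between
 * two crawls by comparing MD5 signatures. This is not Nutch's or Solr's API.
 */
public class ContentSignature {

    /** Returns the lowercase hex MD5 digest of the UTF-8 bytes of text. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical in-memory store of the last-seen signature per URL.
    private final Map<String, String> lastSignature = new HashMap<>();

    /**
     * Records the signature of text for url and returns true if it differs
     * from the previously recorded one. The first sighting of a URL returns false.
     */
    public boolean changed(String url, String text) {
        String sig = md5Hex(text);
        String previous = lastSignature.put(url, sig);
        return previous != null && !previous.equals(sig);
    }

    public static void main(String[] args) {
        ContentSignature tracker = new ContentSignature();
        String url = "http://example.com/";
        System.out.println(tracker.changed(url, "Welcome to our site")); // false (first crawl)
        System.out.println(tracker.changed(url, "Welcome to our site")); // false (unchanged)
        System.out.println(tracker.changed(url, "Hacked by someone"));   // true (text changed)
    }
}
```

An exact hash like this flags any change at all, including whitespace or a rotating date on the page; Nutch's TextProfileSignature is intended to be more tolerant of minor differences. A signature also only tells you *that* the text changed, not *what* changed, so to see the changed text you would need to keep the previous version of the page yourself.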
On Fri, Apr 24, 2009 at 9:53 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Hi,
>
> Solr doesn't include such functionality. But Nutch has:
>
> [o...@localhost src]$ ff *Signature*java
> ./test/org/apache/nutch/crawl/TestSignatureFactory.java
> ./java/org/apache/nutch/crawl/SignatureFactory.java
> ./java/org/apache/nutch/crawl/MD5Signature.java
> ./java/org/apache/nutch/crawl/Signature.java
> ./java/org/apache/nutch/crawl/TextProfileSignature.java
> ./java/org/apache/nutch/crawl/SignatureComparator.java
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: aidahaj <aida...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, April 24, 2009 12:13:47 PM
> > Subject: Solr index
> >
> > Hi,
> > I'm using Nutch to crawl a list of web sites.
> > Solr is my index (Nutch-1.0 integration with Solr).
> > I'm working on detecting web site defacement (whether there are any
> > changes in the text of a web page).
> > I want to know if Solr can give me a way to detect changes in the
> > documents in the index before committing, or a log file or something
> > like that (the text that has changed between two points in time).
> > I'm looking for your help. Thanks a lot.

--
Regards,
Shalin Shekhar Mangar.