RE: Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
Correction: Java heap size should be RAM buffer size, if I'm not too mistaken. -Original message- From: Markus Jelsma Sent: Wed 29-09-2010 01:17 To: solr-user@lucene.apache.org; Subject: RE: Re: Solr Deduplication and Field Collapsing If you can set the digest field for your

RE: Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
00:57 To: solr-user@lucene.apache.org; Subject: Re: Solr Deduplication and Field Collapsing I have the digest field already in the schema because the index is shared between Nutch docs and others. I do not know if the second approach is the quickest in my case. I can set the digest value to something

Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Nemani, Raj
date the digest field with the value from the corresponding id field using Solr? Thanks, Raj - Original Message - From: Markus Jelsma To: solr-user@lucene.apache.org Sent: Tue Sep 28 18:19:17 2010 Subject: RE: Solr Deduplication and Field Collapsing You could create a custom update p

RE: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
mani, Raj Sent: Tue 28-09-2010 23:28 To: solr-user@lucene.apache.org; Subject: Solr Deduplication and Field Collapsing All, I have set up Nutch to submit the crawl results to a Solr index. I have some duplicates in the documents generated by the Nutch crawl. There is a field 'digest' tha

Solr Deduplication and Field Collapsing

2010-09-28 Thread Nemani, Raj
All, I have set up Nutch to submit the crawl results to a Solr index. I have some duplicates in the documents generated by the Nutch crawl. There is a field 'digest' that Nutch generates that is the same for documents that are duplicates. While setting up the dedupe processor in the Solr co