RE: Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
 Correction: "Java heap size" should be "RAM buffer size", if I'm not mistaken.   -Original message- From: Markus Jelsma Sent: Wed 29-09-2010 01:17 To: solr-user@lucene.apache.org; Subject: RE: Re: Solr Deduplication and Field Collapsing If you can set the digest field for your

RE: Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
00:57 To: solr-user@lucene.apache.org; Subject: Re: Solr Deduplication and Field Collapsing I already have the digest field in the schema because the index is shared between Nutch docs and others. I'm not sure the second approach is the quickest in my case. I can set the digest value to something

Re: Solr Deduplication and Field Collapsing

2010-09-28 Thread Nemani, Raj
date the digest field with the value from the corresponding id field using Solr? Thanks Raj - Original Message - From: Markus Jelsma To: solr-user@lucene.apache.org Sent: Tue Sep 28 18:19:17 2010 Subject: RE: Solr Deduplication and Field Collapsing You could create a custom update p
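A minimal sketch of the logic Raj asks about: when a document arrives without a digest, copy its id into the digest field, and leave documents that already carry a digest (for example, the Nutch ones) untouched. In Solr this would live in the `processAdd` method of a custom `UpdateRequestProcessor`; to keep the example runnable without Solr on the classpath, documents are modelled as plain maps. The class name `DigestFiller`, the method `fillDigest`, and the literal field names `id` and `digest` are hypothetical and should be matched to the actual schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the body of a custom update processor's processAdd:
// fill a missing or empty digest from the document's id.
class DigestFiller {
    // Assumed field names; adjust to the real schema.
    static final String ID_FIELD = "id";
    static final String DIGEST_FIELD = "digest";

    /** Returns true if a digest was added, false if the document already had one. */
    static boolean fillDigest(Map<String, Object> doc) {
        Object digest = doc.get(DIGEST_FIELD);
        if (digest != null && !digest.toString().isEmpty()) {
            return false; // e.g. a Nutch document that already carries its own digest
        }
        Object id = doc.get(ID_FIELD);
        if (id == null) {
            throw new IllegalArgumentException("document has no " + ID_FIELD + " field");
        }
        doc.put(DIGEST_FIELD, id.toString());
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> nutchDoc = new HashMap<>();
        nutchDoc.put("id", "http://example.com/");
        nutchDoc.put("digest", "abc123");

        Map<String, Object> otherDoc = new HashMap<>();
        otherDoc.put("id", "doc-42");

        fillDigest(nutchDoc);
        fillDigest(otherDoc);
        System.out.println(nutchDoc.get("digest")); // abc123 (unchanged)
        System.out.println(otherDoc.get("digest")); // doc-42 (copied from id)
    }
}
```

Because every id is unique, a digest copied from it is also unique, so non-Nutch documents can never be flagged as duplicates of each other; the trade-off is that genuine content duplicates among them will not be detected either.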

RE: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
You could create a custom update processor that adds a digest field for newly added documents that do not have the digest field themselves. This way, the documents that are not added by Nutch get a proper non-empty digest field so the deduplication processor won't create the same empty hash and
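To illustrate why an empty digest is a problem: if the deduplication signature is computed over the digest field, every document whose digest is empty hashes the same input and therefore gets the same signature, so unrelated documents would be treated as duplicates. The sketch below assumes an MD5 signature (Solr's deduplication can be configured with other signature classes) and uses `java.security.MessageDigest` directly; the class name `EmptyDigestDemo` and the example ids are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Shows that an empty signature source always yields the same hash,
// while per-document digests yield distinct hashes. MD5 is an assumption.
class EmptyDigestDemo {
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b)); // two lowercase hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Two unrelated non-Nutch documents, neither with a digest value.
        String sigA = md5Hex("");
        String sigB = md5Hex("");
        System.out.println(sigA.equals(sigB)); // true: both would be treated as duplicates

        // Once a processor fills in a per-document digest, the signatures differ.
        System.out.println(md5Hex("doc-1").equals(md5Hex("doc-2"))); // false
    }
}
```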