Re: index duplicate records from data source into 1 document

2015-03-20 Thread Shawn Heisey
On 3/20/2015 4:03 AM, Toke Eskildsen wrote: > On Thu, 2015-03-19 at 15:44 +0100, Shawn Heisey wrote: >> You could in theory write a custom UpdateRequestProcessor that looks for >> the previous document and merges it in whatever way you desire, so the >> combined information is what will be indexed,

Re: index duplicate records from data source into 1 document

2015-03-20 Thread Toke Eskildsen
On Thu, 2015-03-19 at 15:44 +0100, Shawn Heisey wrote: > You could in theory write a custom UpdateRequestProcessor that looks for > the previous document and merges it in whatever way you desire, so the > combined information is what will be indexed, and configure Solr to use > that update processo

Re: index duplicate records from data source into 1 document

2015-03-19 Thread Derek Poh
Oh that is how Solr works... On 3/19/2015 10:44 PM, Shawn Heisey wrote: On 3/19/2015 2:09 AM, Derek Poh wrote: Am I right to say we need to combine duplicate records into 1 before feeding them to Solr to index? I am coming from Endeca, which supports combining duplicate records into 1

Re: index duplicate records from data source into 1 document

2015-03-19 Thread Erick Erickson
bq: Am I right to say we need to combine duplicate records into 1 before feeding them to Solr to index? That's what I'd do. As Shawn says, if you simply fire them both at Solr the more recent one will replace the older one. Best, Erick On Thu, Mar 19, 2015 at 7:44 AM, Shawn Heisey wrote:

Re: index duplicate records from data source into 1 document

2015-03-19 Thread Shawn Heisey
On 3/19/2015 2:09 AM, Derek Poh wrote: > Am I right to say we need to combine duplicate records into 1 > before feeding them to Solr to index? > > I am coming from Endeca, which supports combining duplicate records > into 1 record during indexing. Was wondering if Solr supports this. If you

Re: index duplicate records from data source into 1 document

2015-03-19 Thread Derek Poh
Hi Erick, Am I right to say we need to combine duplicate records into 1 before feeding them to Solr to index? I am coming from Endeca, which supports combining duplicate records into 1 record during indexing. Was wondering if Solr supports this. -Derek On 3/18/2015 11:21 PM, Erick Eri

Re: index duplicate records from data source into 1 document

2015-03-18 Thread Erick Erickson
I'd use SolrJ, pull the docs in productId order, and combine records with the same product ID into a single doc. Here's a starter set for indexing from a DB with SolrJ. It has Tika processing in it as well, but you can pull that out pretty easily. https://lucidworks.com/blog/indexing-with-solrj/
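
The combining step suggested above can be sketched independently of SolrJ. The following is a minimal Python illustration of the grouping logic only, not the code from the linked post: it sorts rows by product ID and folds rows with the same ID into one document. The sample rows are the ones from the original question in this thread; the field names `productId` and `businessType` are hypothetical, and `businessType` is assumed to be a multi-valued field in the Solr schema. Sending the resulting documents to Solr is left out.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the original question: (product ID, business type).
rows = [
    ("12345", "Exporter"),
    ("12345", "Agent"),
    ("12366", "Manufacturer"),
    ("12377", "Exporter"),
    ("12377", "Distributor"),
]


def merge_by_product_id(rows):
    """Combine rows that share a product ID into one document per ID.

    The field names "productId" and "businessType" are hypothetical;
    "businessType" is assumed to be multi-valued in the target schema.
    """
    docs = []
    # groupby only merges adjacent rows, so sort by product ID first.
    ordered = sorted(rows, key=itemgetter(0))
    for product_id, group in groupby(ordered, key=itemgetter(0)):
        business_types = []
        for _, business_type in group:
            # Keep each distinct value once, in first-seen order.
            if business_type not in business_types:
                business_types.append(business_type)
        docs.append({"productId": product_id, "businessType": business_types})
    return docs


docs = merge_by_product_id(rows)
for doc in docs:
    print(doc)
```

When the rows come from a database, an `ORDER BY` on the product ID column would let the same loop run over a streamed result set without sorting in memory.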

index duplicate records from data source into 1 document

2015-03-18 Thread Derek Poh
Hi, if I have duplicate records in my source data (DB or delimited files). For simplicity's sake they are of the following nature:

Product Id  Business Type
----------  -------------
12345       Exporter
12345       Agent
12366       Manufacturer
12377       Exporter
12377       Distributor