Hi Erick

Am I right to say that we need to combine the duplicate records into one before feeding them to Solr to index?

I am coming from Endeca, which supports combining duplicate records into one record during indexing. I was wondering if Solr supports this.

-Derek

On 3/18/2015 11:21 PM, Erick Erickson wrote:
I'd use SolrJ, pull the docs by productId order and combine records
with the same product ID into a single doc.
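
A minimal sketch of the combining step described above, using only the JDK so it runs without a Solr instance. The class name, the method name, and the {productId, businessType} row layout are assumptions made for illustration; they are not part of SolrJ or of the linked blog post.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups source rows by product ID before indexing.
class CombineByProductId {

    /**
     * Each row is {productId, businessType}. Returns one entry per product ID,
     * holding its distinct business types in first-seen order.
     */
    public static Map<String, Set<String>> combine(List<String[]> rows) {
        Map<String, Set<String>> docs = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Create the entry on first sight of a product ID, then add the value.
            docs.computeIfAbsent(row[0], id -> new LinkedHashSet<>()).add(row[1]);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question in this thread.
        List<String[]> rows = Arrays.asList(
            new String[] {"12345", "Exporter"},
            new String[] {"12345", "Agent"},
            new String[] {"12366", "Manufacturer"},
            new String[] {"12377", "Exporter"},
            new String[] {"12377", "Distributor"});

        for (Map.Entry<String, Set<String>> doc : combine(rows).entrySet()) {
            System.out.println(doc.getKey() + " -> " + doc.getValue());
        }
    }
}
```

With SolrJ, each map entry would presumably become one SolrInputDocument, calling addField once per business type on a field that the schema declares as multiValued. This sketch keeps every product in memory; when the rows arrive sorted by product ID, each document can instead be sent as soon as the ID changes.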

Here's a starter set for indexing from a DB with SolrJ. It has Tika
processing in it as well, but you can pull that out pretty easily.

https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <d...@globalsources.com> wrote:
Hi

I have duplicate records in my source data (DB or delimited files). For
simplicity's sake, they are of the following nature:

Product Id    Business Type
-----------------------------------
12345         Exporter
12345         Agent
12366         Manufacturer
12377         Exporter
12377         Distributor

There are other fields with multiple values as well.

How do I index the duplicate records into 1 document? E.g., Product Id 12345
will be 1 document, 12366 will be 1 document, and 12377 will be 1 document.

-Derek

