Re: index duplicate records from data source into 1 document

Erick Erickson Wed, 18 Mar 2015 08:25:05 -0700

I'd use SolrJ: pull the rows ordered by productId and combine the records
with the same product ID into a single doc.
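
Because the rows arrive sorted by product ID, all rows for one product are consecutive. A single pass can therefore build one merged record per product and emit it as soon as the ID changes. Below is a minimal sketch in plain Java. It assumes the rows have already been read from the source, for example via JDBC with an ORDER BY on the product ID. `Row`, `Merged` and `merge` are hypothetical names. The SolrJ step is left out so the sketch runs without dependencies.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

public class MergeByProductId {

    // One source row (hypothetical shape): the product ID plus column name -> value.
    public record Row(String productId, Map<String, String> fields) {}

    // One merged record: the product ID plus column name -> distinct values, in first-seen order.
    public record Merged(String productId, Map<String, Set<String>> fields) {}

    // Streams through rows that MUST already be ordered by productId.
    // Only the group currently being built is held in memory; each finished group goes to the sink.
    public static void merge(Iterator<Row> rows, Consumer<Merged> sink) {
        Merged current = null;
        while (rows.hasNext()) {
            Row row = rows.next();
            if (current == null || !current.productId().equals(row.productId())) {
                // The product ID changed, so the previous group is complete.
                if (current != null) {
                    sink.accept(current);
                }
                current = new Merged(row.productId(), new LinkedHashMap<>());
            }
            // Add this row's values; the Set drops values repeated on every row.
            for (Map.Entry<String, String> e : row.fields().entrySet()) {
                current.fields()
                        .computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                        .add(e.getValue());
            }
        }
        if (current != null) {
            sink.accept(current); // flush the last group
        }
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("12345", Map.of("businessType", "Exporter")),
                new Row("12345", Map.of("businessType", "Agent")),
                new Row("12366", Map.of("businessType", "Manufacturer")),
                new Row("12377", Map.of("businessType", "Exporter")),
                new Row("12377", Map.of("businessType", "Distributor")));
        merge(rows.iterator(), m -> System.out.println(m.productId() + " " + m.fields()));
        // 12345 {businessType=[Exporter, Agent]}
        // 12366 {businessType=[Manufacturer]}
        // 12377 {businessType=[Exporter, Distributor]}
    }
}
```

In a real SolrJ indexer the sink would turn each merged record into a SolrInputDocument. It would call addField once per value, because repeated calls with the same field name add values to a multivalued field. It would then send the documents to Solr in batches. Fields such as the business type must be declared multiValued in the schema for this to work.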


Here's a starter set for indexing from a DB with SolrJ. It has Tika
processing in it as well, but you can pull that out pretty easily.

https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <d...@globalsources.com> wrote:
> Hi
>
> I have duplicate records in my source data (DB or delimited files). For
> simplicity's sake, they are of the following nature:
>
> Product Id    Business Type
> -----------------------------------
> 12345         Exporter
> 12345         Agent
> 12366         Manufacturer
> 12377         Exporter
> 12377         Distributor
>
> There are other fields with multiple values as well.
>
> How do I index the duplicate records into 1 document? E.g. Product Id 12345
> should be 1 document, 12366 1 document and 12377 1 document.
>
> -Derek
