I'd use SolrJ, pull the rows in productId order, and combine records with the same product ID into a single doc.
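
As a rough sketch of the combining step (not a definitive implementation), something like the following might work. It assumes the rows arrive already sorted by product ID, and it uses plain Java collections as a stand-in for SolrJ documents so it runs without a Solr server. The class name ProductMerger and the field names productId and businessType are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ProductMerger {

    /**
     * Collapses rows that share a product ID into one multi-valued record.
     * Each row is {productId, businessType}; rows must be sorted by productId.
     */
    public static List<Map<String, Set<String>>> merge(List<String[]> sortedRows) {
        List<Map<String, Set<String>>> docs = new ArrayList<>();
        Map<String, Set<String>> current = null;
        String currentId = null;

        for (String[] row : sortedRows) {
            String productId = row[0];
            String businessType = row[1];

            if (!productId.equals(currentId)) {
                // The ID changed, so start a new record for this product.
                current = new LinkedHashMap<>();
                current.computeIfAbsent("productId", k -> new LinkedHashSet<>()).add(productId);
                docs.add(current);
                currentId = productId;
            }
            // Same product as the previous row: append to the multi-valued field.
            current.computeIfAbsent("businessType", k -> new LinkedHashSet<>()).add(businessType);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question, already in productId order.
        List<String[]> rows = Arrays.asList(
                new String[] {"12345", "Exporter"},
                new String[] {"12345", "Agent"},
                new String[] {"12366", "Manufacturer"},
                new String[] {"12377", "Exporter"},
                new String[] {"12377", "Distributor"});

        // With SolrJ, each map would instead become a SolrInputDocument,
        // with one addField(name, value) call per value, sent via the client.
        for (Map<String, Set<String>> doc : merge(rows)) {
            System.out.println(doc);
        }
    }
}
```

Run against the five sample rows, this produces three records, one per product ID. The businessType field (and any other repeating field) would need to be declared multiValued in the Solr schema.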

Here's a starter set for indexing from a DB with SolrJ. It has Tika processing in it as well, but you can pull that out pretty easily.

https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <d...@globalsources.com> wrote:
> Hi
>
> If I have duplicate records in my source data (DB or delimited files). For
> simplicity sake they are of the following nature
>
> Product Id    Business Type
> -----------------------------------
> 12345         Exporter
> 12345         Agent
> 12366         Manufacturer
> 12377         Exporter
> 12377         Distributor
>
> There are other fields with multiple values as well.
>
> How do I index the duplicate records into 1 document. Eg. Product Id 12345
> will be 1 document, 12366 as 1 document and 12377 as 1 document.
>
> -Derek