Hi Erick
Am I right to say we need to combine the duplicate records into 1
before feeding them to Solr to index?
I am coming from Endeca, which supports combining duplicate records
into 1 record during indexing. I was wondering if Solr supports this.
-Derek
On 3/18/2015 11:21 PM, Erick Erickson wrote:
I'd use SolrJ, pull the docs by productId order and combine records
with the same product ID into a single doc.
Here's a starter set for indexing from a DB with SolrJ. It has Tika
processing in it as well, but you can pull that out pretty easily.
https://lucidworks.com/blog/indexing-with-solrj/
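The combine step described above (read rows in product ID order, fold rows that share an ID into one record) could be sketched roughly as below. This is only an illustration, not code from the linked post: the class name DuplicateMerger, the method mergeSorted and the field names productId and businessType are made up for the example, and it uses plain Java collections so it runs without Solr on the classpath. In a real indexer the sink would presumably build a SolrInputDocument, call addField once per value into a multiValued field, and send the document with the SolrJ client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

class DuplicateMerger {

    /**
     * Streams rows that are already sorted by product ID and hands one
     * merged record per product to the sink.
     * Each row is {productId, businessType} (hypothetical layout).
     */
    static void mergeSorted(Iterable<String[]> sortedRows,
                            BiConsumer<String, List<String>> sink) {
        String currentId = null;
        List<String> businessTypes = new ArrayList<>();
        for (String[] row : sortedRows) {
            // A change in product ID means the previous product is complete.
            if (currentId != null && !currentId.equals(row[0])) {
                sink.accept(currentId, businessTypes);
                businessTypes = new ArrayList<>();
            }
            currentId = row[0];
            businessTypes.add(row[1]);
        }
        // Flush the last product, if there was any input at all.
        if (currentId != null) {
            sink.accept(currentId, businessTypes);
        }
    }

    public static void main(String[] args) {
        // Sample rows from the original question, sorted by product ID.
        List<String[]> rows = Arrays.asList(
            new String[] {"12345", "Exporter"},
            new String[] {"12345", "Agent"},
            new String[] {"12366", "Manufacturer"},
            new String[] {"12377", "Exporter"},
            new String[] {"12377", "Distributor"}
        );
        // Stand-in sink: print each merged record instead of indexing it.
        mergeSorted(rows, (id, types) ->
            System.out.println("productId=" + id + " businessType=" + types));
    }
}
```

Because the rows arrive sorted, only one product is held in memory at a time, which is the reason for pulling the docs in productId order in the first place.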
Best,
Erick
On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <d...@globalsources.com> wrote:
Hi
I have duplicate records in my source data (DB or delimited files). For
simplicity's sake, they are of the following nature:
Product Id    Business Type
-----------------------------------
12345         Exporter
12345         Agent
12366         Manufacturer
12377         Exporter
12377         Distributor
There are other fields with multiple values as well.
How do I index the duplicate records into 1 document? E.g. Product Id 12345
will be 1 document, 12366 will be 1 document and 12377 will be 1 document.
-Derek