Hi there,

We are using AWS EMR as our big data processing cluster. We have about
3 TB of text files in which each line is a JSON record, and I want to
index these records into Solr.

I have tried batching the records and pushing them to the Solr index
using the SolrJ client, but that feels really slow.
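
For illustration, a batching loader of this kind could be sketched as
follows. This is only a minimal sketch, not the actual job: it uses plain
Python and Solr's JSON update handler rather than SolrJ, and the function
names, the batch size and the update URL are assumptions made for the
example.

```python
import json
import urllib.request
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def read_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of parsed JSON records, at most batch_size per list."""
    # Blank lines are skipped; every other line is assumed to be one JSON record.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


def post_batch(update_url: str, docs: List[dict]) -> None:
    """POST one batch as a JSON array to a Solr update handler.

    update_url is an assumption, e.g. http://host:8983/solr/<collection>/update
    """
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_lines(
    lines: Iterable[str],
    send: Callable[[List[dict]], None],
    batch_size: int = 1000,  # illustrative value, not a tuned recommendation
) -> int:
    """Send all records in batches via send() and return the record count."""
    total = 0
    for batch in read_batches(lines, batch_size):
        send(batch)
        total += len(batch)
    return total
```

Here `send` would be something like `lambda docs: post_batch(url, docs)`;
commits are left out of the sketch on purpose.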

My question is twofold:

1. Is there a ready-to-use tool that can create a Solr index offline and
store it in, say, S3 or somewhere similar?
2. If (1) is possible, how can I push that offline Solr index to a live
Solr cluster?


I found this tool:
https://docs.cloudera.com/documentation/enterprise/latest/topics/search_mapreduceindexertool.html

but it's really cumbersome to use, and it looks like you need to supply
the shard/schema information at the time the offline index is created.

Any suggestions would be greatly appreciated.

-Vivek
