Hi there,

We are using AWS EMR as our big data processing cluster. We have about 3 TB of text files in which each line is a JSON record, and I want to index these records into Solr.
I have tried batching the records and pushing them to the Solr index with the SolrJ client, but that feels really slow. My question is twofold:

1. Is there a ready-to-use tool that can build a Solr index offline and store it in, say, S3 or somewhere similar?
2. If (1) is possible, how can I push that offline Solr index to a live Solr cluster?

I found this tool: https://docs.cloudera.com/documentation/enterprise/latest/topics/search_mapreduceindexertool.html but it is really cumbersome to use, and it looks like you need to supply shard and schema information at the time the offline index is created.

Any suggestions would be greatly appreciated.

-Vivek