Indexing rates scale roughly linearly with the number of shards, so one
way to increase throughput is simply to create the collection with
more shards. For the initial bulk indexing, you can start with one
replica per shard and then use ADDREPLICA afterwards if you need
to build things out.
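
As a rough sketch of that sequence, the snippet below builds the Collections API URLs for a CREATE with replicationFactor=1 and then one ADDREPLICA call per shard. The base URL, collection name, configset name and shard count are all placeholders, and the shard1..shardN names assume the default naming used by the compositeId router. It only constructs the URLs; send them with curl or any HTTP client once bulk indexing has finished.

```python
from urllib.parse import urlencode

# Assumed address of one SolrCloud node; replace with your own.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, config_name, base=SOLR_BASE):
    """URL that creates a collection with one replica per shard."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": 1,  # single copy while bulk indexing
        "collection.configName": config_name,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def add_replica_urls(name, num_shards, base=SOLR_BASE):
    """One ADDREPLICA URL per shard, to run after the bulk load.

    Assumes the default shard names shard1..shardN.
    """
    urls = []
    for i in range(1, num_shards + 1):
        params = {
            "action": "ADDREPLICA",
            "collection": name,
            "shard": f"shard{i}",
        }
        urls.append(f"{base}/admin/collections?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_collection_url("bulk_docs", 8, "my_config"))
    for url in add_replica_urls("bulk_docs", 8):
        print(url)
```

Without a node parameter, Solr picks the node for each new replica itself.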
However… t
Hi there,
We are using AWS EMR as our big data processing cluster. We have about 3TB
of text files in which each line is a JSON record that I want to
index into Solr.
I have tried batching the records and pushing them to the Solr index using
SolrJClient, but that feels really slow.
My doubt is