Note that 70,000 docs/second pretty much guarantees that there are multiple shards. Lots of shards.
But since you're using SolrJ, the very first thing I'd try would be to comment out the SolrClient.add(doclist) call so you're doing everything _except_ sending the docs to Solr. That'll tell you whether there's any bottleneck in getting the docs from the system of record. The fact that you're pegging the CPUs argues that you are feeding Solr as fast as Solr can go, so this is just a sanity check. But it's simple/fast.

As far as what on Solr could be the bottleneck, there's no real way to know without profiling. But 300+ fields per doc probably just means you're doing a lot of processing, so I'm not particularly hopeful you'll be able to speed things up without either more shards or a simpler schema.

Best,
Erick

On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
> Hi great community,
>
> I have a SolrCloud with the following configuration:
>
> - 2 nodes (r3.2xlarge, 61GB RAM)
> - 4 shards.
> - The producer can produce 13,000+ docs per second.
> - The schema contains 300+ fields and the document size is about 3KB.
> - Using SolrJ and CloudSolrClient; each batch to Solr contains 500 docs.
>
> When I start my bulk indexer program the CPU utilization is 100% on each
> server, but the rate of the indexer is about 1,500 docs per second.
>
> I know that some Solr benchmarks reached 70,000+ docs per second.
>
> The question: What is the best way to determine the bottleneck on Solr
> indexing rate?
>
> Thanks,
> Mahmoud
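The sanity check Erick describes can be sketched as a small harness: run the full fetch/build loop, but leave the Solr send commented out, and time what's left. If the measured rate is much higher than 1,500 docs/sec, the bottleneck is on the Solr side. All names here (fetchNextBatch, IndexSanityCheck) are hypothetical stand-ins, not code from the thread; a real indexer would build SolrInputDocument batches and call client.add(batch).

```java
import java.util.ArrayList;
import java.util.List;

public class IndexSanityCheck {
    // Hypothetical stand-in for pulling a batch from the system of record.
    // Returns null when there are no more batches, like an exhausted cursor.
    static List<String> fetchNextBatch(int batchSize, int batchNo, int totalBatches) {
        if (batchNo >= totalBatches) return null;
        List<String> docs = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            docs.add("doc-" + batchNo + "-" + i); // real code: build a SolrInputDocument
        }
        return docs;
    }

    public static void main(String[] args) {
        int batchSize = 500;   // matches the 500-doc batches from the question
        int totalBatches = 20;
        long built = 0;
        long start = System.nanoTime();
        int batchNo = 0;
        List<String> batch;
        while ((batch = fetchNextBatch(batchSize, batchNo++, totalBatches)) != null) {
            // client.add(batch);  // <-- commented out: everything EXCEPT sending to Solr
            built += batch.size();
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("built %d docs in %.3fs (%.0f docs/sec)%n",
                built, secs, built / secs);
    }
}
```

Run it once with the add() commented out and once with it live; the gap between the two rates tells you how much of the wall-clock time Solr itself accounts for.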