Thanks Erick. I've commented out the SolrClient.add(doclist) line and now get 5,500+ docs per second from a single producer.
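For reference, the indexer loop now looks roughly like this (the ZooKeeper hosts, collection name, and the id-only document are placeholders; the real code fills in our 300+ fields per document):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexerSanityCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; ours runs on the Solr nodes.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build()) {
            client.setDefaultCollection("mycollection"); // placeholder name

            List<SolrInputDocument> doclist = new ArrayList<>(500);
            for (int i = 0; i < 1_000_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                // The real code fills ~300 fields (~3 KB) per document here.
                doclist.add(doc);

                if (doclist.size() == 500) { // 500-doc batches, as before
                    // client.add(doclist); // commented out per your suggestion:
                    //                      // everything runs *except* the send
                    //                      // to Solr
                    doclist.clear();
                }
            }
        }
    }
}

With the add() call back in, the same loop drops to the ~1,500 docs per second I reported below.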
Regarding more shards, do you mean using 2 nodes with 8 shards per node (so 16 shards on the same 2 nodes), or spreading the shards over more nodes? I'm using Solr 6.4.1 with ZooKeeper on the same nodes.

Here's what I got from the Sematext profiler:

51% Thread.java:745 java.lang.Thread#run
  42% QueuedThreadPool.java:589 org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
    (29 collapsed calls)
    43% UpdateRequestHandler.java:97 org.apache.solr.handler.UpdateRequestHandler$1#load
      30% JsonLoader.java:78 org.apache.solr.handler.loader.JsonLoader#load
        30% JsonLoader.java:115 org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
      13% JavabinLoader.java:54 org.apache.solr.handler.loader.JavabinLoader#load
  9% ThreadPoolExecutor.java:617 java.util.concurrent.ThreadPoolExecutor$Worker#run
    9% ThreadPoolExecutor.java:1142 java.util.concurrent.ThreadPoolExecutor#runWorker
33% ConcurrentMergeScheduler.java:626 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
  33% ConcurrentMergeScheduler.java:588 org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
    33% SolrIndexWriter.java:233 org.apache.solr.update.SolrIndexWriter#merge
      33% IndexWriter.java:3920 org.apache.lucene.index.IndexWriter#merge
        33% IndexWriter.java:4343 org.apache.lucene.index.IndexWriter#mergeMiddle
          20% SegmentMerger.java:101 org.apache.lucene.index.SegmentMerger#merge
          11% SegmentMerger.java:89 org.apache.lucene.index.SegmentMerger#merge
          2% SegmentMerger.java:144 org.apache.lucene.index.SegmentMerger#merge

On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Note that 70,000 docs/second pretty much guarantees that there are
> multiple shards. Lots of shards.
>
> But since you're using SolrJ, the very first thing I'd try would be
> to comment out the SolrClient.add(doclist) call so you're doing
> everything _except_ send the docs to Solr. That'll tell you whether
> there's any bottleneck in getting the docs from the system of record.
> The fact that you're pegging the CPUs argues that you are feeding Solr
> as fast as Solr can go, so this is just a sanity check. But it's
> simple/fast.
>
> As far as what on Solr could be the bottleneck, there's no real way to
> know without profiling. But 300+ fields per doc probably just means
> you're doing a lot of processing; I'm not particularly hopeful you'll
> be able to speed things up without either more shards or simplifying
> your schema.
>
> Best,
> Erick
>
> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> > Hi great community,
> >
> > I have a SolrCloud cluster with the following configuration:
> >
> > - 2 nodes (r3.2xlarge, 61 GB RAM)
> > - 4 shards
> > - The producer can produce 13,000+ docs per second
> > - The schema contains 300+ fields and the document size is about 3 KB
> > - Using SolrJ and CloudSolrClient; each batch sent to Solr contains
> > 500 docs
> >
> > When I start my bulk indexer program, the CPU utilization is 100% on
> > each server, but the indexer's rate is about 1,500 docs per second.
> >
> > I know that some Solr benchmarks have reached 70,000+ docs per second.
> >
> > The question: what is the best way to determine the bottleneck on
> > Solr's indexing rate?
> >
> > Thanks,
> > Mahmoud
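P.S. To make my question concrete: if you mean 16 shards on the same 2 nodes, I assume I'd recreate the collection with something like the following (the collection and configset names are placeholders for mine) and then reindex into it?

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateWiderCollection {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build()) {
            // 16 shards, 1 replica each => 8 shard cores on each of 2 nodes
            CollectionAdminRequest.Create create = CollectionAdminRequest
                    .createCollection("mycollection16", "myconfig", 16, 1);
            create.setMaxShardsPerNode(8);
            create.process(client);
        }
    }
}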