Thanks Shalin. I'm posting data to Solr as SolrInputDocuments using SolrJ.
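
For reference, the indexing code is essentially the following (a simplified
sketch; the ZooKeeper address, collection name, and field names are
placeholders, and the Builder API differs a bit across SolrJ versions):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                    .build();
            client.setDefaultCollection("mycollection");

            List<SolrInputDocument> batch = new ArrayList<>(500);
            for (int i = 0; i < 500; i++) {              // one 500-doc batch
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", String.valueOf(i));   // placeholder fields
                doc.addField("title_s", "document " + i);
                batch.add(doc);
            }
            client.add(batch);   // SolrJ sends the batch as javabin, not JSON
            client.commit();
            client.close();
        }
    }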
According to the profiler, com.codahale.metrics.Meter.mark takes much more
processing time than anything else, as mentioned in this issue:
https://issues.apache.org/jira/browse/SOLR-10130. And I think the Sematext
profiler works differently from VisualVM.

Thanks for the help,
Mahmoud

On Tue, Mar 14, 2017 at 11:08 AM, Shalin Shekhar Mangar
<shalinman...@gmail.com> wrote:

> According to the profiler output, a significant amount of CPU is being
> spent in JSON parsing, but your previous email said that you use SolrJ.
> SolrJ uses the javabin binary format to send documents to Solr and it
> never uses JSON, so there is definitely some other indexing process that
> you have not accounted for.
>
> On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> > Thanks Erick,
> >
> > I've commented out the line SolrClient.add(doclist) and get 5500+
> > docs per second from a single producer.
> >
> > Regarding more shards, do you mean using 2 nodes with 8 shards per
> > node, so we get 16 shards on the same 2 nodes, or spreading the
> > shards over more nodes?
> >
> > I'm using Solr 6.4.1 with ZooKeeper on the same nodes.
> >
> > Here's what I got from the Sematext profiler:
> >
> > 51%  Thread.java:745                    java.lang.Thread#run
> > 42%  QueuedThreadPool.java:589          org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
> >      (29 collapsed calls)
> > 43%  UpdateRequestHandler.java:97       org.apache.solr.handler.UpdateRequestHandler$1#load
> > 30%  JsonLoader.java:78                 org.apache.solr.handler.loader.JsonLoader#load
> > 30%  JsonLoader.java:115                org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
> > 13%  JavabinLoader.java:54              org.apache.solr.handler.loader.JavabinLoader#load
> >  9%  ThreadPoolExecutor.java:617        java.util.concurrent.ThreadPoolExecutor$Worker#run
> >  9%  ThreadPoolExecutor.java:1142       java.util.concurrent.ThreadPoolExecutor#runWorker
> > 33%  ConcurrentMergeScheduler.java:626  org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
> > 33%  ConcurrentMergeScheduler.java:588  org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
> > 33%  SolrIndexWriter.java:233           org.apache.solr.update.SolrIndexWriter#merge
> > 33%  IndexWriter.java:3920              org.apache.lucene.index.IndexWriter#merge
> > 33%  IndexWriter.java:4343              org.apache.lucene.index.IndexWriter#mergeMiddle
> > 20%  SegmentMerger.java:101             org.apache.lucene.index.SegmentMerger#merge
> > 11%  SegmentMerger.java:89              org.apache.lucene.index.SegmentMerger#merge
> >  2%  SegmentMerger.java:144             org.apache.lucene.index.SegmentMerger#merge
> >
> > On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >> Note that 70,000 docs/second pretty much guarantees that there are
> >> multiple shards. Lots of shards.
> >>
> >> But since you're using SolrJ, the very first thing I'd try would be
> >> to comment out the SolrClient.add(doclist) call so you're doing
> >> everything _except_ send the docs to Solr. That'll tell you whether
> >> there's any bottleneck in getting the docs from the system of
> >> record. The fact that you're pegging the CPUs argues that you are
> >> feeding Solr as fast as Solr can go, so this is just a sanity check.
> >> But it's simple/fast.
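> >>
> >> In code, the check is roughly this (a sketch; "buildBatch" stands in
> >> for however your indexer assembles the 500-doc list):
> >>
> >>     List<SolrInputDocument> doclist = buildBatch(500); // your producer
> >>     // client.add(doclist);  // commented out: everything except the send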
> >>
> >> As far as what on Solr could be the bottleneck, there's no real way
> >> to know without profiling. But 300+ fields per doc probably just
> >> means you're doing a lot of processing, so I'm not particularly
> >> hopeful you'll be able to speed things up without either more shards
> >> or a simpler schema.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
> >> <prog.mahm...@gmail.com> wrote:
> >> > Hi great community,
> >> >
> >> > I have a SolrCloud cluster with the following configuration:
> >> >
> >> > - 2 nodes (r3.2xlarge, 61GB RAM)
> >> > - 4 shards
> >> > - The producer can produce 13,000+ docs per second
> >> > - The schema contains 300+ fields and the document size is about 3KB
> >> > - Using SolrJ and CloudSolrClient; each batch sent to Solr contains
> >> >   500 docs
> >> >
> >> > When I start my bulk indexer program, the CPU utilization is 100%
> >> > on each server, but the indexer's rate is only about 1500 docs per
> >> > second.
> >> >
> >> > I know that some Solr benchmarks have reached 70,000+ docs per
> >> > second.
> >> >
> >> > The question: what is the best way to determine the bottleneck in
> >> > Solr's indexing rate?
> >> >
> >> > Thanks,
> >> > Mahmoud
>
> --
> Regards,
> Shalin Shekhar Mangar.