Here is the profiler screenshot from VisualVM after upgrading:
https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg

Jetty is taking the most CPU time. Does this mean Jetty is the bottleneck
for indexing?

Thanks,
Mahmoud

On Tue, Mar 14, 2017 at 11:41 AM, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
> Thanks Shalin,
>
> I'm posting data to Solr as SolrInputDocuments using SolrJ.
>
> According to the profiler, com.codahale.metrics.Meter.mark takes much more
> processing time than anything else, as described in this issue:
> https://issues.apache.org/jira/browse/SOLR-10130.
>
> And I think the Sematext profiler works differently from VisualVM.
>
> Thanks for the help,
> Mahmoud
>
> On Tue, Mar 14, 2017 at 11:08 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>> According to the profiler output, a significant amount of CPU is being
>> spent in JSON parsing, but your previous email said that you use SolrJ.
>> SolrJ uses the javabin binary format to send documents to Solr and it
>> never uses JSON, so there is definitely some other indexing process
>> that you have not accounted for.
>>
>> On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
>>> Thanks Erick,
>>>
>>> I've commented out the line SolrClient.add(doclist) and got 5,500+ docs
>>> per second from a single producer.
>>>
>>> Regarding more shards, do you mean using 2 nodes with 8 shards per node,
>>> so we get 16 shards on the same 2 nodes, or spreading the shards over
>>> more nodes?
>>>
>>> I'm using Solr 6.4.1 with ZooKeeper on the same nodes.
>>>
>>> Here's what I got from the Sematext profiler:
>>>
>>> 51% Thread.java:745 java.lang.Thread#run
>>>   42% QueuedThreadPool.java:589 org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
>>>     (29 collapsed calls)
>>>     43% UpdateRequestHandler.java:97 org.apache.solr.handler.UpdateRequestHandler$1#load
>>>       30% JsonLoader.java:78 org.apache.solr.handler.loader.JsonLoader#load
>>>         30% JsonLoader.java:115 org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
>>>       13% JavabinLoader.java:54 org.apache.solr.handler.loader.JavabinLoader#load
>>>   9% ThreadPoolExecutor.java:617 java.util.concurrent.ThreadPoolExecutor$Worker#run
>>>     9% ThreadPoolExecutor.java:1142 java.util.concurrent.ThreadPoolExecutor#runWorker
>>> 33% ConcurrentMergeScheduler.java:626 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
>>>   33% ConcurrentMergeScheduler.java:588 org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
>>>     33% SolrIndexWriter.java:233 org.apache.solr.update.SolrIndexWriter#merge
>>>       33% IndexWriter.java:3920 org.apache.lucene.index.IndexWriter#merge
>>>         33% IndexWriter.java:4343 org.apache.lucene.index.IndexWriter#mergeMiddle
>>>           20% SegmentMerger.java:101 org.apache.lucene.index.SegmentMerger#merge
>>>           11% SegmentMerger.java:89 org.apache.lucene.index.SegmentMerger#merge
>>>           2% SegmentMerger.java:144 org.apache.lucene.index.SegmentMerger#merge
>>>
>>> On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> Note that 70,000 docs/second pretty much guarantees that there are
>>>> multiple shards. Lots of shards.
>>>>
>>>> But since you're using SolrJ, the very first thing I'd try would be
>>>> to comment out the SolrClient.add(doclist) call so you're doing
>>>> everything _except_ send the docs to Solr.
>>>> That'll tell you whether
>>>> there's any bottleneck in getting the docs from the system of record.
>>>> The fact that you're pegging the CPUs argues that you are feeding Solr
>>>> as fast as Solr can go, so this is just a sanity check. But it's
>>>> simple/fast.
>>>>
>>>> As far as what on Solr could be the bottleneck, there's no real way to
>>>> know without profiling. But 300+ fields per doc probably just means
>>>> you're doing a lot of processing; I'm not particularly hopeful you'll
>>>> be able to speed things up without either more shards or a simpler
>>>> schema.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
>>>>> Hi great community,
>>>>>
>>>>> I have a SolrCloud cluster with the following configuration:
>>>>>
>>>>> - 2 nodes (r3.2xlarge, 61GB RAM)
>>>>> - 4 shards
>>>>> - The producer can produce 13,000+ docs per second
>>>>> - The schema contains 300+ fields and the document size is about 3KB
>>>>> - Using SolrJ and CloudSolrClient; each batch sent to Solr contains
>>>>>   500 docs
>>>>>
>>>>> When I start my bulk indexer program, the CPU utilization is 100% on
>>>>> each server, but the rate of the indexer is only about 1,500 docs per
>>>>> second.
>>>>>
>>>>> I know that some Solr benchmarks have reached 70,000+ docs per second.
>>>>>
>>>>> The question: what is the best way to determine the bottleneck in the
>>>>> Solr indexing rate?
>>>>>
>>>>> Thanks,
>>>>> Mahmoud
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
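
For reference, a bulk indexer along the lines Mahmoud describes (SolrJ 6.4,
CloudSolrClient, 500-document batches) might look roughly like the minimal
sketch below. The ZooKeeper addresses, collection name, field names, and the
fetchBatch() stand-in for the system of record are all hypothetical, not taken
from the thread:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {

        private static int produced = 0;

        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                    .build()) {
                client.setDefaultCollection("mycollection");

                List<SolrInputDocument> batch;
                while (!(batch = fetchBatch(500)).isEmpty()) {
                    client.add(batch); // comment this out for the sanity check below
                }
                client.commit();
            }
        }

        // Stand-in for reading documents from the system of record.
        private static List<SolrInputDocument> fetchBatch(int size) {
            List<SolrInputDocument> docs = new ArrayList<>();
            while (docs.size() < size && produced < 100_000) { // demo limit
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + produced++);
                doc.addField("title_s", "example title " + produced); // placeholder field
                docs.add(doc);
            }
            return docs;
        }
    }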
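
Erick's sanity check, everything except the send, can also be timed to put a
number on the producer-only rate. A sketch, reusing the hypothetical
fetchBatch() from above:

    // Run the full pipeline *except* the send to Solr, and time it.
    private static void measureProducerOnly() {
        long start = System.nanoTime();
        long count = 0;
        List<SolrInputDocument> batch;
        while (!(batch = fetchBatch(500)).isEmpty()) {
            count += batch.size();
            // client.add(batch);  // deliberately commented out
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Producer alone: %.0f docs/sec%n", count / seconds);
    }

If the producer-only rate is far above the end-to-end rate, as Mahmoud's
5,500+ vs. 1,500 docs per second suggests, the bottleneck is on the Solr side.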
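
On the shards question: whichever way the extra shards are placed, a wider
collection can be created with the SolrJ Collections API and then reindexed
into. A sketch, with made-up collection and configset names:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateWiderCollection {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181") // placeholder
                    .build()) {
                // 16 shards, 1 replica each, using an existing configset.
                CollectionAdminRequest.createCollection("mycollection_16", "myconfig", 16, 1)
                        .setMaxShardsPerNode(8) // 8 shards per node on a 2-node cluster
                        .process(client);
            }
        }
    }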