Thanks Shalin,

I'm posting data to Solr as SolrInputDocuments using SolrJ.
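
For reference, here's a minimal sketch of the kind of batch indexing I'm
doing (the ZooKeeper address, collection name, and field names below are
placeholders, not my real setup):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                    .build();
            client.setDefaultCollection("mycollection");

            // Build one batch of 500 documents, as in my indexer.
            List<SolrInputDocument> batch = new ArrayList<>(500);
            for (int i = 0; i < 500; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("title_s", "document " + i);
                batch.add(doc);
            }

            // One request per batch; commenting out this add() call turns
            // the run into a pure producer-throughput test.
            client.add(batch);
            client.commit();
            client.close();
        }
    }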

According to the profiler, com.codahale.metrics.Meter.mark is taking much
more processing time than anything else, as described in this issue:
https://issues.apache.org/jira/browse/SOLR-10130.

And I think Sematext's profiler works differently from VisualVM.

Thanks for the help,
Mahmoud



On Tue, Mar 14, 2017 at 11:08 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> According to the profiler output, a significant amount of cpu is being
> spent in JSON parsing but your previous email said that you use SolrJ.
> SolrJ uses the javabin binary format to send documents to Solr and it
> never ever uses JSON so there is definitely some other indexing
> process that you have not accounted for.
>
> On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> > Thanks Erick,
> >
> > I've commented out the SolrClient.add(doclist) line and got 5500+ docs
> > per second from a single producer.
> >
> > Regarding more shards, do you mean using 2 nodes with 8 shards per node
> > (16 shards total on the same 2 nodes), or spreading the shards over more
> > nodes?
> >
> > I'm using Solr 6.4.1 with ZooKeeper on the same nodes.
> >
> > Here's what I got from the Sematext profiler:
> >
> > 51%  Thread.java:745  java.lang.Thread#run
> > 42%  QueuedThreadPool.java:589  org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
> >      (29 collapsed calls)
> > 43%  UpdateRequestHandler.java:97  org.apache.solr.handler.UpdateRequestHandler$1#load
> > 30%  JsonLoader.java:78  org.apache.solr.handler.loader.JsonLoader#load
> > 30%  JsonLoader.java:115  org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
> > 13%  JavabinLoader.java:54  org.apache.solr.handler.loader.JavabinLoader#load
> > 9%   ThreadPoolExecutor.java:617  java.util.concurrent.ThreadPoolExecutor$Worker#run
> > 9%   ThreadPoolExecutor.java:1142  java.util.concurrent.ThreadPoolExecutor#runWorker
> > 33%  ConcurrentMergeScheduler.java:626  org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
> > 33%  ConcurrentMergeScheduler.java:588  org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
> > 33%  SolrIndexWriter.java:233  org.apache.solr.update.SolrIndexWriter#merge
> > 33%  IndexWriter.java:3920  org.apache.lucene.index.IndexWriter#merge
> > 33%  IndexWriter.java:4343  org.apache.lucene.index.IndexWriter#mergeMiddle
> > 20%  SegmentMerger.java:101  org.apache.lucene.index.SegmentMerger#merge
> > 11%  SegmentMerger.java:89   org.apache.lucene.index.SegmentMerger#merge
> > 2%   SegmentMerger.java:144  org.apache.lucene.index.SegmentMerger#merge
> >
> >
> > On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >> Note that 70,000 docs/second pretty much guarantees that there are
> >> multiple shards. Lots of shards.
> >>
> >> But since you're using SolrJ, the very first thing I'd try would be
> >> to comment out the SolrClient.add(doclist) call so you're doing
> >> everything _except_ send the docs to Solr. That'll tell you whether
> >> there's any bottleneck on getting the docs from the system of record.
> >> The fact that you're pegging the CPUs argues that you are feeding Solr
> >> as fast as Solr can go so this is just a sanity check. But it's
> >> simple/fast.
> >>
> >> As far as what on Solr could be the bottleneck, no real way to know
> >> without profiling. But 300+ fields per doc probably just means you're
> >> doing a lot of processing; I'm not particularly hopeful you'll be able
> >> to speed things up without either more shards or simplifying your
> >> schema.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
> >> <prog.mahm...@gmail.com> wrote:
> >> > Hi great community,
> >> >
> >> > I have a SolrCloud with the following configuration:
> >> >
> >> >    - 2 nodes (r3.2xlarge, 61GB RAM)
> >> >    - 4 shards.
> >> >    - The producer can produce 13,000+ docs per second.
> >> >    - The schema contains 300+ fields and the document size is about
> >> >    3KB.
> >> >    - Using SolrJ and CloudSolrClient; each batch sent to Solr contains
> >> >    500 docs.
> >> >
> >> > When I start my bulk indexer program, the CPU utilization is 100% on
> >> > each server, but the indexing rate is only about 1,500 docs per second.
> >> >
> >> > I know that some Solr benchmarks have reached 70,000+ docs per second.
> >> >
> >> > The question: what is the best way to determine the bottleneck in
> >> > Solr's indexing rate?
> >> >
> >> > Thanks,
> >> > Mahmoud
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
