Thanks Erick,

I've commented out the SolrClient.add(doclist) line and get 5,500+ docs per
second from a single producer.
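
For reference, here's roughly what the producer loop looks like. This is a
minimal sketch, not our real code: the ZooKeeper address, collection name,
and field names are placeholders.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble; ours runs on the Solr nodes.
            try (CloudSolrClient client =
                    new CloudSolrClient("node1:2181,node2:2181")) {
                client.setDefaultCollection("mycollection"); // placeholder

                List<SolrInputDocument> doclist = new ArrayList<>(500);
                for (int i = 0; i < 500; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", Integer.toString(i));
                    // Stand-in for the ~300 real fields per document.
                    doc.addField("title_s", "doc " + i);
                    doclist.add(doc);
                }

                // This is the line I commented out, so the test runs
                // everything *except* the actual send to Solr:
                // client.add(doclist);
            }
        }
    }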

Regarding more shards, do you mean using 2 nodes with 8 shards per node, so
we'd have 16 shards on the same 2 nodes, or spreading the shards over more
nodes?
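
If it's the former, I assume it would mean creating a new 16-shard
collection on the same nodes, something like this sketch via the SolrJ
Collections API (collection and config names are made up):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateSixteenShards {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client =
                    new CloudSolrClient("node1:2181,node2:2181")) {
                // 16 shards, 1 replica each, packed 8 per node onto the
                // 2 existing nodes.
                CollectionAdminRequest
                        .createCollection("mycollection16", "myconfig", 16, 1)
                        .setMaxShardsPerNode(8)
                        .process(client);
            }
        }
    }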

I'm using Solr 6.4.1, with ZooKeeper running on the same nodes.

Here's what I got from the Sematext profiler:

51%  Thread.java:745  java.lang.Thread#run
42%  QueuedThreadPool.java:589  org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
     (29 collapsed calls)
43%  UpdateRequestHandler.java:97  org.apache.solr.handler.UpdateRequestHandler$1#load
30%  JsonLoader.java:78  org.apache.solr.handler.loader.JsonLoader#load
30%  JsonLoader.java:115  org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
13%  JavabinLoader.java:54  org.apache.solr.handler.loader.JavabinLoader#load
 9%  ThreadPoolExecutor.java:617  java.util.concurrent.ThreadPoolExecutor$Worker#run
 9%  ThreadPoolExecutor.java:1142  java.util.concurrent.ThreadPoolExecutor#runWorker
33%  ConcurrentMergeScheduler.java:626  org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
33%  ConcurrentMergeScheduler.java:588  org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
33%  SolrIndexWriter.java:233  org.apache.solr.update.SolrIndexWriter#merge
33%  IndexWriter.java:3920  org.apache.lucene.index.IndexWriter#merge
33%  IndexWriter.java:4343  org.apache.lucene.index.IndexWriter#mergeMiddle
20%  SegmentMerger.java:101  org.apache.lucene.index.SegmentMerger#merge
11%  SegmentMerger.java:89  org.apache.lucene.index.SegmentMerger#merge
 2%  SegmentMerger.java:144  org.apache.lucene.index.SegmentMerger#merge

So roughly a third of the CPU time is going into segment merging (the
ConcurrentMergeScheduler threads), and most of the rest into the JSON and
javabin load paths of the update handler.


On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Note that 70,000 docs/second pretty much guarantees that there are
> multiple shards. Lots of shards.
>
> But since you're using SolrJ, the very first thing I'd try would be
> to comment out the SolrClient.add(doclist) call so you're doing
> everything _except_ send the docs to Solr. That'll tell you whether
> there's any bottleneck in getting the docs from the system of record.
> The fact that you're pegging the CPUs argues that you're feeding Solr
> as fast as it can go, so this is just a sanity check. But it's
> simple/fast.
>
> As far as what on Solr could be the bottleneck, there's no real way to
> know without profiling. But 300+ fields per doc probably just means
> you're doing a lot of processing; I'm not particularly hopeful you'll be
> able to speed things up without either more shards or simplifying your
> schema.
>
> Best,
> Erick
>
> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
> <prog.mahm...@gmail.com> wrote:
> > Hi great community,
> >
> > I have a SolrCloud with the following configuration:
> >
> >    - 2 nodes (r3.2xlarge, 61 GB RAM)
> >    - 4 shards
> >    - The producer can produce 13,000+ docs per second
> >    - The schema contains 300+ fields and the document size is about
> >    3 KB
> >    - Using SolrJ and CloudSolrClient; each batch sent to Solr contains
> >    500 docs
> >
> > When I start my bulk indexer program, CPU utilization hits 100% on each
> > server, but the indexing rate is only about 1,500 docs per second.
> >
> > I know that some Solr benchmarks have reached 70,000+ docs per second.
> >
> > The question: what is the best way to determine the bottleneck in
> > Solr's indexing rate?
> >
> > Thanks,
> > Mahmoud
>
