What are your replication factor and doc size?

Replication currently affects indexing performance a fair amount more than it should.

For that number of nodes, the rate you describe doesn’t match what I’ve seen, 
unless those are huge documents or you have a slow analyzer somewhere in the 
chain.

Without replication, with relatively small docs and decent hardware, I’d expect 
around 10,000-12,000 docs per second per node. Replication can cut that by up to 
half, by some reports. Larger doc sizes or other outliers might cut it further.
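
As a rough back-of-the-envelope sketch of those numbers: the snippet below
multiplies the per-node range by the 5 nodes from your current setup and
applies the worst-case "up to half" replication penalty. Linear scaling across
nodes is an assumption here, and the function name and parameters are purely
illustrative, not anything from Solr.

```python
def expected_cluster_rate(nodes, per_node_rate, replication_penalty=0.0):
    """Rough docs/sec estimate for a cluster.

    Assumes throughput scales linearly with node count (an assumption,
    not a guarantee) and that replication removes a fixed fraction of it.
    """
    return nodes * per_node_rate * (1.0 - replication_penalty)


NODES = 5  # current "narrow/thick" setup from the original message

# Per-node range quoted above: 10,000-12,000 docs/sec without replication.
low = expected_cluster_rate(NODES, 10_000)
high = expected_cluster_rate(NODES, 12_000)

# Worst case reported for replication: throughput cut by up to half.
low_repl = expected_cluster_rate(NODES, 10_000, replication_penalty=0.5)
high_repl = expected_cluster_rate(NODES, 12_000, replication_penalty=0.5)

print(f"no replication:   {low:,.0f}-{high:,.0f} docs/sec")
print(f"with replication: {low_repl:,.0f}-{high_repl:,.0f} docs/sec")
```

Even the pessimistic end of that estimate is several times the ~6k docs/sec
you are seeing, which is why doc size and the analysis chain are worth
checking.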

By the way, Solr 4.4 is pretty ancient in SolrCloud terms at this point.

- Mark

http://about.me/markrmiller

> On Feb 3, 2015, at 7:47 PM, Tim Smith <secs...@gmail.com> wrote:
> 
> Hi,
> 
> I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection
> configured to be populated by flume Morphlines sink. The flume agent reads
> data from Kafka and writes to the Solr collection.
> 
> The issue is that Solr indexing rate is abysmally poor (~6k docs/sec at
> best, dips to a few hundred per sec) across the cluster. The incoming
> data/document rate is about 30-40k/second.
> 
> I have gone wide/thin with 18 nodes and each with 8GB (Java) + 4GB
> (non-heap) memory and narrow/thick with current set of 5 dedicated nodes
> each with 36GB (Java) and 16GB (non-heap) memory (18 shards with the former
> config and 5 shards, right now).
> 
> On the flume side, I have gone from 5 flume instances, each with a single
> sink to 5 sinks for each flume instance. I have tweaked batchSize and
> batchDuration.
> 
> I checked ZooKeeper loads and don't see it stressed. Neither are the
> datanodes. On the Solr nodes, solr is consuming all the allocated memory
> (32GB) but I don't see solr hitting any CPU limits.
> 
> *But*, indexing rate stubbornly stays at ~6k docs/sec. When I bounce the
> flume agent, it jumps up momentarily to several hundreds of thousands but
> then comes down to ~6k/sec and the flume channels get saturated within
> seconds.
> 
> Any clues/pointers for troubleshooting will be appreciated.
> 
> 
> Thanks,
> 
> Tim
