Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson Sat, 01 Nov 2014 16:15:07 -0700

bq: but it should be more or less a constant factor no matter how many
Solr nodes you are using, right?

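Not really. You've stated that you're not driving Solr very hard in
your tests. That means each client spends most of its time waiting on
I/O for the previous request, so throughput is bound by round-trip
latency rather than by Solr's capacity, and your tests just aren't
going to scale linearly with the number of shards. This is a
simplification, but...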
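
To put rough numbers on the point above, here is a minimal back-of-the-envelope
model in Python. It is only a sketch: the function names, the 10 ms round trip,
and the per-shard capacity are all made-up assumptions for illustration, not
measurements from any real cluster.

```python
def sequential_client_rate(round_trip_s: float) -> float:
    """Requests/sec from ONE client that waits for each reply before sending the next."""
    return 1.0 / round_trip_s


def observed_throughput(num_clients: int, round_trip_s: float,
                        num_shards: int, per_shard_capacity: float) -> float:
    """Throughput is the smaller of what the clients offer and what the cluster can absorb.

    per_shard_capacity is an assumed figure (requests/sec a shard could
    handle if it were kept busy); real capacity has to be measured.
    """
    offered = num_clients * sequential_client_rate(round_trip_s)
    capacity = num_shards * per_shard_capacity
    return min(offered, capacity)


if __name__ == "__main__":
    # One blocking client, 10 ms per round trip (assumed numbers).
    for shards in (2, 4, 8):
        rate = observed_throughput(1, 0.010, shards, 1000.0)
        print(f"1 client, {shards} shards: {rate:.0f} req/s")
    # Enough clients to saturate the cluster (assumed numbers).
    for shards in (2, 4):
        rate = observed_throughput(50, 0.010, shards, 1000.0)
        print(f"50 clients, {shards} shards: {rate:.0f} req/s")
```

In this toy model a single blocking client sees the same rate no matter how
many shards exist, because the client, not the cluster, is the bottleneck.
Only once the offered load exceeds capacity does adding shards show up in the
numbers.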

Your network utilization is pretty much irrelevant. I send a packet
somewhere. "Somewhere" does some stuff and sends me back an
acknowledgement. While I'm waiting, the network is getting no traffic,
so utilization stays low. If the network traffic were in the 90% range
that would be different, so it's still a good thing to monitor.

Really, use a "leader aware" client and rack enough clients together
that you're driving Solr hard. Then double the number of shards. Then
rack enough _more_ clients to drive Solr at the same level. In this
case I'll go out on a limb and predict a near 2x throughput increase.
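
As a rough illustration of what "rack enough clients together" means for a
load test, the sketch below compares one blocking sender against several
concurrent senders. It does not talk to Solr at all: send_update is a
hypothetical stand-in that just sleeps for an assumed 20 ms round trip, and
the worker counts are arbitrary. In a real test you would replace it with
your actual update call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_ROUND_TRIP_S = 0.02  # assumed latency, for illustration only


def send_update(doc_id: int) -> int:
    """Hypothetical stand-in for one update request; it only sleeps."""
    time.sleep(SIMULATED_ROUND_TRIP_S)
    return doc_id


def run_load(num_requests: int, num_workers: int) -> float:
    """Send num_requests updates from num_workers concurrent senders.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(send_update, range(num_requests)))
    assert len(results) == num_requests
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 8):
        elapsed = run_load(40, workers)
        print(f"{workers} sender(s): {40 / elapsed:.0f} req/s")
```

With one sender the request rate is capped by the round trip; with several
senders the waits overlap, which is what it takes to actually keep the server
side busy.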

One additional note, though. When you add _replicas_ to shards, expect
to see a drop in indexing throughput that may be quite significant:
20-40%, anecdotally.

Best,
Erick

On Sat, Nov 1, 2014 at 9:23 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 11/1/2014 9:52 AM, Ian Rose wrote:
>> Just to make sure I am thinking about this right: batching will certainly
>> make a big difference in performance, but it should be more or less a
>> constant factor no matter how many Solr nodes you are using, right?  Right
>> now in my load tests, I'm not actually that concerned about the absolute
>> performance numbers; instead I'm just trying to figure out why relative
>> performance (no matter how bad it is since I am not batching) does not go
>> up with more Solr nodes.  Once I get that part figured out and we are
>> seeing more writes per sec when we add nodes, then I'll turn on batching in
>> the client to see what kind of additional performance gain that gets us.
>
> The basic problem I see with your methodology is that you are sending an
> update request and waiting for it to complete before sending another.
> No matter how big the batches are, this is an inefficient use of resources.
>
> If you send many such requests at the same time, then they will be
> handled in parallel.  Lucene (and by extension, Solr) has the thread
> synchronization required to keep multiple simultaneous update requests
> from stomping on each other and corrupting the index.
>
> If you have enough CPU cores, such handling will *truly* be in parallel,
> otherwise the operating system will just take turns giving each thread
> CPU time.  This results in a pretty good facsimile of parallel
> operation, but because it splits the available CPU resources, isn't as
> fast as true parallel operation.
>
> Thanks,
> Shawn
>
