Hi again, all -

Since several people were kind enough to jump in to offer advice on this
thread, I wanted to follow up in case anyone finds this useful in the
future.

*tl;dr:* Routing updates to a random Solr node (and then letting it forward
the docs to where they need to go) is very expensive, much more so than I
expected.  Using a "smart" router that uses the cluster config to route
documents directly to their shard results in (near) linear scaling for us.

*Expository version:*

We use Go for our client, and (to my knowledge) there is no SolrCloud-aware
router implementation for Go.  So we started by routing updates to a random
Solr node and letting it forward the docs to where they need to go.  My
theory was that this would add only a constant amount of extra work per
document (and thus still scale linearly).  The reasoning: if you send an
update of K documents to a Solr node in an N-node cluster, then in the worst
case all K documents will need to be forwarded on to other nodes.  Since
Solr nodes have perfect knowledge of where docs belong, each doc takes at
most one additional hop to reach its proper shard.  So random routing (in
the limit) imposes one additional network hop per document.
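In code, the baseline is nothing fancier than something like this (a
throwaway sketch, not our actual client; the node URLs and collection name
are placeholders):

// Minimal sketch of the "dumb" baseline: pick a random Solr node and let it
// forward each doc to the right shard.  Node URLs and collection name are
// placeholders.
package main

import (
	"bytes"
	"fmt"
	"log"
	"math/rand"
	"net/http"
)

var nodes = []string{
	"http://solr1:8983/solr",
	"http://solr2:8983/solr",
	"http://solr3:8983/solr",
}

func indexRandom(collection string, docsJSON []byte) error {
	node := nodes[rand.Intn(len(nodes))] // any node will do; Solr forwards internally
	url := fmt.Sprintf("%s/%s/update", node, collection)
	resp, err := http.Post(url, "application/json", bytes.NewReader(docsJSON))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("update failed: %s", resp.Status)
	}
	return nil
}

func main() {
	docs := []byte(`[{"id": "doc-1", "title_t": "hello"}]`)
	if err := indexRandom("mycollection", docs); err != nil {
		log.Fatal(err)
	}
}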

In practice, however, we found that (for small clusters, at least) per-node
performance falls as you add shards.  In fact, client throughput (in
writes/sec) stayed essentially constant no matter how many shards we added.
I do have a working theory as to why this happens (i.e. where the flaw in my
logic above lies), but since it is unverified I don't want to lead anyone
astray by writing it up here.

However, by writing a "smart" router that retrieves the clusterstate.json
file from ZooKeeper and uses it to route each document directly to its
proper shard, we were able to achieve much better scalability.  On a
synthetic workload we measured 141.7 writes/sec against a cluster of size 1
and 2506 writes/sec against a cluster of size 20 (125 writes/sec/node), a
per-node dropoff of ~12%, which is not too bad.  We are hoping to continue
our tests with larger clusters to confirm that per-node write performance
levels off and does not keep dropping as the cluster scales.
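
For anyone curious, the core of the idea fits in roughly a page of Go.  The
sketch below is not our exact implementation; the hashing detail reflects my
understanding of Solr's default compositeId router (MurmurHash3 x86 32-bit,
seed 0, over the id bytes, matched against each shard's signed 32-bit hash
range), and the go-zookeeper and murmur3 packages named are just ones that
happen to exist for Go, not a specific endorsement:

// Rough sketch of the "smart" router idea (not our exact implementation).
// Assumes the default compositeId router: the doc id is hashed with
// MurmurHash3 and the doc belongs to the shard whose hash range contains it.
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"

	"github.com/go-zookeeper/zk"
	"github.com/spaolacci/murmur3"
)

// Just the fields we need from the (pre-"state.json") clusterstate.json layout.
type clusterState map[string]struct {
	Shards map[string]struct {
		Range    string `json:"range"` // e.g. "80000000-ffffffff" (signed 32-bit hex)
		Replicas map[string]struct {
			BaseURL string `json:"base_url"`
			Core    string `json:"core"`
			Leader  string `json:"leader"`
		} `json:"replicas"`
	} `json:"shards"`
}

func fetchClusterState(zkServers []string) (clusterState, error) {
	conn, _, err := zk.Connect(zkServers, 5*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	data, _, err := conn.Get("/clusterstate.json")
	if err != nil {
		return nil, err
	}
	var cs clusterState
	return cs, json.Unmarshal(data, &cs)
}

// leaderURLFor returns the update URL of the leader of the shard whose hash
// range covers id.  (Composite ids containing "!" need handling this skips.)
func leaderURLFor(cs clusterState, collection, id string) (string, error) {
	coll, ok := cs[collection]
	if !ok {
		return "", fmt.Errorf("unknown collection %q", collection)
	}
	hash := int32(murmur3.Sum32([]byte(id))) // compare as signed 32-bit, like Solr

	for name, shard := range coll.Shards {
		lo, hi, err := parseRange(shard.Range)
		if err != nil {
			return "", err
		}
		if hash < lo || hash > hi {
			continue
		}
		for _, r := range shard.Replicas {
			if r.Leader == "true" {
				return r.BaseURL + "/" + r.Core + "/update", nil
			}
		}
		return "", fmt.Errorf("shard %s has no leader", name)
	}
	return "", fmt.Errorf("no shard range covers id %q", id)
}

// parseRange splits a range like "80000000-ffffffff" into signed 32-bit endpoints.
func parseRange(s string) (int32, int32, error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("bad range %q", s)
	}
	lo, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	hi, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	return int32(lo), int32(hi), nil
}

func main() {
	cs, err := fetchClusterState([]string{"localhost:2181"})
	if err != nil {
		panic(err)
	}
	fmt.Println(leaderURLFor(cs, "mycollection", "doc-42"))
}

A real client would of course cache the cluster state and set a ZooKeeper
watch on it rather than re-fetching it for every batch.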

I will also note that we initially had several bugs in our "smart" router
implementation, so if you follow a similar path and see poor performance,
look at your router first; you might not be routing correctly.  We ended up
writing a simple proxy that we ran in front of Solr to observe all requests,
which helped immensely when verifying and debugging the router.  Yes,
tcpdump does something similar, but viewing traffic at the HTTP level is way
more convenient than at the TCP level.  Plus Go makes little proxies like
this super easy to write.
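
For reference, the proxy really is tiny.  Something along these lines does
the job (the Solr address and listen port here are assumptions):

// Tiny logging proxy for watching Solr traffic at the HTTP level.  The Solr
// address and listen port are assumptions; adjust to taste.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	solr, err := url.Parse("http://localhost:8983") // where Solr actually lives
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(solr)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Dump the full request (headers + update body) before forwarding it.
		// DumpRequest replaces r.Body so the proxy can still read it.
		if dump, err := httputil.DumpRequest(r, true); err == nil {
			log.Printf("%s", dump)
		}
		proxy.ServeHTTP(w, r)
	})

	// Point the indexing client at :8984 instead of at Solr itself.
	log.Fatal(http.ListenAndServe(":8984", nil))
}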

Hope all that is useful to someone.  Thanks again to the posters above for
providing suggestions!

- Ian



On Sat, Nov 1, 2014 at 7:13 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: but it should be more or less a constant factor no matter how many
> Solr nodes you are using, right?
>
> Not really. You've stated that you're not driving Solr very hard in
> your tests. Therefore you're waiting on I/O. Therefore your tests just
> aren't going to scale linearly with the number of shards. This is a
> simplification, but....
>
> Your network utilization is pretty much irrelevant. I send a packet
> somewhere. "somewhere" does some stuff and sends me back an
> acknowledgement. While I'm waiting, the network is getting no traffic,
> so..... If the network traffic was in the 90% range that would be
> different, so it's a good thing to monitor.
>
> Really, use a "leader aware" client and rack enough clients together
> that you're driving Solr hard. Then double the number of shards. Then
> rack enough _more_ clients to drive Solr at the same level. In this
> case I'll go out on a limb and predict near 2x throughput increases.
>
> One additional note, though. When you add _replicas_ to shards expect
> to see a drop in throughput that may be quite significant, 20-40%
> anecdotally...
>
> Best,
> Erick
>
> On Sat, Nov 1, 2014 at 9:23 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 11/1/2014 9:52 AM, Ian Rose wrote:
> >> Just to make sure I am thinking about this right: batching will
> certainly
> >> make a big difference in performance, but it should be more or less a
> >> constant factor no matter how many Solr nodes you are using, right?
> Right
> >> now in my load tests, I'm not actually that concerned about the absolute
> >> performance numbers; instead I'm just trying to figure out why relative
> >> performance (no matter how bad it is since I am not batching) does not
> go
> >> up with more Solr nodes.  Once I get that part figured out and we are
> >> seeing more writes per sec when we add nodes, then I'll turn on
> batching in
> >> the client to see what kind of additional performance gain that gets us.
> >
> > The basic problem I see with your methodology is that you are sending an
> > update request and waiting for it to complete before sending another.
> > No matter how big the batches are, this is an inefficient use of
> resources.
> >
> > If you send many such requests at the same time, then they will be
> > handled in parallel.  Lucene (and by extension, Solr) has the thread
> > synchronization required to keep multiple simultaneous update requests
> > from stomping on each other and corrupting the index.
> >
> > If you have enough CPU cores, such handling will *truly* be in parallel,
> > otherwise the operating system will just take turns giving each thread
> > CPU time.  This results in a pretty good facsimile of parallel
> > operation, but because it splits the available CPU resources, isn't as
> > fast as true parallel operation.
> >
> > Thanks,
> > Shawn
> >
>
