Erick,

Thanks for your reply.

You are right about 10 updates being batched up - it was hard to figure out
due to the large number of updates/log lines our system produces.

We are batching 1000 updates per request.

Here is my observation from leader and replica -

1. The leader logs clearly indicate that 1000 updates arrived - [ (1000
adds)],commit=]
2. On the replica, for each batch of 1000 document adds on the leader, I see
a large number of incoming requests, with no indication of how many updates
each request carries.

Digging a little into the Solr code, I found the variable I am interested
in - maxBufferedAddsPerServer, which is hard-coded to 10:

http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?view=markup

This means that for a batch update of 1000 documents we will see 100
requests per replica - which translates into 100 writes per collection per
second in our system.
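To make the arithmetic concrete, here is a small standalone sketch - not Solr's actual code, only the constant's name and value are taken from SolrCmdDistributor - of how a 1000-document batch fans out into forwarded requests:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchMath {
    // Hard-coded in SolrCmdDistributor as of the linked revision.
    static final int MAX_BUFFERED_ADDS_PER_SERVER = 10;

    // Split a batch into sub-batches of at most `size` documents,
    // mimicking how buffered adds get flushed per server.
    static <T> List<List<T>> chunk(List<T> docs, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            out.add(docs.subList(i, Math.min(i + size, docs.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add(i);
        int requests = chunk(docs, MAX_BUFFERED_ADDS_PER_SERVER).size();
        System.out.println(requests); // prints 100: one request per 10-doc chunk
    }
}
```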

Should this variable be made configurable via solrconfig.xml (or any other
appropriate place)?
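For illustration, such an option might look like the following in solrconfig.xml - to be clear, this element is purely hypothetical; no such setting exists today, which is exactly the point:

```xml
<!-- HYPOTHETICAL setting, not currently supported by Solr: would control
     how many buffered adds are forwarded per request to each replica
     (SolrCmdDistributor currently hard-codes 10). -->
<updateHandler class="solr.DirectUpdateHandler2">
  <maxBufferedAddsPerServer>100</maxBufferedAddsPerServer>
</updateHandler>
```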

A little background about the system we are trying to build: a real-time
analytics solution using SolrCloud + atomic updates. We have a very high
write volume - as high as 1000 updates a second (and possibly more in the
long run).

- Asif





On Sat, Jun 22, 2013 at 4:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Updates are batched, but it's on a per-request basis. So, if
> you're sending one document at a time you won't get any
> batching. If you send 10 docs at a time and they happen to
> go to 10 different shards, you'll get 10 different update
> requests.
>
> If you're sending 1,000 docs per update you should be seeing
> some batching going on.
>
> bq:  but why not batch them up or give a option to batch N
> updates in either of the above case
>
> I suspect what you're seeing is that you're not sending very
> many docs per update request and so are being misled.
>
> But that's a guess since you haven't provided much in the
> way of data on _how_ you're updating.
>
> bq: the cloud eventually starts to fail
> How? Details matter.
>
> Best
> Erick
>
> On Wed, Jun 19, 2013 at 4:23 AM, Asif <talla...@gmail.com> wrote:
> > Hi,
> >
> > I had questions on implementation of Sharding and Replication features of
> > Solr/Cloud.
> >
> > 1. I noticed that when sharding is enabled for a collection - individual
> > requests are sent to each node serving as a shard.
> >
> > 2. Replication follows the same strategy, sending individual documents
> > to the nodes serving as replicas.
> >
> > I am working with a system that requires a massive number of writes - I
> > have noticed that, due to the above, the cloud eventually starts to
> > fail (even though I am using an ensemble).
> >
> > I do understand the reason behind individual updates - but why not
> > batch them up, or give an option to batch N updates, in either of the
> > above cases? I did come across a presentation that talked about batching
> > 10 updates for replication at least, but I do not think this is the case.
> > - Asif
>
