Asif: Thanks, this is great info and it may add to the priority of making this configurable. I raised a JIRA, see: https://issues.apache.org/jira/browse/SOLR-4956, and feel free to add anything you'd like or correct anything I didn't get right.
Best
Erick

On Sat, Jun 22, 2013 at 10:16 PM, Asif <talla...@gmail.com> wrote:
> Erick,
>
> It's a completely practical problem - we are exploring Solr to build a real-time analytics/data solution for a system handling about 1000 qps. We have various metrics that are stored as different collections on the cloud, which means a very high volume of writes. The cloud also needs to support about 300-400 qps.
>
> We initially tested with a single Solr node on a 16 core / 24 GB box for a single metric. We saw that writes were not an issue at all - Solr was handling them extremely well. We were also able to achieve about 200 qps from a single node.
>
> When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU usage on the replicas. Up to 10 cores were getting used for writes on the replicas. Hence my concern with respect to batch updates for the replicas.
>
> BTW, I altered maxBufferedAddsPerServer to 1000 - and now CPU usage is very similar to the single-node installation.
>
> - Asif
>
> On Sat, Jun 22, 2013 at 9:53 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> Yeah, there's been talk of making this configurable, but there are more pressing priorities so far.
>>
>> So just to be clear, is this theoretical or practical? I know of several very high-performance situations where 1,000 updates/sec (and I'm assuming that it's 1,000 docs/sec, not 1,000 batches of 1,000 docs) hasn't caused problems here. So unless you're actually seeing performance problems, as opposed to fearing that there _might_ be, I'd just go on to the next urgent problem.
>>
>> Best
>> Erick
>>
>> On Fri, Jun 21, 2013 at 8:34 PM, Asif <talla...@gmail.com> wrote:
>> > Erick,
>> >
>> > Thanks for your reply.
>> >
>> > You are right about 10 updates being batched up - it was hard to figure out due to the large number of updates/logging that happens in our system.
>> >
>> > We are batching 1000 updates every time.
>> >
>> > Here is my observation from the leader and replica -
>> >
>> > 1. The leader logs clearly indicate that 1000 updates arrived - [ (1000 adds)],commit=]
>> > 2. On the replica - for each batch of 1000 document adds on the leader - I see a lot of requests on the replica, with no indication of how many updates are in each request.
>> >
>> > Digging a little into the Solr code, I found the variable I am interested in - maxBufferedAddsPerServer - is set to 10:
>> >
>> > http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?view=markup
>> >
>> > This means that for a batch update of 1000 documents we will be seeing 100 requests to the replica - which translates into 100 writes per collection per second in our system.
>> >
>> > Should this variable be made configurable via solrconfig.xml (or any other appropriate place)?
>> >
>> > A little background about the system we are trying to build - a real-time analytics solution using Solr Cloud + atomic updates - we have a very high volume of writes, going as high as 1000 updates a second (possibly more in the long run).
>> >
>> > - Asif
>> >
>> > On Sat, Jun 22, 2013 at 4:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >> Updates are batched, but it's on a per-request basis. So if you're sending one document at a time, you won't get any batching. If you send 10 docs at a time and they happen to go to 10 different shards, you'll get 10 different update requests.
>> >>
>> >> If you're sending 1,000 docs per update, you should be seeing some batching going on.
>> >>
>> >> bq: but why not batch them up or give an option to batch N updates in either of the above cases
>> >>
>> >> I suspect what you're seeing is that you're not sending very many docs per update request and so are being misled.
>> >>
>> >> But that's a guess, since you haven't provided much in the way of data on _how_ you're updating.
>> >>
>> >> bq: the cloud eventually starts to fail
>> >> How? Details matter.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 19, 2013 at 4:23 AM, Asif <talla...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I had questions on the implementation of the sharding and replication features of SolrCloud.
>> >> >
>> >> > 1. I noticed that when sharding is enabled for a collection, individual requests are sent to each node serving as a shard.
>> >> >
>> >> > 2. Replication too follows the above strategy of sending individual documents to the nodes serving as replicas.
>> >> >
>> >> > I am working with a system that requires a massive number of writes - I have noticed that because of the above, the cloud eventually starts to fail (even though I am using an ensemble).
>> >> >
>> >> > I do understand the reason behind individual updates - but why not batch them up, or give an option to batch N updates, in either of the above cases? I did come across a presentation that talked about batching 10 updates for replication at least, but I do not think this is the case.
>> >> > - Asif
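
To make the arithmetic in the thread concrete: taking Asif's reading of SolrCmdDistributor at face value (forwarded adds are flushed every maxBufferedAddsPerServer documents), a 1000-document batch from the client becomes 1000/10 = 100 requests to each replica. The sketch below is only a standalone illustration of that ceiling-division math under that assumption - it is not Solr's actual SolrCmdDistributor code.

    // A standalone illustration of the buffering arithmetic discussed above;
    // this is NOT Solr's SolrCmdDistributor code, just the division it implies.
    public class BufferedForwardingMath {

        // Number of forwarded requests needed to drain one incoming batch when
        // the distributor flushes every maxBufferedAddsPerServer buffered docs.
        static int forwardedRequests(int batchSize, int maxBufferedAddsPerServer) {
            return (batchSize + maxBufferedAddsPerServer - 1) / maxBufferedAddsPerServer;
        }

        public static void main(String[] args) {
            System.out.println(forwardedRequests(1000, 10));   // 100 requests per replica (the hard-coded default)
            System.out.println(forwardedRequests(1000, 1000)); // 1 request per replica (Asif's modified build)
        }
    }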
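For the client-side batching Asif describes (1,000 documents per update request), a minimal SolrJ sketch along these lines would apply, assuming a Solr 4.x CloudSolrServer; the ZooKeeper addresses, collection name, and field names are placeholders, not values from the thread.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Minimal sketch of sending adds in batches of 1,000 with SolrJ (Solr 4.x era).
    // ZooKeeper addresses, collection name, and field names are placeholders.
    public class BatchedIndexer {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("metrics");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 10000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("value_l", i);
                batch.add(doc);

                if (batch.size() == 1000) {
                    server.add(batch);   // one update request per 1,000 documents
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);       // flush any remainder
            }
            server.commit();
            server.shutdown();
        }
    }

Each server.add(batch) call is a single update request to the leader; how the leader then forwards those documents to replicas is governed by maxBufferedAddsPerServer, as discussed in the thread above.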