We actually do batch updates currently - we are being somewhat loose when we say "a document at a time". There is a buffer of updates per replica that gets flushed depending on the requests coming through and the buffer size.
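
To make the buffering idea concrete, here is a simplified sketch (illustrative only, not the actual Solr code; all class and method names below are made up): updates destined for one replica accumulate in that replica's buffer and are sent as a single request once the buffer reaches a size threshold (or the incoming stream pauses).

import java.util.ArrayList;
import java.util.List;

class ReplicaUpdateBuffer {
  private final List<String> pending = new ArrayList<String>(); // queued update commands
  private final int maxBufferSize;                              // flush threshold

  ReplicaUpdateBuffer(int maxBufferSize) {
    this.maxBufferSize = maxBufferSize;
  }

  synchronized void add(String updateCommand) {
    pending.add(updateCommand);
    if (pending.size() >= maxBufferSize) {
      flush();                            // send everything queued so far as one request
    }
  }

  // Also called when the stream of incoming requests goes idle.
  synchronized void flush() {
    if (pending.isEmpty()) return;
    sendBatchToReplica(new ArrayList<String>(pending)); // one HTTP request, many commands
    pending.clear();
  }

  private void sendBatchToReplica(List<String> batch) {
    // network call to the replica's /update handler would go here
  }
}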
- Mark Miller
lucidimagination.com

On Feb 28, 2012, at 3:38 AM, eks dev wrote:

> SolrCloud is going to be great. The NRT feature is a really huge step
> forward, as well as central configuration, elasticity ...
>
> The only thing I do not yet understand is the treatment of cases that were
> traditionally covered by a master/slave setup: batch updates.
>
> If I get it right (?), updates to replicas are sent one by one,
> meaning when one server receives an update, it gets forwarded to all
> replicas. This is great for the reduced-update-latency case, but I do not
> know how it is handled if you hit it with a "batch" update. That
> would cause a huge number of update commands going to the replicas. Not so
> good for throughput.
>
> - Master/slave does distribution at the segment level (no need to
> replicate analysis, far less network traffic). Good for batch updates.
> - SolrCloud distributes per update command (low latency, but chatty, and
> the analysis step is done N_Servers times). Good for incremental updates.
>
> Ideally, some sort of "batching" is going to be available in
> SolrCloud, with some control over it, e.g. forward batches of 1000
> documents (basically keep the update log slightly longer and forward it as
> a batch update command). This would still cause duplicate analysis,
> but would reduce network traffic.
>
> Please bear in mind, this is more of a question than a statement; I
> didn't look at the cloud code. It might be that I am completely wrong here!
>
> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> As I understand it (and I'm just getting into SolrCloud myself), you can
>> essentially forget about the master/slave stuff. If you're using NRT,
>> the soft commit will make the docs visible; you don't need to do a hard
>> commit (unlike the master/slave days). Essentially, the update is sent
>> to each shard leader and then fanned out to the replicas for that
>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>> is used to keep the cluster information.
>>
>> Additionally, SolrCloud keeps a transaction log of the updates and replays
>> them if the indexing is interrupted, so you don't risk data loss the way
>> you used to.
>>
>> There aren't really masters/slaves in the old sense any more, so
>> you have to get out of that thought-mode (it's hard, I know).
>>
>> The code is under pretty active development, so any feedback is
>> valuable....
>>
>> Best
>> Erick
>>
>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev <rozde...@gmail.com> wrote:
>>> Hi All,
>>>
>>> I am trying to understand the features of SolrCloud regarding commits and
>>> scaling.
>>>
>>> - If I am using SolrCloud, do I need to explicitly call commit
>>> (hard commit)? Or is a soft commit okay, and SolrCloud will do the job of
>>> writing to disk?
>>>
>>> - Do we still need to use a master/slave setup to scale searching? If we
>>> have to use a master/slave setup, do I need to issue a hard commit to make
>>> my changes visible to the slaves?
>>>
>>> - If I were to use NRT with a master/slave setup with a soft commit, will
>>> the slaves be able to see changes made on the master with a soft commit?
>>>
>>> Any inputs are welcome.
>>>
>>> Thanks
>>>
>>> -Saroj
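
For completeness, a rough SolrJ sketch of the soft vs. hard commit distinction Erick describes above (4.x-era trunk API; the zkHost, collection name, and exact signatures are approximate and may change):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitExample {
  public static void main(String[] args) throws Exception {
    // connect through ZooKeeper; the zkHost and collection name are placeholders
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title", "hello");
    server.add(doc);   // routed to the shard leader, then forwarded to its replicas

    // soft commit: new documents become visible to searchers without a full flush
    server.commit(false, true, true);   // waitFlush=false, waitSearcher=true, softCommit=true

    // a periodic hard commit (typically autoCommit in solrconfig.xml) still makes
    // the index durable on disk and truncates the transaction log
    server.shutdown();
  }
}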