Thanks Mark,

Good, this is probably good enough to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is probably
not going to cost a lot, if there is some decent "batching".
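Just to make the incremental vs. batch distinction concrete, on the client
side I would do something like the rough SolrJ sketch below. CloudSolrServer,
the ZooKeeper address, the collection name and the batch size of 1000 are my
own assumptions, and this only batches the requests I send; it says nothing
about the internal per-replica buffer:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {

    // Batch size is a guess; the right value depends on document size
    // and how expensive analysis is.
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        // CloudSolrServer talks to the cluster via ZooKeeper (address assumed).
        CloudSolrServer server = new CloudSolrServer("zkhost:2181");
        server.setDefaultCollection("collection1");

        List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("text", "body of document " + i);
            buffer.add(doc);

            // Send one request per BATCH_SIZE documents instead of one per document.
            if (buffer.size() >= BATCH_SIZE) {
                server.add(buffer);
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            server.add(buffer);   // flush the remainder
        }
        server.commit();          // hard commit at the end of the batch run
        server.shutdown();
    }
}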
Can this be somehow controlled (depth of this buffer / time till flush or
some such)? Which "events" trigger this flushing to replicas (softCommit,
commit, something new)?

What I find useful is to always think in terms of incremental (low latency)
and batch (high throughput) updates; I just need some knobs to tweak the
behavior of this update process.

I would really like to move away from Master/Slave; Cloud makes a lot of
things way simpler for us users... Will give it a try in a couple of weeks.

Later we can even think about putting replication at segment level for
"extremely expensive analysis, batch cases" or "initial cluster seeding" as
a replication option. But this is then just an optimization.

Cheers,
eks

On Thu, Mar 1, 2012 at 5:24 AM, Mark Miller <markrmil...@gmail.com> wrote:
> We actually do currently batch updates - we are being somewhat loose when
> we say a document at a time. There is a buffer of updates per replica that
> gets flushed depending on the requests coming through and the buffer size.
>
> - Mark Miller
> lucidimagination.com
>
> On Feb 28, 2012, at 3:38 AM, eks dev wrote:
>
>> SolrCloud is going to be great; the NRT feature is a really huge step
>> forward, as well as central configuration, elasticity...
>>
>> The only thing I do not yet understand is the treatment of cases that
>> were traditionally covered by a Master/Slave setup: batch updates.
>>
>> If I get it right (?), updates to replicas are sent one by one, meaning
>> when one server receives an update, it gets forwarded to all replicas.
>> This is great for the reduced update latency case, but I do not know how
>> it is implemented if you hit it with a "batch" update. This would cause
>> a huge amount of update commands going to replicas. Not so good for
>> throughput.
>>
>> - Master/Slave does distribution at segment level (no need to replicate
>> analysis, far less network traffic). Good for batch updates.
>> - SolrCloud does it per update command (low latency, but chatty, and the
>> analysis step is done N_Servers times). Good for incremental updates.
>>
>> Ideally, some sort of "batching" is going to be available in SolrCloud,
>> and some control over it, e.g. forward batches of 1000 documents
>> (basically keep the update log slightly longer and forward it as a batch
>> update command). This would still cause duplicate analysis, but would
>> reduce network traffic.
>>
>> Please bear in mind, this is more of a question than a statement; I
>> didn't look at the cloud code. It might be that I am completely wrong
>> here!
>>
>> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>> As I understand it (and I'm just getting into SolrCloud myself), you
>>> can essentially forget about master/slave stuff. If you're using NRT,
>>> the soft commit will make the docs visible; you don't need to do a hard
>>> commit (unlike the master/slave days). Essentially, the update is sent
>>> to each shard leader and then fanned out to the replicas for that
>>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>>> is used to keep the cluster information.
>>>
>>> Additionally, SolrCloud keeps a transaction log of the updates and
>>> replays them if the indexing is interrupted, so you don't risk data
>>> loss the way you used to.
>>>
>>> There aren't really masters/slaves in the old sense any more, so you
>>> have to get out of that thought-mode (it's hard, I know).
>>>
>>> The code is under pretty active development, so any feedback is
>>> valuable....
>>>
>>> Best
>>> Erick
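As a concrete illustration of the "soft commit makes the docs visible" point
above, a minimal SolrJ sketch; it assumes the 4.x SolrJ API (commitWithin and
the three-argument commit), and the ZooKeeper address and collection name are
placeholders, not settings from this thread:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class NrtUpdateExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost:2181"); // assumed address
        server.setDefaultCollection("collection1");                  // assumed collection

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text", "hello solrcloud");

        // Option 1: let Solr make the doc searchable within 5 seconds (commitWithin).
        server.add(doc, 5000);

        // Option 2: explicit soft commit - makes docs visible to searchers without
        // the cost of a hard commit (arguments: waitFlush, waitSearcher, softCommit).
        server.commit(true, true, true);

        // A hard commit (flush to disk for durability) can be left to autoCommit in
        // solrconfig.xml, or issued explicitly when needed:
        server.commit();

        server.shutdown();
    }
}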
>>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev <rozde...@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> I am trying to understand features of Solr Cloud, regarding commits
>>>> and scaling.
>>>>
>>>> - If I am using Solr Cloud, then do I need to explicitly call commit
>>>>   (hard-commit)? Or is a soft commit okay, and Solr Cloud will do the
>>>>   job of writing to disk?
>>>>
>>>> - Do we still need to use a Master/Slave setup to scale searching? If
>>>>   we have to use a Master/Slave setup, then do I need to issue a
>>>>   hard-commit to make my changes visible to slaves?
>>>>
>>>> - If I were to use NRT with a Master/Slave setup with soft commit,
>>>>   then will the slave be able to see changes made on the master with
>>>>   soft commit?
>>>>
>>>> Any inputs are welcome.
>>>>
>>>> Thanks
>>>>
>>>> -Saroj