Thanks Mark,
Good, this is probably good enough for us to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is
probably not going to cost a lot, as long as there is some decent "batching".

Can this be controlled somehow (depth of the buffer, time until flush,
or some such)? Which "events" trigger this flushing to the replicas
(softCommit, commit, something new)?

What I have found useful is to always think in terms of incremental
(low-latency) and batch (high-throughput) updates. I then just need some
knobs to tweak the behavior of this update process.
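
For example, on the client side this kind of batching can already be done
explicitly with SolrJ, roughly like below (just a sketch; the URL, collection
name, field names, and the batch size of 1000 are made-up example values):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    // Send one update request per 1000 documents instead of one per document.
    SolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/collection1").build();
    List<SolrInputDocument> batch = new ArrayList<>();
    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("text", "document body " + i);
      batch.add(doc);
      if (batch.size() == 1000) {
        solr.add(batch);   // the whole batch goes out in a single request
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      solr.add(batch);     // flush the last partial batch
    }
    solr.commit();         // one hard commit at the end of the batch run
    solr.close();
  }
}

That only covers the client-to-leader hop, of course; what I am asking about
is whether the leader-to-replica forwarding has (or could get) similar knobs.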

I would really like to move away from Master/Slave; Cloud makes a lot
of things much simpler for us users ... I will give it a try in a couple
of weeks.

Later we could even think about offering segment-level replication as an
option for the "extremely expensive analysis, batch" cases, or for "initial
cluster seeding". But that is then just an optimization.

Cheers,
eks


On Thu, Mar 1, 2012 at 5:24 AM, Mark Miller <markrmil...@gmail.com> wrote:
> We actually do currently batch updates - we are being somewhat loose when we 
> say a document at a time. There is a buffer of updates per replica that gets 
> flushed depending on the requests coming through and the buffer size.
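>
> (For illustration: in SolrJ, ConcurrentUpdateSolrClient exposes the same
> buffer-and-flush idea on the client side. A rough sketch; the URL, queue
> size, and thread count are made-up example values, not the settings the
> per-replica buffer itself uses.)
>
> import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
> import org.apache.solr.common.SolrInputDocument;
>
> ConcurrentUpdateSolrClient client =
>     new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/collection1")
>         .withQueueSize(100)       // how many updates can sit in the buffer
>         .withThreadCount(4)       // background threads draining the buffer
>         .build();
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("id", "1");
> client.add(doc);                  // returns quickly; flushed in the background
> client.blockUntilFinished();      // drain the buffer when the batch is done
> client.close();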
>
> - Mark Miller
> lucidimagination.com
>
> On Feb 28, 2012, at 3:38 AM, eks dev wrote:
>
>> SolrCloud is going to be great; the NRT feature is a really huge step
>> forward, as are central configuration, elasticity ...
>>
>> The only thing I do not yet understand is the treatment of cases that were
>> traditionally covered by a Master/Slave setup: batch updates.
>>
>> If I get it right (?), updates to replicas are sent one by one,
>> meaning that when one server receives an update, it gets forwarded to all
>> replicas. This is great for the reduced-update-latency case, but I do not
>> know how it behaves if you hit it with a "batch" update. That would
>> cause a huge number of update commands going to the replicas. Not so
>> good for throughput.
>>
>> - Master/Slave does distribution at the segment level (no need to
>> repeat analysis, far less network traffic). Good for batch updates.
>> - SolrCloud distributes per update command (low latency, but chatty, and
>> the analysis step is done N_Servers times). Good for incremental updates.
>>
>> Ideally, some sort of "batching" will become available in
>> SolrCloud, along with some control over it, e.g. forward batches of 1000
>> documents (basically keep the update log slightly longer and forward it as
>> a batch update command). This would still cause duplicate analysis,
>> but it would reduce the network traffic.
>>
>> Please bear in mind that this is more of a question than a statement; I
>> didn't look at the cloud code. It may be that I am completely wrong here!
>>
>> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <erickerick...@gmail.com> 
>> wrote:
>>> As I understand it (and I'm just getting into SolrCloud myself), you can
>>> essentially forget about master/slave stuff. If you're using NRT,
>>> the soft commit will make the docs visible; you don't need to do a hard
>>> commit (unlike in the master/slave days). Essentially, the update is sent
>>> to each shard leader and then fanned out to the replicas for that
>>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>>> is used to keep the cluster information.
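>>>
>>> To illustrate the commit side with SolrJ (a rough sketch only; the
>>> ZooKeeper address and collection name are placeholders for whatever
>>> your cluster actually uses):
>>>
>>> import java.util.Collections;
>>> import java.util.Optional;
>>> import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>> import org.apache.solr.common.SolrInputDocument;
>>>
>>> CloudSolrClient cloud = new CloudSolrClient.Builder(
>>>     Collections.singletonList("localhost:9983"), Optional.empty()).build();
>>>
>>> SolrInputDocument doc = new SolrInputDocument();
>>> doc.addField("id", "42");
>>> // Routed to the right shard leader, then forwarded to its replicas.
>>> cloud.add("collection1", doc);
>>>
>>> // Soft commit: makes the doc searchable without forcing segments to disk.
>>> cloud.commit("collection1", true, true, true);  // waitFlush, waitSearcher, softCommit
>>> cloud.close();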
>>>
>>> Additionally, SolrCloud keeps a transaction log of the updates, and replays
>>> them if the indexing is interrupted, so you don't risk data loss the way
>>> you used to.
>>>
>>> There aren't really masters/slaves in the old sense any more, so
>>> you have to get out of that thought-mode (it's hard, I know).
>>>
>>> The code is under pretty active development, so any feedback is
>>> valuable....
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev <rozde...@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> I am trying to understand the features of Solr Cloud regarding commits and
>>>> scaling.
>>>>
>>>>
>>>>   - If I am using Solr Cloud, do I need to explicitly call commit
>>>>     (hard commit)? Or is a soft commit okay, and Solr Cloud will do the
>>>>     job of writing to disk?
>>>>
>>>>   - Do we still need to use a Master/Slave setup to scale searching? If we
>>>>     have to use a Master/Slave setup, do I need to issue a hard commit to
>>>>     make my changes visible to the slaves?
>>>>
>>>>   - If I were to use NRT with a Master/Slave setup and soft commits, will
>>>>     the slaves be able to see changes made on the master with a soft
>>>>     commit?
>>>>
>>>> Any inputs are welcome.
>>>>
>>>> Thanks
>>>>
>>>> -Saroj
