Let me see if I understand this correctly:

The greater the number of ZooKeeper nodes, the more time it takes to reach
consensus.
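
To make the arithmetic behind that concrete, here is a minimal sketch. It
assumes the default majority-quorum behaviour of a ZooKeeper ensemble, where
a write has to be acknowledged by more than half of the voting servers. The
function names are hypothetical and are not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Larger ensembles need more acknowledgements per write,
    # but survive more failed servers.
    for n in (1, 3, 5, 7):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under that assumption, going from 3 to 5 servers raises the acknowledgements
needed per write from 2 to 3, which is one way to see why a larger ensemble
tends to slow writes while tolerating more failures.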

During an indexing operation, how many times does a Solr client need to
contact ZooKeeper for consensus? Once per document? Once per commit?

Thanks,
Ani


On Sun, Nov 11, 2012 at 11:17 AM, Nick Chase <nch...@earthlink.net> wrote:

> Thanks, Jack, this is a great explanation!  And since a greater number of
> ZK nodes tends to degrade write performance, that would be a factor in
> making every Solr node a ZK node as well.  Much obliged!
>
> ----  Nick
>
>
> On 11/11/2012 10:45 AM, Jack Krupansky wrote:
>
>> "Production" typically implies "high availability" and in a distributed
>> system the goal is that the overall cluster integrity and performance
>> should not be compromised just because a few "worker" nodes go down.
>> Solr nodes do a lot of complex operations and are quite prone to running
>> into "issues" that compromise their integrity and require that they be
>> taken down, restarted, etc. In fact, taking down a "bunch" of Solr
>> "worker" nodes should not be a big deal (unless they are all of the
>> nodes/replicas from a single shard/slice), while taking down a "bunch"
>> of zookeepers could be catastrophic to maintaining the integrity of the
>> zookeeper ensemble. (OTOH, if every Solr node is also a zookeeper node,
>> a "bunch" of Solr nodes would generally be less than a quorum, so maybe
>> that is not an absolute issue per se.) Zookeeper nodes are categorically
>> distinct in terms of their importance to maintaining the integrity and
>> availability of the overall cluster. They are special in that sense. And
>> they are special because they are maintaining the integrity of the
>> cluster's configuration information. Even for large clusters their
>> number will be relatively "few" compared to the "many" of "worker" nodes
>> (replicas), so zookeeper nodes need to be "protected" from the vagaries
>> that can disrupt and take Solr nodes down, not the least of which is
>> incoming traffic.
>>
>> I'm not sure what the implications would be if you had a large cluster
>> and because Zookeeper was embedded you had a large number of zookeepers.
>> Any of the inter-zookeeper operations would take longer and could be
>> compromised by even a single busy/overloaded/dead Solr node. OTOH, the
>> Zookeeper ensemble design is supposed to be able to handle a fair number
>> of missing zookeeper nodes.
>>
>> OTOH, if high availability is not a requirement for a production cluster
>> (use case?), then non-embedded zookeepers are certainly an annoyance.
>>
>> Maybe you could think of embedded zookeeper like every employee having
>> their manager sitting right next to them all the time. How could that be
>> anything but a bad idea in terms of maximizing worker output - and
>> distracting/preventing managers from focusing on their own "work"?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Nick Chase
>> Sent: Sunday, November 11, 2012 7:12 AM
>> To: solr-user@lucene.apache.org
>> Subject: Internal Vs. External ZooKeeper
>>
>> OK, I can't find a definitive answer on this.  The wiki says not to use
>> the embedded ZooKeeper servers for production.  But my question is: why
>> not?  Basically, what are the reasons and circumstances that make you
>> better off using an external ZooKeeper ensemble?
>>
>> Thanks...
>>
>> ---- Nick
>>
>>


-- 
Anirudha P. Jadhav
