Thanks, Mark!

On Sun, Nov 11, 2012 at 5:46 PM, Mark Miller <markrmil...@gmail.com> wrote:

> When SolrCloud is in a steady state (e.g., the number of nodes in the cluster
> is not changing and config is not changing), Solr does not really talk to
> ZooKeeper other than really light stuff like a heartbeat and maintaining a
> connection. So performance is not likely a large concern here.
>
> Mostly it's just a hassle because ZooKeeper does not currently support
> dynamically changing the nodes in an ensemble without doing a rolling
> restart. There are JIRA issues that are being worked on that will help with
> this though.
>
> Until then, it's just kind of a pain that some nodes have to be special or
> you have to do rolling restarts to make additional nodes part of the zk
> quorum.
>
> It's really up to you though - having the services separate just seems
> "nicer" to me. Easier to maintain. Often, once you start running ZooKeeper
> for one thing, you may end up running other things that use ZooKeeper as
> well - many people like to colocate this stuff on a single dedicated
> ZooKeeper ensemble.
>
> Embedded will run just fine - we simply recommend the other way to save
> headaches. If you know what you are getting into, it's certainly a valid
> choice.
>
> - Mark
>
>
> On 11/11/2012 05:11 PM, Anirudha Jadhav wrote:
>
>> Let me see if I get this correctly:
>>
>> The greater the number of ZooKeeper nodes, the longer it takes to come to a
>> consensus.
>>
>> During an indexing operation, how many times does a Solr client need to
>> contact ZooKeeper for consensus?
>> - Per doc? Per commit?
>>
>> thanks,
>> Ani
>>
>>
>> On Sun, Nov 11, 2012 at 11:17 AM, Nick Chase <nch...@earthlink.net>
>> wrote:
>>
>>> Thanks, Jack, this is a great explanation!  And since a greater number of
>>> ZK nodes tends to degrade write performance, that would be a factor in
>>> making every Solr node a ZK node as well.  Much obliged!
>>>
>>> ----  Nick
>>>
>>>
>>> On 11/11/2012 10:45 AM, Jack Krupansky wrote:
>>>
>>>> "Production" typically implies "high availability" and in a distributed
>>>> system the goal is that the overall cluster integrity and performance
>>>> should not be compromised just because a few "worker" nodes go down.
>>>> Solr nodes do a lot of complex operations and are quite prone to running
>>>> into "issues" that compromise their integrity and require that they be
>>>> taken down, restarted, etc. In fact, taking down a "bunch" of Solr
>>>> "worker" nodes should not be a big deal (unless they are all of the
>>>> nodes/replicas from a single shard/slice), while taking down a "bunch"
>>>> of zookeepers could be catastrophic to maintaining the integrity of the
>>>> zookeeper ensemble. (OTOH, if every Solr node is also a zookeeper node,
>>>> a "bunch" of Solr nodes would generally be less than a quorum, so maybe
>>>> that is not an absolute issue per se.) Zookeeper nodes are categorically
>>>> distinct in terms of their importance to maintaining the integrity and
>>>> availability of the overall cluster. They are special in that sense. And
>>>> they are special because they are maintaining the integrity of the
>>>> cluster's configuration information. Even for large clusters their
>>>> number will be relatively "few" compared to the "many" of "worker" nodes
>>>> (replicas), so zookeeper nodes need to be "protected" from the vagaries
>>>> that can disrupt and take Solr nodes down, not the least of which is
>>>> incoming traffic.
>>>>
>>>> I'm not sure what the implications would be if you had a large cluster
>>>> and because Zookeeper was embedded you had a large number of zookeepers.
>>>> Any of the inter-zookeeper operations would take longer and could be
>>>> compromised by even a single busy/overloaded/dead Solr node. OTOH, the
>>>> Zookeeper ensemble design is supposed to be able to handle a fair number
>>>> of missing zookeeper nodes.
>>>>
>>>> OTOH, if high availability is not a requirement for a production cluster
>>>> (use case?), then non-embedded zookeepers are certainly an annoyance.
>>>>
>>>> Maybe you could think of embedded zookeeper like every employee having
>>>> their manager sitting right next to them all the time. How could that be
>>>> anything but a bad idea in terms of maximizing worker output - and
>>>> distracting/preventing managers from focusing on their own "work"?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Nick Chase
>>>> Sent: Sunday, November 11, 2012 7:12 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Internal Vs. External ZooKeeper
>>>>
>>>> OK, I can't find a definitive answer on this.  The wiki says not to use
>>>> the embedded ZooKeeper servers for production.  But my question is: why
>>>> not?  Basically, what are the reasons and circumstances that make you
>>>> better off using an external ZooKeeper ensemble?
>>>>
>>>> Thanks...
>>>>
>>>> ---- Nick
>>>>
>>>>
>>>>
>>
>
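
For reference, the quorum arithmetic Jack alludes to above ("less than a quorum", "a fair number of missing zookeeper nodes") can be sketched roughly as below. This is only an illustration, assuming ZooKeeper's default majority quorum with no observers or weighted/hierarchical quorums; the function names are made up for the example and are not part of any Solr or ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes default majority quorum)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ensemble members can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes, including even ones.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Under that assumption an even-sized ensemble tolerates no more failures than the next smaller odd one (4 nodes tolerate 1 failure, same as 3), which is one reason a small, odd-sized dedicated ensemble is the usual choice.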


-- 
Anirudha P. Jadhav
