Thanks, Mark!
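
For anyone reading the archive later, here is a rough sketch of the quorum arithmetic behind the points quoted below. It assumes the usual majority rule for a ZooKeeper ensemble (a quorum is more than half of the configured servers). The function names are my own, for illustration only, and are not part of Solr or ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (assumes simple majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a small dedicated ensemble with "every Solr node is a ZK node".
    for n in (1, 3, 5, 20):
        print(f"{n} servers: quorum of {quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

If I read this right, embedding ZooKeeper in every Solr node raises the number of servers that must acknowledge each ZooKeeper write, while a dedicated ensemble of 3 or 5 keeps that quorum small.
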

On Sun, Nov 11, 2012 at 5:46 PM, Mark Miller <markrmil...@gmail.com> wrote:

> When SolrCloud is in a steady state (e.g. the number of nodes in the cluster
> is not changing and config is not changing), Solr does not really talk to
> ZooKeeper other than really light stuff like a heartbeat and maintaining a
> connection. So performance is not likely a large concern here.
>
> Mostly it's just a hassle because ZooKeeper does not currently support
> dynamically changing the nodes in an ensemble without doing a rolling
> restart. There are JIRA issues that are being worked on that will help with
> this though.
>
> Until then, it's just kind of a pain that some nodes have to be special or
> you have to do rolling restarts to make additional nodes part of the zk
> quorum.
>
> It's really up to you though - having the services separate just seems
> "nicer" to me. Easier to maintain. Often, once you start running ZooKeeper
> for one thing, you may end up running other things that use ZooKeeper as
> well - many people like to colocate this stuff on a single dedicated
> ZooKeeper ensemble.
>
> Embedded will run just fine - we simply recommend the other way to save
> headaches. If you know what you are getting into, it's certainly a valid
> choice.
>
> - Mark
>
> On 11/11/2012 05:11 PM, Anirudha Jadhav wrote:
>
>> Let me see if I get this correctly:
>>
>> The greater the number of ZooKeeper nodes, the more time it takes to
>> come to a consensus.
>>
>> During an indexing operation, how many times does a Solr client need to
>> contact ZooKeeper for consensus?
>> - per doc? per commit?
>>
>> Thanks,
>> Ani
>>
>> On Sun, Nov 11, 2012 at 11:17 AM, Nick Chase <nch...@earthlink.net>
>> wrote:
>>
>>> Thanks, Jack, this is a great explanation! And since a greater number of
>>> ZK nodes tends to degrade write performance, that would be a factor in
>>> making every Solr node a ZK node as well. Much obliged!
>>>
>>> ---- Nick
>>>
>>> On 11/11/2012 10:45 AM, Jack Krupansky wrote:
>>>
>>>> "Production" typically implies "high availability" and in a distributed
>>>> system the goal is that the overall cluster integrity and performance
>>>> should not be compromised just because a few "worker" nodes go down.
>>>> Solr nodes do a lot of complex operations and are quite prone to running
>>>> into "issues" that compromise their integrity and require that they be
>>>> taken down, restarted, etc. In fact, taking down a "bunch" of Solr
>>>> "worker" nodes should not be a big deal (unless they are all of the
>>>> nodes/replicas from a single shard/slice), while taking down a "bunch"
>>>> of zookeepers could be catastrophic to maintaining the integrity of the
>>>> zookeeper ensemble. (OTOH, if every Solr node is also a zookeeper node,
>>>> a "bunch" of Solr nodes would generally be less than a quorum, so maybe
>>>> that is not an absolute issue per se.) Zookeeper nodes are categorically
>>>> distinct in terms of their importance to maintaining the integrity and
>>>> availability of the overall cluster. They are special in that sense. And
>>>> they are special because they are maintaining the integrity of the
>>>> cluster's configuration information. Even for large clusters their
>>>> number will be relatively "few" compared to the "many" of "worker" nodes
>>>> (replicas), so zookeeper nodes need to be "protected" from the vagaries
>>>> that can disrupt and take Solr nodes down, not the least of which is
>>>> incoming traffic.
>>>>
>>>> I'm not sure what the implications would be if you had a large cluster
>>>> and because Zookeeper was embedded you had a large number of zookeepers.
>>>> Any of the inter-zookeeper operations would take longer and could be
>>>> compromised by even a single busy/overloaded/dead Solr node. OTOH, the
>>>> Zookeeper ensemble design is supposed to be able to handle a fair number
>>>> of missing zookeeper nodes.
>>>>
>>>> OTOH, if high availability is not a requirement for a production cluster
>>>> (use case?), then non-embedded zookeepers are certainly an annoyance.
>>>>
>>>> Maybe you could think of embedded zookeeper like every employee having
>>>> their manager sitting right next to them all the time. How could that be
>>>> anything but a bad idea in terms of maximizing worker output - and
>>>> distracting/preventing managers from focusing on their own "work"?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message-----
>>>> From: Nick Chase
>>>> Sent: Sunday, November 11, 2012 7:12 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Internal Vs. External ZooKeeper
>>>>
>>>> OK, I can't find a definitive answer on this. The wiki says not to use
>>>> the embedded ZooKeeper servers for production. But my question is: why
>>>> not? Basically, what are the reasons and circumstances that make you
>>>> better off using an external ZooKeeper ensemble?
>>>>
>>>> Thanks...
>>>>
>>>> ---- Nick

--
Anirudha P. Jadhav