Garth,

Here is something else related that may help push the upgrade further:
http://search-lucene.com/m/gUajqxuETB1/&subj=Re+SolrCloud+and+split+brain

Monitor your beast keeper: http://search-lucene.com/m/R9vEg2JmiR91

Otis

On Tue, Nov 19, 2013 at 5:56 PM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> Thanks Mark and Tim. My understanding has been upgraded.
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, November 19, 2013 1:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Zookeeper down question
>
>
> On Nov 19, 2013, at 2:24 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
> > Good questions ... From my understanding, queries will work if ZK goes
> > down, but writes do not work without ZooKeeper. This works because the
> > cluster state is cached on each node, so ZooKeeper doesn't participate
> > directly in queries and indexing requests. Solr has to decide not to
> > allow writes if it loses its connection to ZooKeeper, which is a
> > safeguard mechanism. In other words, Solr assumes it's reasonably safe
> > to allow reads if the cluster doesn't have a healthy coordinator, but
> > chooses to disallow writes to be safe.
>
> Right - we currently stop accepting writes when Solr cannot talk to
> ZooKeeper. This is because we can no longer count on knowing about any
> changes to the cluster, no new leaders can be elected, etc. It gets
> tricky fast if you consider allowing updates without ZooKeeper
> connectivity for very long.
>
> > If a Solr node goes down while ZK is not available, then since Solr no
> > longer accepts writes, leader/replica doesn't really matter. I'd
> > venture to guess there is some failover logic built in when executing
> > distributed queries, but I'm not as familiar with that part of the
> > code (I'll brush up on it, though, as I'm now curious as well).
> Right - query requests will fail over to other replicas. This is
> important in general because the cluster state a Solr instance has can
> be a bit stale, so a request might hit something that has gone down,
> and another replica in the shard can be tried. We use the load-balancing
> SolrJ client for these internal requests. CloudSolrServer handles
> failover for the user (or non-internal) requests. Or you can use your
> own external load balancer.
>
> - Mark
>
> > Cheers,
> > Tim
> >
> > On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm <
> > garthgr...@averyranchconsulting.com> wrote:
> >
> >> Given a 4-node Solr instance (i.e. 2 shards, 2 replicas per shard)
> >> and a standalone ZooKeeper.
> >>
> >> Correct me if any of my understanding of the following is incorrect:
> >> If ZK goes down, most normal operations will still function, since my
> >> understanding is that ZK isn't involved on a
> >> transaction-by-transaction basis for each of these:
> >> Document adds, updates, and deletes on an existing collection will
> >> still work as expected.
> >> Queries will still get processed as expected.
> >> Is the above correct?
> >>
> >> But adding new collections, changing configs, etc., will all fail
> >> while ZK is down (or at least place things in an inconsistent state)?
> >> Is that correct?
> >>
> >> If, while ZK is down, one of the 4 Solr nodes also goes down, will
> >> all normal operations fail? Will they all continue to succeed? I.e.,
> >> will each of the nodes realize which node is down and route indexing
> >> and query requests around it, or is that impossible while ZK is
> >> down? Will some queries succeed (because they were lucky enough to
> >> get routed to the one replica on the one shard that is still
> >> functional) while other queries fail (they aren't so lucky and get
> >> routed to the one replica that is down on the one shard)?
> >>
> >> Thanks,
> >> Garth Grimm
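The safeguard Mark and Tim describe above (serve reads from the locally cached cluster state, reject writes once the ZooKeeper connection is lost) can be sketched as a toy simulation. All class and method names here are hypothetical; this is a model of the described behavior, not Solr's actual code:

```python
class SolrNode:
    """Toy model of a SolrCloud node's read/write safeguard.

    The node keeps a locally cached copy of the cluster state, so
    queries can be answered even when ZooKeeper is unreachable.
    Updates are rejected without ZK, because leader election and
    cluster-state changes can no longer be tracked.
    """

    def __init__(self, cached_cluster_state):
        self.cached_cluster_state = cached_cluster_state
        self.zk_connected = True

    def handle(self, request_type):
        if request_type == "query":
            # Reads only consult the cached state; no ZK round trip.
            return "200 OK"
        if request_type == "update":
            if not self.zk_connected:
                # Safeguard: without ZK, no new leaders can be elected
                # and cluster changes are invisible, so refuse writes.
                return "503 Service Unavailable"
            return "200 OK"
        raise ValueError(f"unknown request type: {request_type}")


node = SolrNode(cached_cluster_state={"shard1": ["replica1", "replica2"]})
node.zk_connected = False           # simulate ZooKeeper going down
print(node.handle("query"))         # reads still succeed
print(node.handle("update"))        # writes are refused
```

This also illustrates why "most normal operations" is not quite right in the original question: reads survive a ZK outage, but adds/updates/deletes do not.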
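Mark's point about query failover (a stale cluster state may route a request to a dead replica, so another replica in the same shard is tried) can likewise be sketched. This is a deliberate simplification of what Solr's internal load-balancing client does, with made-up node names, not the real implementation:

```python
def query_shard(replicas, live_nodes):
    """Try each replica of a shard until one answers.

    `replicas` may come from a slightly stale cached cluster state,
    so a listed replica can in fact be down; failover simply moves
    on to the next replica in the shard.
    """
    for replica in replicas:
        if replica in live_nodes:
            return f"served by {replica}"
    raise RuntimeError("no live replica for this shard")


def distributed_query(shards, live_nodes):
    # A full distributed query needs one live replica from EVERY shard.
    return [query_shard(replicas, live_nodes) for replicas in shards.values()]


# 2 shards x 2 replicas, as in Garth's scenario (hypothetical names).
shards = {
    "shard1": ["nodeA", "nodeB"],
    "shard2": ["nodeC", "nodeD"],
}

# One replica per shard is down: failover still answers the query.
print(distributed_query(shards, live_nodes={"nodeB", "nodeC"}))
```

It also answers the "lucky routing" question: as long as each shard keeps one live replica, failover makes every query succeed; a query only fails once some shard has no live replica left at all.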