Garth,

Here is something else related that may help push the upgrade further:
http://search-lucene.com/m/gUajqxuETB1/&subj=Re+SolrCloud+and+split+brain

Monitor your beast keeper: http://search-lucene.com/m/R9vEg2JmiR91

Otis

On Tue, Nov 19, 2013 at 5:56 PM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> Thanks Mark and Tim. My understanding has been upgraded.
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, November 19, 2013 1:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Zookeeper down question
>
>
> On Nov 19, 2013, at 2:24 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
> > Good questions ... From my understanding, queries will work if ZK goes
> > down, but writes do not work without ZooKeeper. This works because the
> > cluster state is cached on each node, so ZooKeeper doesn't participate
> > directly in queries and indexing requests. Solr has to decide not to
> > allow writes if it loses its connection to ZooKeeper, which is a
> > safeguard mechanism. In other words, Solr assumes it's reasonably safe
> > to allow reads if the cluster doesn't have a healthy coordinator, but
> > chooses to disallow writes to be safe.
>
> Right - we currently stop accepting writes when Solr cannot talk to
> ZooKeeper. This is because we can no longer count on knowing about any
> changes to the cluster, no new leaders can be elected, etc. It gets
> tricky fast if you consider allowing updates without ZooKeeper
> connectivity for very long.
>
> > If a Solr node goes down while ZK is not available, then since Solr no
> > longer accepts writes, leader/replica doesn't really matter. I'd
> > venture to guess there is some failover logic built in when executing
> > distributed queries, but I'm not as familiar with that part of the
> > code (I'll brush up on it, though, as I'm now curious as well).
> Right - query requests will fail over to other replicas. This is
> important in general because the cluster state a Solr instance has can
> be a bit stale, so a request might hit something that has gone down,
> and another replica in the shard can be tried. We use the load-balancing
> SolrJ client for these internal requests. CloudSolrServer handles
> failover for the user (or non-internal) requests. Or you can use your
> own external load balancer.
>
> - Mark
>
> > Cheers,
> > Tim
> >
> > On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm <
> > garthgr...@averyranchconsulting.com> wrote:
> >
> >> Given a 4-node Solr instance (i.e. 2 shards, 2 replicas per shard)
> >> and a standalone ZooKeeper.
> >>
> >> Correct me if any of my understanding of the following is incorrect:
> >> If ZK goes down, most normal operations will still function, since my
> >> understanding is that ZK isn't involved on a
> >> transaction-by-transaction basis for each of these:
> >> Document adds, updates, and deletes on an existing collection will
> >> still work as expected.
> >> Queries will still get processed as expected.
> >> Is the above correct?
> >>
> >> But adding new collections, changing configs, etc., will all fail
> >> while ZK is down (or at least place things in an inconsistent state)?
> >> Is that correct?
> >>
> >> If, while ZK is down, one of the 4 Solr nodes also goes down, will
> >> all normal operations fail? Will they all continue to succeed? I.e.,
> >> will each of the nodes realize which node is down and route indexing
> >> and query requests around it, or is that impossible while ZK is
> >> down? Will some queries succeed (because they were lucky enough to
> >> get routed to the one replica on the one shard that is still
> >> functional) while other queries fail (they aren't so lucky and get
> >> routed to the one replica that is down on the one shard)?
> >>
> >> Thanks,
> >> Garth Grimm
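The safeguard Mark and Tim describe above (serve reads from the locally cached cluster state, reject writes once the ZooKeeper connection is lost) can be sketched as a toy simulation. All class and method names here are hypothetical; this is a model of the described behavior, not Solr's actual code:

```python
class SolrNode:
    """Toy model of a SolrCloud node's read/write safeguard.

    The node keeps a locally cached copy of the cluster state, so
    queries can be answered even when ZooKeeper is unreachable.
    Updates are rejected without ZK, because leader election and
    cluster-state changes can no longer be tracked.
    """

    def __init__(self, cached_cluster_state):
        self.cached_cluster_state = cached_cluster_state
        self.zk_connected = True

    def handle(self, request_type):
        if request_type == "query":
            # Reads only consult the cached state; no ZK round trip.
            return "200 OK"
        if request_type == "update":
            if not self.zk_connected:
                # Safeguard: without ZK, no new leaders can be elected
                # and cluster changes are invisible, so refuse writes.
                return "503 Service Unavailable"
            return "200 OK"
        raise ValueError(f"unknown request type: {request_type}")


node = SolrNode(cached_cluster_state={"shard1": ["replica1", "replica2"]})
node.zk_connected = False           # simulate ZooKeeper going down
print(node.handle("query"))         # reads still succeed
print(node.handle("update"))        # writes are refused
```

This also illustrates why "most normal operations" is not quite right in the original question: reads survive a ZK outage, but adds/updates/deletes do not.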
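Mark's point about query failover (a stale cluster state may route a request to a dead replica, so another replica in the same shard is tried) can likewise be sketched. This is a deliberate simplification of what Solr's internal load-balancing client does, with made-up node names, not the real implementation:

```python
def query_shard(replicas, live_nodes):
    """Try each replica of a shard until one answers.

    `replicas` may come from a slightly stale cached cluster state,
    so a listed replica can in fact be down; failover simply moves
    on to the next replica in the shard.
    """
    for replica in replicas:
        if replica in live_nodes:
            return f"served by {replica}"
    raise RuntimeError("no live replica for this shard")


def distributed_query(shards, live_nodes):
    # A full distributed query needs one live replica from EVERY shard.
    return [query_shard(replicas, live_nodes) for replicas in shards.values()]


# 2 shards x 2 replicas, as in Garth's scenario (hypothetical names).
shards = {
    "shard1": ["nodeA", "nodeB"],
    "shard2": ["nodeC", "nodeD"],
}

# One replica per shard is down: failover still answers the query.
print(distributed_query(shards, live_nodes={"nodeB", "nodeC"}))
```

It also answers the "lucky routing" question: as long as each shard keeps one live replica, failover makes every query succeed; a query only fails once some shard has no live replica left at all.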