Re: Replica and node states

Shalin Shekhar Mangar Wed, 25 Mar 2015 11:08:54 -0700

Comments inline:

On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera <ser...@gmail.com> wrote:


> Hi
>
> Is it possible for a replica to be DOWN, while the node it resides on is
> under /live_nodes? If so, what can lead to it, aside from someone unloading
> a core.
>

Yes. Aside from someone unloading the core, this can happen in two ways: 1)
during startup, each core publishes its state as 'down' before it enters
recovery, and 2) the leader force-publishes a replica as 'down' if it is
not able to forward updates to that replica (this mechanism is called
Leader-Initiated Recovery, or LIR for short).

Case 2 above can happen when the replica is partitioned from the leader but
both are still able to talk to ZooKeeper.


>
> I don't know if each SolrCore reports status to ZK independently, or it's
> done by the Solr process as a whole.
>
>
It is done on a per-core basis for now. But the 'live' node entry is
maintained once per Solr instance (JVM).


> Also, is it possible for a replica to report ACTIVE, while the node it
> lives on is no longer under /live_nodes? Are there any ZK timings that can
> cause that?
>

Yes, this can happen if the JVM crashes. A replica publishes itself as
'down' on shutdown, so if the graceful shutdown step is skipped, the
replica will continue to be 'active' in the cluster state. Even LIR doesn't
apply here because there's no point in the leader marking a replica as
'down' if its node is not 'live' already.
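
In practice this means a replica's published state should only be trusted
when its node is also under /live_nodes. A minimal sketch of that check
follows. It assumes the cluster state has already been read into a plain
dict shaped like shards -> replicas -> {state, node_name}, and that the
children of /live_nodes have been read into a set; the function names and
the sample node names are hypothetical, not part of any Solr API.

```python
def effective_state(replica, live_nodes):
    """Return the state to trust for a single replica.

    A replica whose node is missing from live_nodes is treated as 'down',
    whatever state it last published (e.g. 'active' after a JVM crash).
    """
    if replica["node_name"] not in live_nodes:
        return "down"
    return replica["state"]


def effective_states(collection, live_nodes):
    """Map each replica name in a collection to its effective state."""
    result = {}
    for shard in collection["shards"].values():
        for replica_name, replica in shard["replicas"].items():
            result[replica_name] = effective_state(replica, live_nodes)
    return result


# Hypothetical sample data covering both cases discussed in this thread.
SAMPLE_COLLECTION = {
    "shards": {
        "shard1": {
            "replicas": {
                # Healthy replica on a live node.
                "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
                # Published 'down' (startup or LIR) while its node is live.
                "core_node2": {"state": "down", "node_name": "host2:8983_solr"},
                # Still 'active' in cluster state, but its JVM crashed.
                "core_node3": {"state": "active", "node_name": "host3:8983_solr"},
            }
        }
    }
}
SAMPLE_LIVE_NODES = {"host1:8983_solr", "host2:8983_solr"}

if __name__ == "__main__":
    for name, state in sorted(
        effective_states(SAMPLE_COLLECTION, SAMPLE_LIVE_NODES).items()
    ):
        print(name, state)
```

With the sample data, core_node2 shows the first case (replica 'down' on a
live node) and core_node3 shows the second (replica published as 'active'
on a node that is no longer live, so it is treated as 'down').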


>
> Shai
>



-- 
Regards,
Shalin Shekhar Mangar.
