Comments inline:

On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera <ser...@gmail.com> wrote:
> Hi
>
> Is it possible for a replica to be DOWN, while the node it resides on is
> under /live_nodes? If so, what can lead to it, aside from someone unloading
> a core.
>

Yes, aside from someone unloading the index, this can happen in two ways:

1) During startup, each core publishes its state as 'down' before it enters
   recovery.
2) The leader force-publishes a replica as 'down' if it is not able to
   forward updates to that replica (this mechanism is called
   Leader-Initiated-Recovery, or LIR for short).

Case 2 can happen when the replica is partitioned from the leader but both
are able to talk to ZooKeeper.

>
> I don't know if each SolrCore reports status to ZK independently, or it's
> done by the Solr process as a whole.
>

It is done on a per-core basis for now. But the 'live' node is maintained
once per Solr instance (JVM).

> Also, is it possible for a replica to report ACTIVE, while the node it
> lives on is no longer under /live_nodes? Are there any ZK timings that can
> cause that?
>

Yes, this can happen if the JVM crashed. A replica publishes itself as
'down' on shutdown, so if the graceful shutdown step is skipped, the replica
will continue to be 'active' in the cluster state. Even LIR doesn't apply
here, because there's no point in the leader marking a replica as 'down' if
its node is not 'live' in the first place.

>
> Shai
>

--
Regards,
Shalin Shekhar Mangar.
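
P.S. To make the last point concrete, here is a minimal sketch of how a client could combine a replica's published state with /live_nodes membership. The function name, the plain-string states, and the node names are hypothetical and are not Solr's API; the sketch only encodes the rule from this thread that a published 'active' state should not be trusted unless the node is also live.

```python
def effective_state(published_state, node_name, live_nodes):
    """Return the state a client should act on for a replica.

    published_state: the state stored in the cluster state (e.g. 'active', 'down').
    node_name:       the node hosting the replica (hypothetical naming).
    live_nodes:      the set of node names currently under /live_nodes.
    """
    # A crashed JVM never gets to publish 'down', so a stale 'active'
    # can remain in the cluster state. Treat any replica on a non-live
    # node as down, whatever it last published.
    if node_name not in live_nodes:
        return "down"
    # On a live node the published state is meaningful: it may still be
    # 'down' during startup or after the leader marked it via LIR.
    return published_state


live_nodes = {"host1:8983_solr"}
print(effective_state("active", "host1:8983_solr", live_nodes))
print(effective_state("active", "host2:8983_solr", live_nodes))
print(effective_state("down", "host1:8983_solr", live_nodes))
```

The check is deliberately one-directional: a live node does not imply an active replica, but a non-live node always overrides whatever state was last published.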