On Wed, Mar 25, 2015 at 12:51 PM, Shai Erera <ser...@gmail.com> wrote:
> Thanks.
>
> Does Solr ever clean up those states? I.e. does it ever remove "down"
> replicas, or replicas belonging to non-live_nodes after some time? Or will
> these remain in the cluster state forever (assuming they never come back
> up)?
>

No, they remain there forever. You can still call the deletereplica API to
clean them up. There's even a param onlyIfDown=true which will remove a
replica only if it's already 'down'.

> If they remain there, is there any penalty? E.g. Solr tries to send them
> updates, maybe tries to route search requests to? I'm talking about
> replicas that stay in ACTIVE state, but their nodes aren't under
> /live_nodes.
>

No, there is no penalty because we always check for the state=active and the
live-ness before routing any requests to a replica.

> Shai
>
> On Wed, Mar 25, 2015 at 8:05 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Comments inline:
> >
> > On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > Is it possible for a replica to be DOWN, while the node it resides on
> > > is under /live_nodes? If so, what can lead to it, aside from someone
> > > unloading a core.
> > >
> >
> > Yes, aside from someone unloading the index, this can happen in two ways:
> > 1) during startup each core publishes its state as 'down' before it enters
> > recovery, and 2) the leader force-publishes a replica as 'down' if it is
> > not able to forward updates to that replica (this mechanism is called
> > Leader-Initiated-Recovery, or LIR for short).
> >
> > The #2 above can happen when the replica is partitioned from the leader
> > but both are able to talk to ZooKeeper.
> >
> > > I don't know if each SolrCore reports status to ZK independently, or
> > > it's done by the Solr process as a whole.
> > >
> >
> > It is done on a per-core basis for now. But the 'live' node is maintained
> > one per Solr instance (JVM).
> >
> > > Also, is it possible for a replica to report ACTIVE, while the node it
> > > lives on is no longer under /live_nodes? Are there any ZK timings that
> > > can cause that?
> > >
> >
> > Yes, this can happen if the JVM crashed. A replica publishes itself as
> > 'down' on shutdown so if the graceful shutdown step is skipped then the
> > replica will continue to be 'active' in the cluster state. Even LIR
> > doesn't apply here because there's no point in the leader marking a node
> > as 'down' if it is not 'live' already.
> >
> > > Shai
> > >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>

--
Regards,
Shalin Shekhar Mangar.
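
P.S. To make the routing check above concrete (state=active plus live-ness),
here is a minimal Python sketch. It is not Solr's actual code. The dict layout
is an assumption loosely modelled on the cluster state JSON (a "state" and a
"node_name" per replica), and the helper names is_routable and
unroutable_replicas as well as the host names are hypothetical.

```python
def is_routable(replica, live_nodes):
    """A replica gets requests only if it is 'active' AND its node is live."""
    return replica["state"] == "active" and replica["node_name"] in live_nodes


def unroutable_replicas(replicas, live_nodes):
    """Names of replicas that fail the check above, in sorted order."""
    return sorted(
        name for name, replica in replicas.items()
        if not is_routable(replica, live_nodes)
    )


# Hypothetical snapshot: only host1 is registered under /live_nodes.
live_nodes = {"host1:8983_solr"}
replicas = {
    # Healthy replica on a live node.
    "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
    # JVM crashed without publishing 'down': still 'active', node not live.
    "core_node2": {"state": "active", "node_name": "host2:8983_solr"},
    # Node is live, but the core is 'down' (startup recovery or LIR).
    "core_node3": {"state": "down", "node_name": "host1:8983_solr"},
}

skipped = unroutable_replicas(replicas, live_nodes)
```

In this snapshot core_node2 and core_node3 are skipped for routing. Whether
either one should then be removed with deletereplica is a separate decision,
since a 'down' replica on a live node may simply be recovering.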