That already happens. The ZK client itself will reconnect when it can and trigger everything to be set up as it is when the cluster first starts, including the live node registration, leader election, etc.
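A minimal sketch of the mechanism described above, using a toy in-memory model rather than the real ZooKeeper or Solr APIs (`FakeZooKeeper`, `SolrNodeSketch` and all their methods are hypothetical). It shows the one rule that matters here: an ephemeral node under /live_nodes lives only as long as the ZooKeeper session that created it. When the session expires (not merely disconnects), the node disappears. Re-running the startup registration on the new session brings it back.

```python
from typing import Dict, List, Optional


class FakeZooKeeper:
    """Hypothetical toy stand-in for a ZooKeeper ensemble (not a real client API).

    Models only one rule: an ephemeral node exists exactly as long as the
    session that created it.
    """

    def __init__(self) -> None:
        self._next_session = 1
        self.ephemeral: Dict[str, int] = {}  # path -> owning session id

    def new_session(self) -> int:
        sid = self._next_session
        self._next_session += 1
        return sid

    def create_ephemeral(self, path: str, session_id: int) -> None:
        if path in self.ephemeral:
            raise KeyError(f"node exists: {path}")
        self.ephemeral[path] = session_id

    def expire(self, session_id: int) -> None:
        # On session expiry, every ephemeral node owned by that session is deleted.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session_id}

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.ephemeral if p.startswith(prefix))


class SolrNodeSketch:
    """Hypothetical node that re-runs its startup registration on every new session."""

    def __init__(self, zk: FakeZooKeeper, node_name: str) -> None:
        self.zk = zk
        self.node_name = node_name
        self.session_id: Optional[int] = None
        self.connect()

    def connect(self) -> None:
        self.session_id = self.zk.new_session()
        self.on_new_session()

    def on_new_session(self) -> None:
        # Same step as at first startup: register under /live_nodes.
        # (A real node would also redo leader election and recovery here.)
        self.zk.create_ephemeral(f"/live_nodes/{self.node_name}", self.session_id)

    def on_session_expired(self) -> None:
        # The old session and its ephemeral nodes are gone, so reconnect,
        # which re-registers the live node on the new session.
        self.connect()


if __name__ == "__main__":
    zk = FakeZooKeeper()
    node = SolrNodeSketch(zk, "s16")
    print(zk.children("/live_nodes"))  # ['s16']
    zk.expire(node.session_id)         # e.g. after a long GC pause or network glitch
    print(zk.children("/live_nodes"))  # []
    node.on_session_expired()          # client sees the expiry and reconnects
    print(zk.children("/live_nodes"))  # ['s16']
```

If a node stays missing from /live_nodes while its process keeps running, as in the report below, then in this model the reconnect hook never fired or failed partway, which is the kind of bug worth looking for.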
You may have hit a bug, or something else is missing from this conversation, but reconnecting after losing the ZK connection has been a basic feature from day one.

Mark

On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada <manohar...@gmail.com> wrote:

> Thanks Erick! Should I create a JIRA issue for this?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason I couldn't get anything from them.
>
> Thanks,
> Manohar
>
> On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > The most likely reason is that the Solr node in question
> > was not reachable, so it was removed from
> > live_nodes, perhaps due to a temporary network
> > glitch, a long GC pause or the like. If you're rolling
> > your logs over, it's quite possible that any illuminating
> > messages were lost. The default 4M size for each
> > log is quite low at INFO level...
> >
> > It does seem possible for a Solr node to periodically
> > check its status and re-insert itself into live_nodes,
> > go through recovery and all that. So far most of that
> > registration logic is baked into startup code. What
> > do others think? Worth a JIRA?
> >
> > Erick
> >
> > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> > wrote:
> >
> > > We have a 16-node Solr (5.2.1) cluster and a 5-node Zookeeper (3.4.6) ensemble.
> > >
> > > All the Solr nodes were registered in Zookeeper (ls /live_nodes) when the
> > > setup was done 3 months back. A few days ago our searches suddenly started
> > > failing because one of the Solr nodes (say s16) was not seen in Zookeeper,
> > > i.e., when we checked *ls /live_nodes*, the *s16* Solr node was not listed.
> > > However, the corresponding Solr process was up and running.
> > >
> > > To my surprise, I couldn't find any errors or warnings related to this in
> > > the Solr or Zookeeper logs. I have a few questions:
> > >
> > > 1. Is there any reason why this registration with ZK was lost? I know the
> > >    logs should provide some information, but they didn't. Has anyone
> > >    encountered a similar issue, and if so, what could the root cause be?
> > > 2. Shouldn't Solr be clever enough to detect that its registration with ZK
> > >    was lost (for whatever reason) and try to re-register?
> > >
> > > PS: The issue was resolved by restarting the Solr node. However, I am
> > > curious to know why it happened in the first place.
> > >
> > > Thanks

-- 
- Mark
about.me/markrmiller