The most likely reason is that the Solr node in question was not reachable, so it was removed from live_nodes, perhaps due to a temporary network glitch, a long GC pause, or the like. If you're rolling your logs over, it's quite possible that any illuminating messages were lost. The default 4M size for each log is quite low at INFO level...
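
One way to catch this sooner is an external check that compares the nodes you expect against what the cluster reports as live. Below is a minimal sketch in Python, not something Solr ships. It assumes the Collections API CLUSTERSTATUS response carries a cluster.live_nodes list (worth verifying on 5.2.1), and the node names, port, and function names are made up for illustration:

```python
import json
import urllib.request


def missing_live_nodes(cluster_status, expected_nodes):
    """Return the expected node names that are absent from live_nodes.

    cluster_status is the parsed JSON of a CLUSTERSTATUS response; the
    "cluster" -> "live_nodes" layout is an assumption to verify on your version.
    """
    live = set(cluster_status.get("cluster", {}).get("live_nodes", []))
    return sorted(set(expected_nodes) - live)


def fetch_cluster_status(solr_base_url, timeout=10):
    """Fetch CLUSTERSTATUS from any reachable Solr node, e.g. http://s1:8983/solr."""
    url = solr_base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline example using a hand-written response (hypothetical node names).
sample = {"cluster": {"live_nodes": ["s1:8983_solr", "s2:8983_solr"]}}
expected = ["s1:8983_solr", "s2:8983_solr", "s16:8983_solr"]
print(missing_live_nodes(sample, expected))  # ['s16:8983_solr']
```

In practice you would pass the result of fetch_cluster_status() instead of the sample dict and run it from cron or a monitoring system. Ask a node other than the one you suspect, since a node that has dropped out of live_nodes may not give a reliable answer about itself.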

It does seem possible for a Solr node to periodically check its status and re-insert itself into live_nodes, go through recovery and all that. So far most of that registration logic is baked into startup code. What do others think? Worth a JIRA?

Erick

On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com> wrote:
> We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).
>
> All the Solr nodes were registered to Zookeeper (ls /live_nodes) when setup
> was done 3 months back. Suddenly, few days back our search started failing
> because one of the solr node(consider s16) was not seen in Zookeeper, i.e.,
> when we checked for *"ls /live_nodes"*, *s16 *solr node was not found.
> However, the corresponding Solr process was up and running.
>
> To my surprise, I couldn't find any errors or warnings in solr or zookeeper
> logs related to this. I have few questions -
>
> 1. Is there any reason why this registration to ZK was lost? I know logs
> should provide some information, but, it didn't. Did anyone encountered
> similar issue, if so, what can be the root cause?
> 2. Shouldn't Solr be clever enough to detect that the registration to ZK
> was lost (for some reason) and should try to re-register again?
>
> PS: The issue is resolved by restarting the Solr node. However, I am
> curious to know why it happened in the first place.
>
> Thanks