It sometimes happens that one of the nodes goes down but is still registered as Leader/Active. Does the Cloud view show anything about this node (Recovering/Down/Recovery Failed, etc.), and are you able to run a query against just this shard/node directly?
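To query a single core directly, something like the sketch below may help. It only builds the request URL. `distrib=false` is the Solr request parameter that keeps the query on the core it is sent to instead of fanning out to the other shards. The host `s16` is taken from the thread; the port 8983 and the core name `collection1_shard1_replica1` are placeholders I am assuming, so substitute whatever the Cloud view shows for that node.

```python
from urllib.parse import urlencode


def direct_core_query_url(host, port, core, query="*:*", rows=0):
    """Build a /select URL that targets one core only (no distributed fan-out)."""
    params = urlencode({
        "q": query,
        "rows": rows,          # rows=0: we only care whether the core answers
        "distrib": "false",    # do not forward the request to other shards
        "wt": "json",
    })
    return f"http://{host}:{port}/solr/{core}/select?{params}"


# Hypothetical port and core name; only the host "s16" comes from the thread.
url = direct_core_query_url("s16", 8983, "collection1_shard1_replica1")
print(url)
```

If that URL returns a normal response with numFound while distributed queries against the collection fail, the core itself is healthy and the problem is more likely in the cluster state or the ZK registration.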
Susheel

On Wed, Dec 7, 2016 at 10:13 PM, Mark Miller <markrmil...@gmail.com> wrote:

> That already happens. The ZK client itself will reconnect when it can and
> trigger everything to be set up like when the cluster first starts up,
> including a live node and leader election, etc.
>
> You may have hit a bug or something else missing from this conversation,
> but reconnecting after losing the ZK connection is a basic feature from day
> one.
>
> Mark
>
> On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada <manohar...@gmail.com>
> wrote:
>
> > Thanks Erick! Should I create a JIRA issue for the same?
> >
> > Regarding the logs, I have changed the log level to WARN. That may be the
> > reason I couldn't get anything from it.
> >
> > Thanks,
> > Manohar
> >
> > On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > The most likely reason is that the Solr node in question
> > > was not reachable, thus it was removed from
> > > live_nodes. Perhaps due to a temporary network
> > > glitch, a long GC pause or the like. If you're rolling
> > > your logs over it's quite possible that any illuminating
> > > messages were lost. The default 4M size for each
> > > log is quite low at INFO level...
> > >
> > > It does seem possible for a Solr node to periodically
> > > check its status and re-insert itself into live_nodes,
> > > go through recovery and all that. So far most of that
> > > registration logic is baked into startup code. What
> > > do others think? Worth a JIRA?
> > >
> > > Erick
> > >
> > > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> > > wrote:
> > > > We have a 16 node cluster of Solr (5.2.1) and a 5 node Zookeeper (3.4.6).
> > > >
> > > > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> > > > the setup was done 3 months back. Suddenly, a few days back, our search
> > > > started failing because one of the Solr nodes (consider s16) was not
> > > > seen in Zookeeper, i.e., when we checked *"ls /live_nodes"*, the *s16*
> > > > Solr node was not found. However, the corresponding Solr process was up
> > > > and running.
> > > >
> > > > To my surprise, I couldn't find any errors or warnings in the Solr or
> > > > Zookeeper logs related to this. I have a few questions -
> > > >
> > > > 1. Is there any reason why this registration to ZK was lost? I know logs
> > > > should provide some information, but they didn't. Did anyone encounter
> > > > a similar issue? If so, what can be the root cause?
> > > > 2. Shouldn't Solr be clever enough to detect that the registration to ZK
> > > > was lost (for some reason) and try to re-register?
> > > >
> > > > PS: The issue was resolved by restarting the Solr node. However, I am
> > > > curious to know why it happened in the first place.
> > > >
> > > > Thanks
> > >
> --
> - Mark
> about.me/markrmiller