This sometimes happens: one of the nodes goes down, yet it still shows up as
Leader/Active. Does the Cloud view show anything about this node
(Recovering/Down/Recovery Failed etc.), and are you able to query just this
shard/node directly?

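For the second question, a minimal sketch of building a query that hits one core only. It relies on Solr's `distrib=false` request parameter, which makes the core answer from its own index instead of fanning the request out to the rest of the collection. The host name and core name below are hypothetical placeholders; substitute the ones shown for s16 in the Cloud view.

```python
from urllib.parse import urlencode


def direct_core_query_url(node_base_url: str, core_name: str, query: str = "*:*") -> str:
    """Build a /select URL that queries a single core without distributing it.

    node_base_url and core_name are placeholders (assumptions), e.g.
    "http://s16:8983/solr" and "collection1_shard1_replica1".
    """
    params = {
        "q": query,
        "distrib": "false",  # answer from this core only, no fan-out
        "rows": 0,           # only numFound is needed to see if the core responds
    }
    return f"{node_base_url.rstrip('/')}/{core_name}/select?{urlencode(params)}"


print(direct_core_query_url("http://s16:8983/solr/", "collection1_shard1_replica1"))
```

Opening the printed URL with curl or a browser shows whether that core responds at all, independent of what ZooKeeper currently says about the node.
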
Susheel

On Wed, Dec 7, 2016 at 10:13 PM, Mark Miller <markrmil...@gmail.com> wrote:

> That already happens. The ZK client itself will reconnect when it can and
> trigger everything to be setup like when the cluster first starts up,
> including a live node and leader election, etc.
>
> You may have hit a bug or something else missing from this conversation,
> but reconnecting after losing the ZK connection is a basic feature from day
> one.
>
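A toy model of the behaviour Mark describes, under simplifying assumptions: `FakeZooKeeper` and `SolrNodeSketch` are hypothetical stand-ins, not Solr or ZooKeeper classes. It only captures that a `/live_nodes` entry is an ephemeral znode owned by a session, so it vanishes when that session expires, and that a new session re-runs the startup registration. In the real system the ZK client notices the expiry itself; here the "reconnect" is called by hand.

```python
import itertools


class FakeZooKeeper:
    """In-memory stand-in for an ensemble (hypothetical, illustration only)."""

    def __init__(self):
        self._session_ids = itertools.count(1)
        self.ephemerals = {}  # znode path -> owning session id

    def new_session(self) -> int:
        return next(self._session_ids)

    def create_ephemeral(self, path: str, session_id: int) -> None:
        self.ephemerals[path] = session_id

    def expire(self, session_id: int) -> None:
        # Expiring a session deletes every ephemeral znode that session owned.
        self.ephemerals = {p: s for p, s in self.ephemerals.items() if s != session_id}

    def children(self, parent: str) -> list:
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.ephemerals if p.startswith(prefix))


class SolrNodeSketch:
    """Hypothetical node that repeats its startup registration on each new session."""

    def __init__(self, zk: FakeZooKeeper, node_name: str):
        self.zk = zk
        self.node_name = node_name
        self.session_id = None

    def on_new_session(self) -> None:
        # Same work as at startup: publish the live node again. Leader election
        # and recovery would also be re-triggered here in a real node.
        self.session_id = self.zk.new_session()
        self.zk.create_ephemeral(f"/live_nodes/{self.node_name}", self.session_id)


zk = FakeZooKeeper()
node = SolrNodeSketch(zk, "s16:8983_solr")
node.on_new_session()          # startup
zk.expire(node.session_id)     # session lost: live node disappears
node.on_new_session()          # reconnect: live node is published again
print(zk.children("/live_nodes"))
```

If a node stays missing from `/live_nodes` while its process is running, the step that failed is the last one: the re-registration after the new session.
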
> Mark
> On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada <manohar...@gmail.com>
> wrote:
>
> > Thanks Erick! Should I create a JIRA issue for this?
> >
> > Regarding the logs, I had changed the log level to WARN. That may be
> > why I couldn't find anything in them.
> >
> > Thanks,
> > Manohar
> >
> > On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > The most likely reason is that the Solr node in question
> > > was not reachable and was therefore removed from
> > > live_nodes, perhaps due to a temporary network
> > > glitch, a long GC pause or the like. If you're rolling
> > > your logs over, it's quite possible that any illuminating
> > > messages were lost. The default 4M size for each
> > > log is quite low at INFO level...
> > >
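A simplified sketch of why a long GC pause can drop a node from `live_nodes`. The model is an assumption, not ZooKeeper's exact algorithm: the session expires once no heartbeat has reached the server for longer than the session timeout (which Solr requests through its `zkClientTimeout` setting). A stop-the-world pause shows up as a gap in the heartbeat times; the timestamps below are made up for illustration.

```python
from typing import Optional, Sequence


def first_expiry(heartbeats_ms: Sequence[int], session_timeout_ms: int) -> Optional[int]:
    """Return the time (ms) at which the session would expire, or None.

    heartbeats_ms: times at which the server received a heartbeat, ascending.
    Simplified model: expiry happens session_timeout_ms after the last heartbeat
    that precedes a gap longer than the timeout.
    """
    for prev, nxt in zip(heartbeats_ms, heartbeats_ms[1:]):
        if nxt - prev > session_timeout_ms:
            return prev + session_timeout_ms
    return None


# Heartbeats every 5 s, then a 20 s pause (e.g. a full GC) with a 15 s timeout.
print(first_expiry([0, 5000, 10000, 30000, 35000], 15000))
```

Once the session has expired, the node's ephemeral `/live_nodes` entry is gone, even though the JVM resumes afterwards as if nothing happened.
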
> > > It does seem possible for a Solr node to periodically
> > > check its status and re-insert itself into live_nodes,
> > > go through recovery and all that. So far most of that
> > > registration logic is baked into startup code. What
> > > do others think? Worth a JIRA?
> > >
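A hypothetical sketch of the periodic self-check Erick suggests; nothing like this is claimed to exist in Solr 5.2.1. `list_live_nodes` and `reregister` are assumed callables: the first would read the children of `/live_nodes`, the second would redo the startup registration (live node, leader election, recovery).

```python
import time
from typing import Callable, Iterable


def live_node_watchdog(
    node_name: str,
    list_live_nodes: Callable[[], Iterable[str]],
    reregister: Callable[[], None],
    interval_s: float = 30.0,
    max_checks: int = 1,
) -> int:
    """Check up to max_checks times whether node_name is in /live_nodes.

    If it is missing, call reregister(). Sleeps interval_s between checks.
    Returns how many times re-registration was needed.
    """
    repairs = 0
    for i in range(max_checks):
        if node_name not in set(list_live_nodes()):
            reregister()
            repairs += 1
        if i < max_checks - 1:
            time.sleep(interval_s)
    return repairs


# Simulated cluster state in which s16 has silently dropped out.
live = {"s1:8983_solr"}
fixed = live_node_watchdog(
    "s16:8983_solr",
    list_live_nodes=lambda: live,
    reregister=lambda: live.add("s16:8983_solr"),
    interval_s=0,
    max_checks=3,
)
print(fixed, sorted(live))
```

A real implementation would also have to avoid racing with the normal reconnect path, which is part of why this logic lives in startup code today.
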
> > > Erick
> > >
> > > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> > > wrote:
> > > > We have a 16-node Solr (5.2.1) cluster and a 5-node ZooKeeper
> > > > (3.4.6) ensemble.
> > > >
> > > > All the Solr nodes were registered with ZooKeeper (ls /live_nodes) when
> > > > the setup was done 3 months back. Suddenly, a few days ago, our searches
> > > > started failing because one of the Solr nodes (call it s16) was no longer
> > > > visible in ZooKeeper, i.e., when we ran *"ls /live_nodes"*, the *s16* Solr
> > > > node was not listed. However, the corresponding Solr process was up and
> > > > running.
> > > >
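A small sketch for spotting which node has dropped out, assuming you already have the children of `/live_nodes` (e.g. from `ls /live_nodes`) and assuming the usual `host:port_context` naming such as `s16:8983_solr`. The host names `s1`..`s16` are hypothetical placeholders.

```python
from typing import Iterable, List


def missing_live_nodes(expected_hosts: Iterable[str], live_nodes: Iterable[str]) -> List[str]:
    """Return expected hosts that have no entry under /live_nodes.

    Assumes entries look like "host:port_context"; only the host part is compared.
    """
    live_hosts = {entry.split(":", 1)[0] for entry in live_nodes}
    return sorted(h for h in expected_hosts if h not in live_hosts)


expected = [f"s{i}" for i in range(1, 17)]       # the 16 Solr hosts
live = [f"s{i}:8983_solr" for i in range(1, 16)]  # s16 is missing
print(missing_live_nodes(expected, live))
```

Running such a check periodically would have flagged s16 as soon as it vanished, rather than when searches started failing.
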
> > > > To my surprise, I couldn't find any errors or warnings related to this in
> > > > the Solr or ZooKeeper logs. I have a few questions:
> > > >
> > > > 1. Is there any reason why this registration with ZK was lost? I know
> > > >    the logs should provide some information, but they didn't. Has anyone
> > > >    encountered a similar issue, and if so, what could the root cause be?
> > > > 2. Shouldn't Solr be clever enough to detect that its registration with
> > > >    ZK was lost (for whatever reason) and try to re-register?
> > > >
> > > > PS: The issue was resolved by restarting the Solr node. However, I am
> > > > curious to know why it happened in the first place.
> > > >
> > > > Thanks
> > >
> >
> --
> - Mark
> about.me/markrmiller
>
