That already happens. The ZK client itself will reconnect when it can and
trigger everything to be setup like when the cluster first starts up,
including a live node and leader election, etc.

You may have hit a bug or something else missing from this conversation,
but reconnecting after losing the ZK connection is a basic feature from day
one.

Mark
On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada <manohar...@gmail.com>
wrote:

> Thanks Erick! Should I create a JIRA issue for the same?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason, I couldn't get anything from it.
>
> Thanks,
> Manohar
>
> On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Most likely reason is that the Solr node in question,
> > was not reachable thus it was removed from
> > live_nodes. Perhaps due to temporary network
> > glitch, long GC pause or the like. If you're rolling
> > your logs over it's quite possible that any illuminating
> > messages were lost. The default 4M size for each
> > log is quite lo at INFO level...
> >
> > It does seem possible for a Solr node to periodically
> > check its status and re-insert itself into live_nodes,
> > go through recovery and all that. So far most of that
> > registration logic is baked into startup code. What
> > do others think? Worth a JIRA?
> >
> > Erick
> >
> > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> > wrote:
> > > We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).
> > >
> > > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> > setup
> > > was done 3 months back. Suddenly, few days back our search started
> > failing
> > > because one of the solr node(consider s16) was not seen in Zookeeper,
> > i.e.,
> > > when we checked for *"ls /live_nodes"*, *s16 *solr node was not found.
> > > However, the corresponding Solr process was up and running.
> > >
> > > To my surprise, I couldn't find any errors or warnings in solr or
> > zookeeper
> > > logs related to this. I have few questions -
> > >
> > > 1. Is there any reason why this registration to ZK was lost? I know
> logs
> > > should provide some information, but, it didn't. Did anyone encountered
> > > similar issue, if so, what can be the root cause?
> > > 2. Shouldn't Solr be clever enough to detect that the registration to
> ZK
> > > was lost (for some reason) and should try to re-register again?
> > >
> > > PS: The issue is resolved by restarting the Solr node. However, I am
> > > curious to know why it happened in the first place.
> > >
> > > Thanks
> >
>
-- 
- Mark
about.me/markrmiller

Reply via email to