Ludovic, recent Solr changes won't do much to prevent ZK session expiry.
You might want to enable GC logging on Solr and ZooKeeper to check for
pauses and tune appropriately.
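
To make the GC check concrete, here is a minimal sketch (not anything shipped
with Solr) that scans a HotSpot GC log for stop-the-world pauses long enough
to threaten the ZK session. It assumes the JVM was started with
-XX:+PrintGCApplicationStoppedTime on Java 7/8, which writes "Total time for
which application threads were stopped" lines. The function name, the sample
log lines, and the "one third of the session timeout" warning threshold are
my own illustrative choices, not ZooKeeper or Solr rules.

```python
import re

# Matches the line HotSpot (Java 7/8) prints when
# -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def find_long_pauses(lines, session_timeout_s=15.0, fraction=1.0 / 3.0):
    """Return (line_number, pause_seconds) for pauses >= fraction * timeout.

    The fraction is an arbitrary warning threshold chosen for illustration.
    """
    threshold = session_timeout_s * fraction
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match and float(match.group(1)) >= threshold:
            hits.append((number, float(match.group(1))))
    return hits


# Made-up sample lines in the Java 7/8 log format, for illustration only.
SAMPLE_LOG = [
    "Total time for which application threads were stopped: 0.0412000 seconds",
    "Total time for which application threads were stopped: 6.2500000 seconds",
    "Total time for which application threads were stopped: 17.9000000 seconds",
]

if __name__ == "__main__":
    for number, pause in find_long_pauses(SAMPLE_LOG):
        print("line %d: paused %.2fs" % (number, pause))
```

Any pause that approaches the session timeout on either the Solr or the
ZooKeeper JVM is a candidate cause for the expiry, so it is worth running
something like this against both logs.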

The patch you mention below (SOLR-5615) fixes a situation in which the cloud
can get into a bad state during recovery after a session expiry. Recovery
after a session expiry is unavoidable but, as you guessed, it should be quick
if there aren't too many updates.

4.6.1 also has SOLR-5577 which will prevent updates from unnecessarily
stalling when you are disconnected from ZK for a short while.

These changes (and probably others) should help the cloud behave better on
ZK expiry, and for that reason I would encourage you to upgrade. The ZK
expiry problem itself, however, has to be dealt with by ensuring that ZK and
Solr don't pause for too long and by choosing an appropriate session timeout
(which, by the way, will default to 30s instead of 15s from Solr 4.7
onwards).
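
One thing to keep in mind when raising the timeout: the ZooKeeper server
negotiates the session timeout and clamps whatever the client requests into
the range [minSessionTimeout, maxSessionTimeout], which default to 2x and 20x
the server's tickTime. The sketch below only does that arithmetic. The
function name is hypothetical, and the 2000 ms tickTime is the value from the
stock sample zoo.cfg, so please check your own configuration.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms=2000,
                          min_ms=None, max_ms=None):
    """Session timeout a ZooKeeper server would grant for a client request.

    When minSessionTimeout / maxSessionTimeout are not set explicitly,
    ZooKeeper uses 2 * tickTime and 20 * tickTime respectively.
    """
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


if __name__ == "__main__":
    # Requested timeouts in milliseconds: 15s, 30s and 60s.
    for requested in (15000, 30000, 60000):
        print(requested, "->", negotiated_timeout_ms(requested))
```

With those defaults a 30s request is granted as-is, but anything above 40s is
silently cut back to 40s unless tickTime or maxSessionTimeout is raised on
the ZooKeeper side as well.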
On 13 Feb 2014 08:23, "lboutros" <boutr...@gmail.com> wrote:

> Dear all,
>
> we are currently using Solr 4.3.1 in production (with SolrCloud).
>
> We are encountering much the same problem as described in this older post:
>
>
> http://lucene.472066.n3.nabble.com/SolrCloud-CloudSolrServer-Zookeeper-disconnects-and-re-connects-with-heavy-memory-usage-consumption-td4026421.html
>
> Sometimes some nodes are disconnected from ZooKeeper and then they try to
> reconnect. The process is quite long because we have a quite long warming
> process. And because of this long warming process, just after the recovery
> process, the node is disconnected again and so on... sometimes until OOM.
>
> We already increased the ZK timeout, but it is not enough.
>
> We are thinking of migrating to Solr 4.6.1 at least (perhaps 4.7 will be
> out before the end of the migration :) ).
>
> I know that a lot of SolrCloud bugs have been fixed since Solr 4.3.1.
>
> But can we be sure that this problem will be resolved? Or can this
> problem still occur with the latest Solr version? (I know this is not an
> easy question ;) )
>
> It seems that this fix:
>
> Deadlock while trying to recover after a ZK session expiry:
> https://issues.apache.org/jira/browse/SOLR-5615
>
> is a good step toward addressing our current problem.
>
> But do you think it will be enough?
>
> One last thing: I don't know if it is already addressed by a fix, but if
> there are no updates between the disconnection and the reconnection, the
> recovery process should not do anything more than the reconnection, I mean:
> no replication, no tLog replay and no warming process. Is that the case?
>
> Ludovic.
>
>
>
> -----
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Zookeeper-disconnection-reconnection-tp4117101.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
