Hi,
I have a SolrCloud 6.2.1 setup with 5 nodes. Occasionally I do a rolling
restart, restarting one node at a time. I have quite a few collections,
let's say 2000, each with a replication factor of 3. When a node comes
back up, it looks like I hit the same issue as described in SOLR-5796.
According to Jira this should be fixed in 6.0. I also saw some leader
conflict exceptions in some logs. Is there now a setting to increase
the conflict resolution time? If so, could somebody point me to it?
A second thing: it looks like when a new collection is created, data is
first written to /clusterstate.json in ZK. I thought this was a legacy
file that is not used anymore, but that does not seem to be the case.
The problem is that when a new collection is being created and its
first replica has been assigned to a node, and I happen to stop exactly
that node, the collection does not seem to recover after the restart.
The admin UI shows the new collection with one down replica, but it
never recovers. In this state I cannot create any further collections.
The only solution I have found so far is to set the contents of
/clusterstate.json to "{}", but this kills the collection. Is that a
known issue?
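To see which entries are stuck before resetting the file, something like
the following could be used. It is only a sketch: it assumes the content
of /clusterstate.json has already been fetched as text with some
ZooKeeper client, and that it follows the usual collection -> shards ->
replicas -> state layout. The function name and the sample data are made
up for illustration and are not taken from my cluster.

```python
import json


def non_active_replicas(clusterstate_text):
    """Return (collection, shard, replica, state) for each replica whose
    state is anything other than "active"."""
    # An empty file is treated like an empty cluster state.
    state = json.loads(clusterstate_text or "{}")
    found = []
    for coll_name, coll in state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if replica.get("state") != "active":
                    found.append(
                        (coll_name, shard_name, replica_name, replica.get("state"))
                    )
    return found


# Hypothetical sample: one new collection whose only replica is down.
SAMPLE = """
{
  "newcoll": {
    "shards": {
      "shard1": {
        "state": "active",
        "replicas": {
          "core_node1": {
            "core": "newcoll_shard1_replica1",
            "node_name": "host1:8983_solr",
            "state": "down"
          }
        }
      }
    }
  }
}
"""

if __name__ == "__main__":
    for entry in non_active_replicas(SAMPLE):
        print(entry)
```

This only reads the state; it does not change anything in ZK, so it
would at least show which collection is affected before the file is
overwritten.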
The release notes of Solr 6.3 stated "Many bug fixes related to
SolrCloud recovery for data safety and faster recovery times". Any
chance that those could fix the issues I'm seeing?
Thanks,
Hendrik