The problem is that we would like to run without downtime. Rolling updates have worked fine so far, except when a collection is created at the wrong time. I just ran another test with stateFormat=2, and this seems to greatly improve the situation. One collection creation got stuck, but other creations still worked, and after restarting some nodes the stuck collection also looked OK. For some reason, though, two replicas of the same shard ended up assigned to the same node, even though I specified the rule "shard:*,replica:<2,node:*".
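For illustration, here is a minimal sketch of a Collections API CREATE request carrying stateFormat=2 and the placement rule above, plus a helper that checks whether a replica layout violates "shard:*,replica:<2,node:*" (no node may hold two or more replicas of the same shard). The base URL, collection name and the (shard, node) input format are assumptions made for this example; the code only builds the URL and inspects a list, it does not contact Solr.

```python
from collections import Counter
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor, rule):
    # Parameters of the Collections API CREATE action as used in Solr 6.x.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # stateFormat=2 keeps this collection's state in its own state.json
        # znode instead of the shared clusterstate.json.
        "stateFormat": 2,
        "rule": rule,  # urlencode percent-encodes ':', '*', ',' and '<'
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def placement_violations(replicas):
    """Return sorted (shard, node) pairs that hold 2+ replicas of one shard.

    replicas: iterable of (shard, node) tuples, one tuple per replica.
    The rule "shard:*,replica:<2,node:*" should make this list empty.
    """
    counts = Counter(replicas)
    return sorted(pair for pair, n in counts.items() if n >= 2)
```

Feeding it the layout observed after the test run, e.g. `[("shard1", "node1"), ("shard1", "node1"), ("shard1", "node2")]`, would report `("shard1", "node1")` as the violation described above.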

On 03.01.2017 15:34, Shawn Heisey wrote:
On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
I have a SolrCloud setup with 5 nodes and am creating collections with
a replication factor of 3. If I kill and restart nodes at the "right"
time during the creation process, the creation seems to get stuck.
Collection data is left in the clusterstate.json file in ZooKeeper, and
no collections can be created anymore until this entry gets removed. I
can reproduce this on Solr 6.2.1 and 6.3, although 6.3 seems somewhat
less likely to get stuck. Is Solr supposed to recover from data being
stuck in clusterstate.json at some point? I had one instance where it
looked like the data was removed again, but normally the data does not
seem to get cleaned up automatically and just blocks any further
collection creations.

I did not find anything like this in Jira. Only SOLR-7198 sounds somewhat
similar, even though it is about deleting collections.

Don't restart your nodes at the same time you're trying to do
maintenance of any kind on your collections.  Only do maintenance
when all nodes are up and working, or you'll get unexpected results.

The most recent development goal is to make it so that collection deletion
can be done even if the creation was partial.  The idea is that if
something goes wrong, you can delete the bad collection and then be free
to try to create it again.  I see that you've started another thread
about deletion not fully eliminating everything in HDFS.  That does
sound like a bug.  I have no experience with HDFS at all, so I can't
help with that.
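The delete-and-retry workflow described above amounts to two Collections API calls in sequence. The sketch below only builds those two request URLs; the base URL, collection name and CREATE parameters are hypothetical, and, as noted above, cleaning up after a partial creation is a development goal, so DELETE may not remove every leftover on Solr 6.2.1 or 6.3.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    # Collections API endpoint: <base_url>/admin/collections?action=...
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def delete_then_recreate(base_url, name, create_params):
    """Return, in order, the DELETE request for a partially created
    collection and the CREATE request that retries it."""
    return [
        collections_api_url(base_url, "DELETE", name=name),
        collections_api_url(base_url, "CREATE", name=name, **create_params),
    ]
```

Sending the DELETE only after all nodes are back up keeps the retry in line with the advice about not mixing restarts and collection maintenance.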

Thanks,
Shawn