On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
> I have a SolrCloud setup with 5 nodes and am creating collections with
> a replication factor of 3. If I kill and restart nodes at the "right"
> time during the creation process the creation seems to get stuck.
> Collection data is left in the clusterstate.json file in ZooKeeper and
> no collections can be created anymore until this entry gets removed. I
> can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be
> somewhat less likely to get stuck. Is Solr supposed to recover from
> data being stuck in the clusterstate.json at some point? I had one
> instance where it looked like data was removed again but normally the
> data does not seem to get cleaned up automatically and just blocks any
> further collection creations.
>
> I did not find anything like this in Jira. Just SOLR-7198 sounds a bit
> similar even though it is about deleting collections.

Don't restart your nodes at the same time you're trying to do maintenance of any kind on your collections. Try to only do maintenance when all of the nodes are working, or you'll get unexpected results.

The most recent development goal is to make it so that collection deletion can be done even if the creation was partial. The idea is that if something goes wrong, you can delete the bad collection and then be free to try to create it again.

I see that you've started another thread about deletion not fully eliminating everything in HDFS. That does sound like a bug. I have no experience with HDFS at all, so I can't be helpful with that.

Thanks,
Shawn
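
For illustration, the delete-then-recreate approach described above could be scripted against the Collections API roughly as follows. This is only a sketch under assumptions: the base URL, collection name, shard count, and configset name are placeholders that do not come from this thread, and whether DELETE actually succeeds on a partially created collection depends on the Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Solr node reachable at this address; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


def call(action, **params):
    """Send one Collections API request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError if Solr answers with an error status.
    """
    with urlopen(collections_api_url(action, **params), timeout=60) as resp:
        return json.load(resp)


def recreate_collection(name, num_shards, replication_factor, config_name):
    """Delete a (possibly partially created) collection, then create it again.

    Run this only while all nodes are up and stable.
    """
    call("DELETE", name=name)
    return call(
        "CREATE",
        name=name,
        numShards=num_shards,
        replicationFactor=replication_factor,
        **{"collection.configName": config_name},
    )
```

A call such as recreate_collection("test", 1, 3, "myconf") would issue the DELETE first and then the CREATE; all four argument values here are hypothetical.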