Would probably need to see some logs to get an idea of what happened. It would also be nice to see the after state of zk in a text dump.
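If it helps, one way to grab that dump is ZooKeeper's own client, or the zkcli.sh that ships with Solr. The hosts and paths below are just examples (I'm assuming the stock scripts from the distributions), so adjust for your setup:

    # print the cluster state node with ZooKeeper's client
    /path/to/zookeeper/bin/zkCli.sh -server zk1:2181 get /clusterstate.json

    # or save it to a local file with the zkcli.sh bundled with Solr
    # (under example/cloud-scripts in the 4.x distribution)
    example/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
      -cmd getfile /clusterstate.json /tmp/clusterstate.json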
You should be able to fix it, as long as you have the index on disk: just make sure it is where it is expected and manually update clusterstate.json. It would be good to take a look at the logs first, though, and see if they tell you anything.

I'd also highly recommend moving to Solr 4.6.1 when you can. We have fixed many, many, many bugs around SolrCloud in the four releases since 4.4. You can follow the progress in the CHANGES file we update for each release. I wrote a little about 4.6.1 as it relates to SolrCloud here: https://plus.google.com/+MarkMillerMan/posts/CigxUPN4hbA

- Mark

http://about.me/markrmiller

On Jan 31, 2014, at 10:13 AM, David Santamauro <david.santama...@gmail.com> wrote:

> Hi,
>
> I have a strange situation. I created a collection with 4 nodes (separate
> servers, numShards=4), and then proceeded to index data ... all had seemingly
> been well until this morning, when I had to reboot one of the nodes.
>
> After the reboot, the node I rebooted went into recovery mode! This is
> completely illogical, as there is 1 shard per node (no replicas).
>
> What could have possibly happened to 1) trigger a recovery and 2) make the
> node think it has a replica to even recover from?
>
> Looking at the graph on the Solr admin page, it shows that shard1
> disappeared and the server that was rebooted appears in a recovering state
> under the server that is home to shard2.
>
> I then looked at clusterstate.json and it confirms that shard1 is completely
> missing and shard2 now has a replica. ... I'm baffled, confused, dismayed.
>
> Versions:
> Solr 4.4 (4 nodes with a Tomcat container)
> zookeeper-3.4.5 (5-node ensemble)
>
> Oh, and I'm assuming shard1 is completely corrupt.
>
> I'd really appreciate any insight.
>
> David
>
> PS I have a copy of all the shards backed up. Is there a way to possibly
> rsync shard1 back into place and "fix" clusterstate.json manually?
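Re the PS: yes, that's basically the fix I mean above. Roughly, with the affected node shut down, it could look something like the following. All paths, hosts, and directory names here are made-up examples; the exact locations depend on your install:

    # 1) copy the backed-up shard1 index back to where that core expects it
    rsync -av /backup/shard1/data/index/ /path/to/solr/home/collection1_shard1/data/index/

    # 2) pull the current clusterstate.json out of ZooKeeper
    example/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
      -cmd getfile /clusterstate.json /tmp/clusterstate.json

    # 3) hand-edit /tmp/clusterstate.json so shard1 is back and the bogus
    #    replica under shard2 is gone, then push it back
    example/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
      -cmd putfile /clusterstate.json /tmp/clusterstate.json

    # 4) start the node back up and check the cloud graph in the admin UI

Again, that is just a sketch - I'd look at the logs first and make sure you understand what happened before pushing anything back into zk.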