Hi,
I have a strange situation. I created a collection with 4 nodes
(separate servers, numShards=4) and then proceeded to index data.
Everything seemed fine until this morning, when I had to reboot one of
the nodes.
After the reboot, that node went into recovery mode! This makes no
sense to me, as there is 1 shard per node (no replicas).
What could possibly have happened to 1) trigger a recovery, and 2) make
the node think it has a replica to recover from at all?
The graph on the Solr admin page shows that shard1 has disappeared,
and the server that was rebooted appears in a recovering state under
the server that hosts shard2.
I then looked at clusterstate.json and it confirms that shard1 is
completely missing and shard2 now has a replica. ... I'm baffled,
confused, dismayed.
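
For anyone who wants to run the same check, here is a rough sketch of
how the contents of clusterstate.json can be summarized. It assumes the
Solr 4.x layout (collection -> "shards" -> shard name -> "replicas" ->
per-core entries with "state" and "node_name"); the function name and
the expected_shards argument are my own, and the file contents have to
be fetched from ZooKeeper separately.

```python
import json


def summarize_clusterstate(text, expected_shards):
    """Summarize a clusterstate.json document (assumed Solr 4.x layout).

    For each collection, report how many replicas each shard lists,
    which expected shards are absent, and which nodes are recovering.
    """
    state = json.loads(text)
    report = {}
    for collection, info in state.items():
        shards = info.get("shards", {})
        # Number of replica entries registered under each shard.
        counts = {
            name: len(shard.get("replicas", {}))
            for name, shard in shards.items()
        }
        # Shards that should exist but are not listed at all.
        missing = sorted(set(expected_shards) - set(shards))
        # Nodes whose core is marked "recovering" (falls back to the core name).
        recovering = sorted(
            replica.get("node_name", core_name)
            for shard in shards.values()
            for core_name, replica in shard.get("replicas", {}).items()
            if replica.get("state") == "recovering"
        )
        report[collection] = {
            "replica_counts": counts,
            "missing": missing,
            "recovering": recovering,
        }
    return report
```

With numShards=4 and no replicas, I would expect every shard to show a
replica count of 1 and the "missing" list to be empty.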
Versions:
Solr 4.4 (4 nodes, each in a Tomcat container)
ZooKeeper 3.4.5 (5-node ensemble)
Oh, and I'm assuming shard1 is completely corrupt.
I'd really appreciate any insight.
David
PS I have a copy of all the shards backed up. Is there a way to possibly
rsync shard1 back into place and "fix" clusterstate.json manually?