Hi,

We have set up a SolrCloud environment with 1 shard and 2 replicas (one of which is the leader), coordinated by an ensemble of 3 ZooKeeper instances.
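For reference, the collection is a plain 1-shard / replicationFactor=2 setup. Below is a minimal SolrJ sketch of an equivalent setup, just to make the topology concrete (assuming SolrJ 7.x/8.x; the ZooKeeper hosts, collection name and configset are placeholders, not our actual values):

    import java.util.Arrays;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateCollection {
        public static void main(String[] args) throws Exception {
            // The 3-node ZooKeeper ensemble coordinating the cluster (placeholder hosts).
            CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                    Optional.empty()).build();

            // One shard, replication factor 2 -> one leader plus one non-leader replica.
            CollectionAdminRequest
                    .createCollection("testcollection", "_default", 1, 2)
                    .process(client);

            client.close();
        }
    }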
The setup works fine during normal operation: the data is replicated at runtime. We then tried to simulate failures in two scenarios:

CASE 1: turning off one of the replicas (tried with both the leader and the non-leader).
CASE 2: cutting off the network so that the non-leader replica goes down.

In both cases, data is written continuously to the SolrCloud cluster while the replica is unavailable.

In CASE 1, the replication process starts once the failed machine is booted up again, the complete data set is present on both replicas, and everything works fine.

In CASE 2, the non-leader replica starts the recovery process once it is reconnected to the network, but for some reason the new data from the leader is not replicated onto the previously failed replica.

From what I can read in the logs when comparing the two cases, I don't understand why in CASE 2 Solr reports RecoveryStrategy ###### currentVersions as populated but RecoveryStrategy ###### startupVersions=[[]] as empty, whereas in CASE 1 RecoveryStrategy ###### startupVersions contains the same entries that appear in currentVersions.

The general question is: why does restarting Solr result in a successful recovery/replication, while reconnecting the network does not?

Thanks for any tips / leads!

Cheers,
Greg