Hi Mark, I get that use case: if the non-leader dies, it has to allow for recovery when it comes back. That makes perfect sense.
I guess I was (naively!) assuming there was an optimized path for the case where the leader dies and is the first one to come back (and is therefore still leader): there is no recovery to do, since by definition no updates can have been made whilst the shard was inactive.

Aside: interesting point about Solr only acking updates once they are on every replica. Are you talking about when the records are removed from the transaction log? My understanding was that the external "update" request completes as soon as the document has made it to the leader's transaction log (it might not even have been committed to the leader's index), and that the replicas are then pushed those updates as they become available. If a single replica dies, the leader can still process update/add document requests, so it can't be waiting for replicas in that scenario?

> On Nov 28, 2012, at 11:58 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
> > and we don't want to lose any updates.
>
> That's probably somewhat inaccurate - in this case it's more about
> consistency - we only ack updates once they are on every replica. So it's
> not a lost updates issue, but a consistency issue.
>
> The lost updates part is more like when you stop the cluster, then you
> start an old shard or something before starting more recent shards - you
> don't want that thing to become the leader because the other shards were
> not up yet.
>
> - Mark