On Nov 29, 2012, at 1:26 PM, Daniel Collins <danwcoll...@gmail.com> wrote:
> Hi Mark,
>
> I get that use case, if the non-leader dies, when it comes back it has to
> allow for recovery, that makes perfect sense.
>
> I guess I was (naively!) assuming there was an optimized scenario if the
> leader dies, and is the first one to come back (is still therefore leader),
> there is no recovery to do as by definition no updates can have been made
> whilst the shard was inactive.
>
> Aside: Interesting point about Solr only ack'ing updates when they are on
> every replica, are you talking about when the records are removed from the
> transaction log?
>
> My understanding was that the external "update" request completes as soon as
> the document has made it to the leader's transaction log (might not even
> have committed into the leader index), and the replicas then were pushed
> those updates as they became available.

No, currently it won't return until the update hits the replicas - it's sent to replicas in parallel.

> If a single replica dies, the leader can still process update/add document
> requests, so it can't be waiting for replicas in that scenario?

There should be no wait if there are any nodes waiting in line to be leader - it should only wait when a node comes up and realizes it's the leader and no one else was in line to be leader.

- Mark

>
>> On Nov 28, 2012, at 11:58 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>> and we don't want to lose any updates.
>>
>> That's probably somewhat inaccurate - in this case it's more about
>> consistency - we only ack updates once they are on every replica. So it's
>> not a lost updates issue, but a consistency issue.
>>
>> The lost updates part is more like when you stop the cluster, then you
>> start an old shard or something before starting more recent shards - you
>> don't want that thing to become the leader because the other shards were
>> not up yet.
>>
>> - Mark
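
For anyone reading this thread later, the two behaviours described in the reply (an update is only ack'ed once the replicas have it, and a node only waits at startup when it is leader with no one else in line) can be sketched roughly as below. This is a toy illustration only, not Solr code: `Replica`, `leader_update` and `should_wait_on_startup` are hypothetical names made up for the example, and the in-memory lists stand in for the transaction log and replica state.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical stand-in for a replica node; not a Solr class."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def apply(self, doc):
        # Pretend the document was received and stored by this replica.
        self.docs.append(doc)
        return True


def leader_update(leader_log, replicas, doc):
    """Return True (the ack) only after every given replica applied the doc."""
    leader_log.append(doc)
    if not replicas:
        return True
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # Fan out to all replicas in parallel, then block on every result.
        results = list(pool.map(lambda r: r.apply(doc), replicas))
    return all(results)


def should_wait_on_startup(is_leader, others_in_line):
    """Wait only when this node is leader and nobody else is queued to lead."""
    return is_leader and others_in_line == 0
```

With two replicas, `leader_update([], [Replica("a"), Replica("b")], {"id": 1})` returns only after both replicas hold the document, which is the "won't return until the update hits the replicas" behaviour in miniature.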