Shalin:

Just to see if my understanding is correct: how often would you expect <2> to
occur? My assumption so far is that it would be quite rare for the leader
and all replicas to happen to hit autocommit points at the same time, and thus
it would be safe to just bring down a few segments. But that's an assumption;
I have no facts to back it up.

Nishanth:

Currently, no: you can't configure the number of missed updates that peer
sync will tolerate. Getting to the bottom of the connection resets seems
indicated.

Best
Erick

On Wed, Jan 21, 2015 at 6:46 PM, Nishanth S <nishanth.2...@gmail.com> wrote:
> Thank you, Shalin. So in a system where the indexing rate is more than 5K TPS
> or so, the replica will never be able to recover through the peer sync
> process. In my case I have mostly seen step 3, where a full copy happens,
> and if the index size is huge it takes a very long time for replicas to
> recover. Is there a way we can configure the number of missed updates for
> peer sync?
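A back-of-the-envelope check of this point, as a sketch: assuming PeerSync can only bridge the most recent 100 updates (the limit Shalin describes below), the stretch of indexing time that window covers shrinks in proportion to the update rate. At 5,000 updates per second, 100 updates is only 20 ms of indexing. The class and method names are hypothetical, for illustration only.

```java
public class PeerSyncWindow {

    // Assumption taken from this thread: PeerSync can only bridge the last 100 updates.
    static final int RECENT_UPDATES = 100;

    // Milliseconds of indexing that the 100-update window covers at a given rate.
    static double windowMillis(double updatesPerSecond) {
        return RECENT_UPDATES * 1000.0 / updatesPerSecond;
    }

    public static void main(String[] args) {
        double[] rates = {100, 1_000, 5_000};
        for (double rate : rates) {
            System.out.printf("%.0f updates/s -> %.1f ms of indexing%n", rate, windowMillis(rate));
        }
    }
}
```

Any outage or connection reset longer than that window leaves the replica too far behind for PeerSync under this assumption.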
>
> Thanks,
> Nishanth
>
> On Wed, Jan 21, 2015 at 4:47 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> Hi Nishanth,
>>
>> The recovery happens as follows:
>>
>> 1. PeerSync is attempted first. If the number of new updates on the leader
>> is less than 100, the missing documents are fetched directly and indexed
>> locally. The tlog tells us the last 100 updates very quickly. The tlog is
>> also used for update durability and, of course, for startup recovery.
>> 2. If the above step fails, replication recovery is attempted. A hard
>> commit is called on the leader, and then the leader is polled for its latest
>> index version and generation. If the leader's version and generation are
>> greater than the local index's version/generation, the index files that
>> differ between leader and replica are fetched and installed.
>> 3. If the above fails (because the leader's version/generation is somehow
>> equal to or less than the local one), a full index recovery happens and the
>> entire index is fetched from the leader and installed locally.
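The three steps above can be sketched as a single decision function. This is a simplified model, not Solr's actual recovery code: it assumes that "PeerSync fails" means "100 or more updates missed" and that step 2 applies only when the leader is strictly ahead on both version and generation. Every name in it (`RecoverySketch`, `CommitPoint`, `chooseRecovery`, `PEER_SYNC_LIMIT`) is hypothetical. It needs Java 16+ for the record.

```java
public class RecoverySketch {

    // The three outcomes described in the steps above.
    enum Strategy { PEER_SYNC, INCREMENTAL_REPLICATION, FULL_INDEX_COPY }

    // Hypothetical snapshot of a commit point: index version and generation.
    record CommitPoint(long version, long generation) {}

    // Assumed PeerSync limit, taken from this thread (100 recent updates).
    static final int PEER_SYNC_LIMIT = 100;

    static Strategy chooseRecovery(int missedUpdates, CommitPoint leader, CommitPoint local) {
        // Step 1: few enough missed updates, so fetch just those documents from the leader.
        if (missedUpdates < PEER_SYNC_LIMIT) {
            return Strategy.PEER_SYNC;
        }
        // Step 2 (after a hard commit on the leader, not modeled here): the leader is
        // strictly ahead of the local index, so only the differing index files are copied.
        if (leader.version() > local.version() && leader.generation() > local.generation()) {
            return Strategy.INCREMENTAL_REPLICATION;
        }
        // Step 3: the leader is not strictly ahead on both counts,
        // so the entire index is fetched from the leader.
        return Strategy.FULL_INDEX_COPY;
    }

    public static void main(String[] args) {
        CommitPoint leader = new CommitPoint(1_000L, 42L);
        System.out.println(chooseRecovery(30, leader, new CommitPoint(900L, 40L)));      // PEER_SYNC
        System.out.println(chooseRecovery(5_000, leader, new CommitPoint(900L, 40L)));   // INCREMENTAL_REPLICATION
        System.out.println(chooseRecovery(5_000, leader, new CommitPoint(1_000L, 42L))); // FULL_INDEX_COPY
    }
}
```

In this model, a replica that has missed too many updates for PeerSync and whose commit point is not strictly behind the leader's ends up copying the whole index.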
>>
>> There are some other details involved in this process too but probably not
>> worth going into here.
>>
>> On Wed, Jan 21, 2015 at 5:13 PM, Nishanth S <nishanth.2...@gmail.com> wrote:
>>
>> > Hello Everyone,
>> >
>> > I am hitting a few issues with Solr replicas going into recovery and then
>> > doing a full index copy. I am trying to understand the Solr recovery
>> > process. I have read a few blogs on this and saw that when the leader
>> > notifies a replica to recover (in my case it is due to connection resets),
>> > it will try to do a peer sync first, and if the missed updates are more
>> > than 100 it will do a full index copy from the leader. I am trying to
>> > understand what peer sync is and where the tlog comes into the picture.
>> > Are tlogs replayed only during server restart? Can someone help me with
>> > this?
>> >
>> > Thanks,
>> > Nishanth
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
