Hi,

I'm working with a few clusters of 100+ nodes and I've been wondering how
exactly failover, as well as a cold start, works with respect to block
reports.

I sometimes see failover times of 15-45 minutes, with the Namenode waiting
in safe mode for all blocks to be reported.

Datanodes usually send a full block report every six hours, I believe, so
there must be something else going on.

How are Datanodes informed of the new Namenode?
How do they know that they should send a full block report (assuming this
is what happens)?
-> I assume the answer to both lies in Heartbeats?

Are there any guidelines on how long recovery should take and are there any
options that can be used to decrease the time?

Thank you!
