Distcp is a backup tool, not a synchronization tool. At best, you get a point-in-time snapshot of DC1, for example from a periodic distcp run scheduled every night at 12 AM. If DC1 fails completely, you lose everything written since that last copy.
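To put a number on that, here is a rough Python sketch of the data-loss window for a nightly copy. The function names and the dc1-nn/dc2-nn NameNode addresses are made up for illustration, and it assumes each run finishes instantly at the scheduled hour; in practice the copy takes time, so the usable snapshot is older still. The only distcp option used is -update, which copies files that are missing or differ on the target.

```python
from datetime import datetime, timedelta


def last_nightly_run(failure_time: datetime, run_hour: int = 0) -> datetime:
    """Most recent scheduled run at or before failure_time.

    Assumption: the run completes instantly at run_hour:00.
    """
    run = failure_time.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if run > failure_time:
        run -= timedelta(days=1)
    return run


def data_loss_window(failure_time: datetime, run_hour: int = 0) -> timedelta:
    """How much of DC1's recent writes are missing from DC2 at failure_time."""
    return failure_time - last_nightly_run(failure_time, run_hour)


def distcp_command(src: str, dst: str) -> list:
    """Argument list for one incremental copy (paths are hypothetical)."""
    return ["hadoop", "distcp", "-update", src, dst]


if __name__ == "__main__":
    # DC1 fails at 11:30 PM, long after the midnight copy.
    failure = datetime(2018, 5, 7, 23, 30)
    print("writes lost:", data_loss_window(failure))
    print(" ".join(distcp_command("hdfs://dc1-nn:8020/data",
                                  "hdfs://dc2-nn:8020/data")))
```

Running distcp more often shrinks the window, but it never reaches zero, which is why this is backup rather than synchronization.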
On Mon, May 7, 2018 at 12:30 AM, akshay naidu <[email protected]> wrote:

> Hello Hadoopers,
> I am planning for a Disaster Recovery(DR) project mainly for *hadoop
> clusters*.
> Infrastructure is in a DataCenter in West say DC1 . I Have created a
> backup hadoop-spark cluster in DataCenter in east say DC2. With Distcp will
> keep DC2 synchd with DC1 . This will work as DR .
>
> But what I want is that in case when DC1 went down completely, the
> automatic failover should happen and without any or very very less downtime
> DC2 is live.
>
> I have configured *hadoop high availability* and *Automatic Failover *in
> hadoop cluster in DC1 and it works fine. But that won't help in case whole
> DC1 goes down.
>
> Is there a solution where I can keep two hadoop clusters running in
> parallel, completely synchd, in two different DataCenters. In case hadoop
> cluster in DC1 goes down , Automatic failover occurs to DC2.
>
> Any hint would be of great help, any feedback, positive or negative, will
> be a great help.
>
> Thanks .
>

--
A very happy Clouderan
