Here is a nice explanation [1] of what your options are. HDFS does not support replication between clusters [2]. If you are using HBase, things are better, since HBase ships with its own cross-cluster replication.
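For the HBase case, a minimal sketch of setting up replication from DC1 to DC2 via the hbase shell (the peer id, ZooKeeper hosts, and table name are placeholders, not from your setup -- substitute your own quorum, client port, and znode parent):

```shell
# Run on the DC1 (active) cluster. Registers the DC2 cluster as a
# replication peer and turns on replication for one table.
hbase shell <<'EOF'
add_peer '2', CLUSTER_KEY => "dc2-zk1,dc2-zk2,dc2-zk3:2181:/hbase"
enable_table_replication 'my_table'
EOF
```

Replication is asynchronous (WAL shipping), so DC2 can lag DC1 slightly, but it is continuous rather than batch.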
> In case hadoop cluster in DC1 goes down, Automatic failover occurs to
> DC2.

There are setups with DRBD, but if you can afford a small loss between distcp runs in case of disaster, haproxy (for client-facing connections) together with distcp is the simplest approach.

[1] https://community.hortonworks.com/questions/29645/hdfs-replication-for-dr.html
[2] https://issues.apache.org/jira/browse/HDFS-5442

Best,
Sanel

akshay naidu <[email protected]> writes:

> Hello Hadoopers,
> I am planning a Disaster Recovery (DR) project, mainly for *hadoop
> clusters*.
> The infrastructure is in a data center in the west, say DC1. I have
> created a backup hadoop-spark cluster in a data center in the east, say
> DC2. With distcp I will keep DC2 synced with DC1. This will work as DR.
>
> But what I want is that in case DC1 goes down completely, automatic
> failover happens and DC2 is live with very little or no downtime.
>
> I have configured *hadoop high availability* and *automatic failover* in
> the hadoop cluster in DC1, and it works fine. But that won't help in case
> the whole of DC1 goes down.
>
> Is there a solution where I can keep two hadoop clusters running in
> parallel, completely synced, in two different data centers? In case the
> hadoop cluster in DC1 goes down, automatic failover occurs to DC2.
>
> Any hint would be of great help; any feedback, positive or negative, will
> be a great help.
>
> Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
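For the distcp side, a minimal sketch of the periodic one-way sync (run from cron on an edge node, for example). The NameNode hostnames, port, path, and tuning numbers are assumptions for illustration -- point them at your own clusters:

```shell
#!/bin/sh
# One-way sync of /data from the DC1 cluster to the DC2 standby.
# -update   copies only files that changed since the last run
# -delete   removes files on DC2 that no longer exist on DC1
# -p        preserves permissions, ownership, and other attributes
# -m        caps the number of map tasks doing the copy
# -bandwidth caps per-mapper throughput (MB/s) so the WAN link
#            between data centers is not saturated
hadoop distcp \
  -update -delete -p \
  -m 20 -bandwidth 50 \
  hdfs://dc1-nn:8020/data \
  hdfs://dc2-nn:8020/data
```

The window between two runs is exactly the data you can lose in a disaster, so pick the schedule to match the loss you said you can afford.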
