Re: Lustre failover

I've worked on a number of largish Lustre configs over the years, and all of them have been configured with Active/Active-type mappings. There are a few issues being conflated here:

1) Active/Active does not mean both OSS nodes are accessing the same LUNs at the same time. Each "pair" of OSS nodes can see the same group of OST LUNs on shared storage, but each OSS normally accesses only its half of them until an OSS dies (there's a toy sketch of this layout below, after point 4).

2) Active/Active does not mean *automatic* failover. In all but one case I have worked on, the choice was made to have a rational human-being-like creature decide whether the safest/fastest repair was to bring back the original OSS, or to fail over the orphaned LUNs to their alternate OSS node. When the phone rings at 3 a.m. the "rational" and "human-being-like" descriptors are diminished, but that creature is still smarter than most scripts at assessing the best response to a failure.

3) Automatic failover is completely doable if you can STONITH the failed node (Shoot The Other Node In The Head). With a good network-controlled power strip you can kill the failed node so it can't come back and continue writing to the OST LUNs it used to own (which turns your 1s and 0s into confetti). Linux-HA with heartbeats over serial and TCP/GbE is the most common approach to automating this. Once the failed/suspect node is guaranteed not to make a surprise comeback, the OSTs it left behind will need to be started by the surviving OSS.

4) IPMI power control has just enough lag/inconsistency that the "shooting draw" between two still-functional OSS servers can result in BOTH servers (or neither) powering down... don't depend on it unless your IPMI implementation is ultra responsive and reliable. Make sure your script verifies it's "triple-dog sure" the other node can't come back before taking over the abandoned OSTs (a rough sketch of that check follows below as well).
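
To make point 1 concrete, here's a toy Python sketch of the kind of OST-to-OSS mapping I mean. The node and OST names are made up, and a real deployment keeps this in the Lustre and HA configuration rather than a script; it's only meant to show that each OSS owns its half of the shared LUNs until its partner is gone:

# Toy sketch -- hypothetical node/OST names, not a real configuration.
# An OSS pair can see all eight OST LUNs on the shared storage, but each
# node normally mounts only its own half; the other half is its failover duty.

OST_MAP = {
    "OST0000": {"primary": "oss01", "failover": "oss02"},
    "OST0001": {"primary": "oss01", "failover": "oss02"},
    "OST0002": {"primary": "oss01", "failover": "oss02"},
    "OST0003": {"primary": "oss01", "failover": "oss02"},
    "OST0004": {"primary": "oss02", "failover": "oss01"},
    "OST0005": {"primary": "oss02", "failover": "oss01"},
    "OST0006": {"primary": "oss02", "failover": "oss01"},
    "OST0007": {"primary": "oss02", "failover": "oss01"},
}

def osts_served_by(node, failed_peer=None):
    """OSTs this node should have mounted: its own half, plus the
    orphaned half once its partner is confirmed dead."""
    mine = [o for o, m in OST_MAP.items() if m["primary"] == node]
    if failed_peer:
        mine += [o for o, m in OST_MAP.items()
                 if m["primary"] == failed_peer and m["failover"] == node]
    return sorted(mine)

print(osts_served_by("oss02"))                       # normal operation: 4 OSTs
print(osts_served_by("oss02", failed_peer="oss01"))  # after oss01 is fenced: all 8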
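
And for points 3 and 4, a rough sketch of the "fence, verify, then take over" sequence. The BMC hostname, credentials, and device/mount paths are all hypothetical, and in a real cluster the Linux-HA stack would drive this rather than a hand-rolled script; the point is the paranoid confirmation loop before touching the orphaned OSTs:

# Rough sketch of the "triple-dog sure" check before taking over orphaned
# OSTs.  BMC hostname, credentials, and device/mount paths are made up;
# in practice the HA stack (Linux-HA/heartbeat) would drive this, not a script.
import subprocess, sys, time

PEER_BMC = "oss01-bmc"   # hypothetical BMC address of the suspect peer
IPMI = ["ipmitool", "-I", "lanplus", "-H", PEER_BMC, "-U", "admin", "-P", "secret"]

# OSTs the dead peer leaves behind (hypothetical paths)
ORPHANED_OSTS = [("/dev/mapper/ost0000", "/mnt/lustre/ost0000"),
                 ("/dev/mapper/ost0001", "/mnt/lustre/ost0001")]

def peer_power_is_off():
    """Ask the peer's BMC for chassis power state; anything other than a
    clean 'off' answer counts as 'maybe still alive'."""
    try:
        out = subprocess.run(IPMI + ["chassis", "power", "status"],
                             capture_output=True, text=True, timeout=10)
        return out.returncode == 0 and "off" in out.stdout.lower()
    except subprocess.TimeoutExpired:
        return False

def fence_peer():
    """Power the peer off via IPMI (a network-controlled power strip is
    the more trustworthy weapon, per point 4)."""
    subprocess.run(IPMI + ["chassis", "power", "off"], check=True, timeout=10)

def confirmed_dead(checks=5, interval=5):
    """Require several consecutive 'off' readings, spaced out in time,
    to ride out IPMI lag and flaky responses."""
    for _ in range(checks):
        if not peer_power_is_off():
            return False
        time.sleep(interval)
    return True

fence_peer()
if not confirmed_dead():
    sys.exit("peer may still come back -- NOT taking over its OSTs")

# Only now start the orphaned OSTs on the surviving OSS.
for dev, mnt in ORPHANED_OSTS:
    subprocess.run(["mount", "-t", "lustre", dev, mnt], check=True)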

**Shameful Plug** DataDirect has a whitepaper that demonstrates many Lustre failover best practices, complete with pictures etc., and it is spot-on from my experience. Here's the link:
http://www.datadirectnet.com/resource-downloads/best-practices-for-architecting-a-lustre-based-storage-environment-download
*****

Cheers!
Greg


On Sep 10, 2008, at 8:05 AM, [EMAIL PROTECTED] wrote:


With OST servers it is possible to have a load-balanced active/active configuration.
Each node is the primary node for a group of OSTs, and the failover node for other
...
Anyone done this on a production system?

We have a number of HP's Lustre (SFS) clusters, which use
dual-homed disk arrays, but in active/passive configuration.
It works reasonably well.

Experiences? Comments?

Active/active seems strange to me - it implies that the bottleneck
is the OSS (OST server), rather than the disk itself. And a/a means
each OSS has to do more locking for the shared disk, which would seem
to make the problem worse...


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
