Re: Lustre failover

I've worked on a number of largish Lustre configs over the years, and all of them have been configured with Active/Active-type mappings. There are a few issues being conflated here:

1) Active/Active does not mean both OSS nodes are accessing the same LUNs at the same time. Each "pair" of OSS nodes can see the same group of OST LUNs on shared storage, but each OSS normally accesses only its half of them until an OSS dies (there's a toy sketch of this layout below, after point 4).

2) Active/Active does not mean *automatic* failover. In all but one case I have worked on, the choice was made to have a rational human-being-like creature decide whether the safest/fastest repair was to bring back the original OSS, or to fail over the orphaned LUNs to their alternate OSS node. When the phone rings at 3 a.m. the "rational" and "human-being-like" descriptors are diminished, but that creature is still smarter than most scripts at assessing the best response to a failure.

3) Automatic failover is completely doable if you can STONITH the failed node (Shoot The Other Node In The Head). With a good network-controlled power strip you can kill the failed node so it can't come back and continue writing to the OST LUNs it used to own (which turns your 1s and 0s into confetti). Linux-HA with heartbeats over serial and TCP/GbE is the most common approach to automating this. Once the failed/suspect node is guaranteed not to make a surprise comeback, the OSTs it left behind will need to be started by the surviving OSS.

4) IPMI power control has just enough lag/inconsistency that the "shooting draw" between two still-functional OSS servers can result in BOTH servers (or neither) powering down... don't depend on it unless your IPMI implementation is ultra responsive and reliable. Make sure your script verifies it's "triple-dog sure" the other node can't come back before taking over the abandoned OSTs (a rough sketch of that check follows below as well).
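
To make point 1 concrete, here's a toy Python sketch of the kind of OST-to-OSS mapping I mean. The node and OST names are made up, and a real deployment keeps this in the Lustre and HA configuration rather than a script; it's only meant to show that each OSS owns its half of the shared LUNs until its partner is gone:

# Toy sketch -- hypothetical node/OST names, not a real configuration.
# An OSS pair can see all eight OST LUNs on the shared storage, but each
# node normally mounts only its own half; the other half is its failover duty.

OST_MAP = {
    "OST0000": {"primary": "oss01", "failover": "oss02"},
    "OST0001": {"primary": "oss01", "failover": "oss02"},
    "OST0002": {"primary": "oss01", "failover": "oss02"},
    "OST0003": {"primary": "oss01", "failover": "oss02"},
    "OST0004": {"primary": "oss02", "failover": "oss01"},
    "OST0005": {"primary": "oss02", "failover": "oss01"},
    "OST0006": {"primary": "oss02", "failover": "oss01"},
    "OST0007": {"primary": "oss02", "failover": "oss01"},
}

def osts_served_by(node, failed_peer=None):
    """OSTs this node should have mounted: its own half, plus the
    orphaned half once its partner is confirmed dead."""
    mine = [o for o, m in OST_MAP.items() if m["primary"] == node]
    if failed_peer:
        mine += [o for o, m in OST_MAP.items()
                 if m["primary"] == failed_peer and m["failover"] == node]
    return sorted(mine)

print(osts_served_by("oss02"))                       # normal operation: 4 OSTs
print(osts_served_by("oss02", failed_peer="oss01"))  # after oss01 is fenced: all 8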
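
And for points 3 and 4, a rough sketch of the "fence, verify, then take over" sequence. The BMC hostname, credentials, and device/mount paths are all hypothetical, and in a real cluster the Linux-HA stack would drive this rather than a hand-rolled script; the point is the paranoid confirmation loop before touching the orphaned OSTs:

# Rough sketch of the "triple-dog sure" check before taking over orphaned
# OSTs.  BMC hostname, credentials, and device/mount paths are made up;
# in practice the HA stack (Linux-HA/heartbeat) would drive this, not a script.
import subprocess, sys, time

PEER_BMC = "oss01-bmc"   # hypothetical BMC address of the suspect peer
IPMI = ["ipmitool", "-I", "lanplus", "-H", PEER_BMC, "-U", "admin", "-P", "secret"]

# OSTs the dead peer leaves behind (hypothetical paths)
ORPHANED_OSTS = [("/dev/mapper/ost0000", "/mnt/lustre/ost0000"),
                 ("/dev/mapper/ost0001", "/mnt/lustre/ost0001")]

def peer_power_is_off():
    """Ask the peer's BMC for chassis power state; anything other than a
    clean 'off' answer counts as 'maybe still alive'."""
    try:
        out = subprocess.run(IPMI + ["chassis", "power", "status"],
                             capture_output=True, text=True, timeout=10)
        return out.returncode == 0 and "off" in out.stdout.lower()
    except subprocess.TimeoutExpired:
        return False

def fence_peer():
    """Power the peer off via IPMI (a network-controlled power strip is
    the more trustworthy weapon, per point 4)."""
    subprocess.run(IPMI + ["chassis", "power", "off"], check=True, timeout=10)

def confirmed_dead(checks=5, interval=5):
    """Require several consecutive 'off' readings, spaced out in time,
    to ride out IPMI lag and flaky responses."""
    for _ in range(checks):
        if not peer_power_is_off():
            return False
        time.sleep(interval)
    return True

fence_peer()
if not confirmed_dead():
    sys.exit("peer may still come back -- NOT taking over its OSTs")

# Only now start the orphaned OSTs on the surviving OSS.
for dev, mnt in ORPHANED_OSTS:
    subprocess.run(["mount", "-t", "lustre", dev, mnt], check=True)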

**Shameful Plug** DataDirect has a whitepaper that demonstrates many Lustre failover best practices, complete with pictures etc., and it is spot-on from my experience. Here's the link:
http://www.datadirectnet.com/resource-downloads/best-practices-for-architecting-a-lustre-based-storage-environment-download
*****

Cheers!
Greg


On Sep 10, 2008, at 8:05 AM, [EMAIL PROTECTED] wrote:


With OST servers it is possible to have a load-balanced active/active configuration.
Each node is the primary node for a group of OSTs, and the failover node for other
...
Anyone done this on a production system?

We have a number of HP's Lustre (SFS) clusters, which use
dual-homed disk arrays, but in active/passive configuration.
It works reasonably well.

Experiences? Comments?

Active/active seems strange to me - it implies that the bottleneck
is the OSS (OST server), rather than the disk itself. And a/a means
each OSS has to do more locking for the shared disk, which would seem
to make the problem worse...


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
