Hi Jon,

Quoting Tony Travis <[EMAIL PROTECTED]>:
Although Kerrighed looks very promising, it is also quite fragile in our hands. If one node crashes, you lose the entire cluster. That said, the Kerrighed project is extremely well supported and I believe it will be a good alternative in the near future.


We found the same with Kerrighed: when one node crashes, the whole cluster goes down. The following is written to kern.log just before the cluster dies:

Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Resetting link <1.1.2:eth1-1.1.3:eth1>, peer not responding
Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Lost link <1.1.2:eth1-1.1.3:eth1> on network plane B
Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Lost contact with <1.1.3>
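
In case it is useful for spotting which peer dropped out, here is a rough Python sketch that picks these TIPC messages out of kern.log lines. The regular expression is only an assumption based on the three lines quoted above, and the function names are my own; other TIPC versions may word the messages differently.

```python
import re

# Pattern assumed from the three kern.log lines quoted above; it is not
# taken from any TIPC documentation and may not match other versions.
TIPC_EVENT = re.compile(
    r"kernel: TIPC: (?P<event>Resetting link|Lost link|Lost contact with)\s*"
    r"<(?P<target>[^>]+)>"
)


def find_tipc_events(lines):
    """Return (event, target) pairs for TIPC link-failure messages."""
    events = []
    for line in lines:
        match = TIPC_EVENT.search(line)
        if match:
            events.append((match.group("event"), match.group("target")))
    return events


def lost_nodes(lines):
    """Return the TIPC addresses reported in 'Lost contact with' messages."""
    return [
        target
        for event, target in find_tipc_events(lines)
        if event == "Lost contact with"
    ]


# The three log lines from this message, with wrapped lines joined.
sample = [
    "Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Resetting link "
    "<1.1.2:eth1-1.1.3:eth1>, peer not responding",
    "Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Lost link "
    "<1.1.2:eth1-1.1.3:eth1> on network plane B",
    "Jul  2 13:57:03 [EMAIL PROTECTED] kernel: TIPC: Lost contact with <1.1.3>",
]

if __name__ == "__main__":
    for address in lost_nodes(sample):
        print("lost contact with node", address)
```

To run it against a real log, pass an open file instead of the sample list, e.g. lost_nodes(open("/var/log/kern.log")), adjusting the path to wherever your syslog writes kernel messages.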

From the Kerrighed mailing list (Louis Rilling):

"Indeed, Kerrighed does not tolerate node failures yet. We have no precise date for this, and giving a date right now would be meaningless. The first step for us is to support dynamic cluster resizing (IOW live node additions and removals), and we've just started working on it. We will work on node failures in a second step."

It seems they are working towards this, starting with dynamic cluster resizing, and also on a new framework for configurable process scheduling. Kerrighed will probably be a good alternative in the future.

Kenneth



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



