Re: [Beowulf] Personal Introduction & First Beowulf Cluster Question

Joe Landman Tue, 09 Dec 2008 06:20:04 -0800

Steve Herborn wrote:

The system's compute nodes were originally built to be "Stateful" and the
current power player on my team wants it to remain that way.  As things sit


Ok, not a problem.

as of today I'm looking at either using AutoYast and am also evaluating Xcat
to perform the task.  The biggest issue with AutoYast is that it will assist
me in getting the OS out to the Nodes; it really doesn't provide any of the

Cluster Management Tools that I would like to get installed.

Which tools do you have in mind? The AutoYaST package that we have set up for our customers installs the OS locally, as well as pdsh, ganglia, and several other tools. Then in our finishing scripts, which the autoyast.xml file links to, we set up SGE, adjust NIS/mounts, ...

As I indicated, we get operational compute nodes shortly after turning them on. The current version of autoyast.xml + finishing scripts we have constructed also builds a RAID0 for local scratch, uses xfs file systems for root and scratch, installs OFED RPMs (on SuSE), updates the kernel to a late model (2.6.23.14 or so) and does some sysctl tuning.
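
As a rough illustration of the "autoyast.xml + finishing scripts" arrangement described above, here is a minimal Python sketch that generates a control-file fragment with an extra package list and one post-install script reference. This is not the Tiburon/Scalable Informatics profile. The package names, the script URL, and the helper names `q` and `build_profile_fragment` are hypothetical, and the element names should be checked against the AutoYaST documentation for the SLES release in use.

```python
import xml.etree.ElementTree as ET

# Namespaces used by AutoYaST control files.
YAST_NS = "http://www.suse.com/1.0/yast2ns"
CONFIG_NS = "http://www.suse.com/1.0/configns"
ET.register_namespace("", YAST_NS)
ET.register_namespace("config", CONFIG_NS)


def q(tag):
    """Qualify a tag name with the default AutoYaST namespace."""
    return "{%s}%s" % (YAST_NS, tag)


def build_profile_fragment(packages, finish_script_url):
    """Return an AutoYaST-style XML fragment as a string.

    packages: extra package names to install on each node (assumed names).
    finish_script_url: hypothetical location of a site finishing script.
    """
    profile = ET.Element(q("profile"))

    # <software><packages config:type="list"><package>...</package>
    software = ET.SubElement(profile, q("software"))
    pkg_list = ET.SubElement(software, q("packages"))
    pkg_list.set("{%s}type" % CONFIG_NS, "list")
    for name in packages:
        ET.SubElement(pkg_list, q("package")).text = name

    # <scripts><post-scripts config:type="list"><script>...</script>
    scripts = ET.SubElement(profile, q("scripts"))
    post = ET.SubElement(scripts, q("post-scripts"))
    post.set("{%s}type" % CONFIG_NS, "list")
    script = ET.SubElement(post, q("script"))
    ET.SubElement(script, q("filename")).text = "finish.sh"
    ET.SubElement(script, q("interpreter")).text = "shell"
    ET.SubElement(script, q("location")).text = finish_script_url

    return ET.tostring(profile, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical package names and head-node URL, for illustration only.
    print(build_profile_fragment(
        ["pdsh", "ganglia-gmond"],
        "http://headnode.example/autoyast/finish.sh",
    ))
```

The finishing script itself (SGE setup, NIS/mounts, sysctl tuning) would live at the referenced URL; keeping it outside the XML lets it be changed without regenerating the profile.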

Now you may be asking yourself why "Stateful" Compute Nodes as I did.  It

Not really ... end users and customers have preferences. Our job is to help them understand the good and bad elements of each. Once they understand, if they prefer to make the decision, then we have them decide and go from there. If they leave it up to us, we try to help them make the best choice.

appears to me at this time that along with occasionally using these nodes as
part of a Cluster, they also use them as plain old Servers/Workstations as
I've found User Accounts & home directories on some of the compute nodes.


Ow.  A central "enterprise" disk is definitely needed.

As I said in my first post I'm new to this position & organization and not
quite sure exactly how & for what the system is even used.  I was
simply told to get'er up.

:)

Bug me offline if you want our autoyast.xml, and access to our finishing scripts (parts of our Tiburon package). Check out xCat as well.

Steven A. Herborn
U.S. Naval Academy
Advanced Research Computing
410-293-6480 (Desk)
757-418-0505 (Cell)

-----Original Message-----
From: Joe Landman [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2008 1:35 PM
To: Steve Herborn
Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Personal Introduction & First Beowulf Cluster
Question

Steve Herborn wrote:
Good day to the group. I would like to make a brief introduction of myself and raise my first question to the forum.
My name is Steve Herborn and I am a new employee at the United States Naval Academy in the Advanced Research Computing group which supports
Greetings Steve
the IT systems used for faculty research. Part of my responsibilities will be the care & feeding of our Beowulf Cluster, which is a commercially procured Cluster from Aspen Systems. It was purchased & installed about four or five years ago. As delivered, the system was originally configured with two Head nodes, each with 32 compute nodes. One head node was running SUSE 9.x and the other Head Node was running Scyld (version unknown), also with 32 compute nodes. While I don't know all of the history, apparently this system was not very actively maintained and had numerous hardware & software issues, to include losing the array on which Scyld was installed. Prior to my arrival a
Ouch ... if you call the good folks at Aspen, they could help with that (ping me if you need a contact)
decision was made to reconfigure the system from having two different head nodes running two different OS Distributions to one Head Node controlling all 64 Compute Nodes. In addition SUSE Linux Enterprise Server (10SP2) (X86-64) was selected as the OS for all of the nodes.
Ok.
Now on to my question, which will more than likely be the first of many. In the collective group wisdom what would be the most efficient &
Danger Will Robinson ... for the N people who answer, you are likely to get N+2 answers, and N/2 arguments going ... not a bad thing, but to steal from the Perl motto "there is more than one way to do these things ..."
effective way to "push" the SLES OS out to all of the compute nodes once it is fully installed & configured on the Head Node. In my research
First: Stateless (e.g. diskless) versus Stateful (e.g. local installation). Scyld is "stateless" though Don will likely correct me (as this is massively oversimplified). SuSE can be installed Stateless or Stateful. Its installation can be automated ... we have been doing this for years (one of the few vendors to have done this with SuSE). It can also be run diskless ... we have booted compute nodes with Infiniband to fully operational compute nodes visible in all aspects within the cluster in under 60 seconds. This is the case for 9.3, 10.x SuSE flavors.
I've read about various Cluster packages/distributions that have that capability built in, such as ROCKS & OSCAR which appear to have the innate capability to do this as well as some additional tools that would be very nice to use in managing the system. However, from my current research it appears that they do not support SLES 10sp2 for the AMD
Rocks only supports Redhat and rebuilds; I wouldn't recommend it for the task as you have indicated.
Oscar might be able to handle this, though I haven't kept up on it, so I am not sure how active it is.
You want to look at xCat v2 (open source), and Warewulf/Perceus (open source). Our package (Tiburon) is not ready to be released, and we will likely make it a meta package atop Perceus at some point soon, though it is used in production at several large commercial companies specifically for SuSE clusters.
64-bit Architecture (although since I am so new at this I could be wrong). Are there any other "free" (money is always an issue) products or methodologies I should be looking at to push the OS out & help me manage the system? It appears that a commercial product Moab Cluster
See above. If you want a prepackaged system, likely you are going to need to spend money. Moab is a possibility, though for SuSE, I would recommend looking at Concurrent Thinking's appliance. It will cost money, but they solve pretty much all of the problems for you.
Builder will do everything I need & more, but I do not have the funds to purchase a solution. I also certainly do not want to perform a manual OS install on all 64 Compute Nodes.
No ... in all likelihood, you really don't want to do any installation to the nodes (stateless if possible).
Thanks in advance for any & all help, advice, guidance, or pearls of wisdom that you can provide this Neophyte. Oh and please don't ask why SLES 10sp2, I've already been through that one with management. It is what I have been provided & will make work.
It's not an issue, though we recommend better kernels/kernel updates. Compared to the RHEL kernels, it uses modern stuff.
Joe

------------------------------------------------------------------------

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
