Foreword: I don't actually take any advocacy position on choice of
distro. RH, Debian, BSD, I don't care. Any contrary statements are
made strictly in the interest of the truth.
Robert G. Brown wrote:
> On Fri, 29 Dec 2006, Andrew M.A. Cater wrote:

<snip>

> Also, how large are those speed advantages? How many of them cannot
> already be obtained by simply using a good commercial compiler and
> spending some time tuning the application? Very few tools (ATLAS
> being a good example) really tune per microarchitecture. The process
> is not linear, and it is not easy. Even ATLAS tunes "automatically"
> more from a multidimensional gradient search based on certain
> assumptions -- I don't think it would be easy to prove that the
> optimum it reaches is a global optimum.

It most definitely isn't. Goto trounces it easily. ATLAS is the first
stab at an optimized BLAS library before the hand coders go to work.

> No, not joking at all. FC is perfectly fine for a cluster,
> especially one built with very new hardware (hardware likely to need
> a very recent kernel and libraries to work at all) and actually
> upgrades-by-one tend to work quite well at this point for systems
> that haven't been overgooped with user-level crack or homemade stuff
> overlaid outside of the RPM/repo/yum ritual.
>
> Remember, a cluster node is likely to have a really, really boring
> and very short package list. We're not talking about major overhauls
> in X or gnome or the almost five thousand packages in extras having
> much impact -- it is more a matter of the kernel and basic libraries,
> PVM and/or MPI and/or a few user's choice packages, maybe some
> specialty libraries. I'm guessing four or five very basic package
> groups and a dozen individual packages and whatever dependencies they
> pull in. Or less. The good thing about FC >>is<< the relatively
> rapid renewal of at least some of the libraries -- one could die of
> old age waiting for the latest version of the GSL, for example, to
> get into RHEL/Centos.
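As an aside, when comparing BLAS implementations it helps to confirm
which library a binary actually resolves at run time. A minimal sketch
(the binary name passed to the function is whatever your application
happens to be called; nothing here comes from rgb's setup):

```shell
#!/bin/sh
# Which BLAS shared object does a dynamically linked binary resolve?
# Usage: check_blas ./my_solver   ("./my_solver" is a hypothetical name)
check_blas() {
    ldd "$1" | grep -i -E 'blas|atlas|goto' \
        || echo "no dynamically linked BLAS in $1"
}

check_blas /bin/sh    # a shell links no BLAS, so this reports none
```

Statically linked Goto or ATLAS won't show up in ldd output, of
course; running nm on the binary is the fallback there.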
> So one possible strategy is to develop a very
> conservative cluster image and upgrade every other FC release, which
> is pretty much what Duke does with FC anyway.

I'd rather have volatile user-level libraries and stable system-level
software than vice versa. Centos users need to be introduced to the
lovely concept of backporting.

> Also, plenty of folks on this list have done just fine running
> "frozen" linux distros "as is" for years on cluster nodes. If they
> aren't broke, and live behind a firewall so security fixes aren't
> terribly important, why fix them? I've got a server upstairs (at
> home) that is still running <blush> RH 9. I keep meaning to upgrade
> it, but I never have time to set up and safely solve the
> bootstrapping problem involved, and it works fine (well inside a
> firewall and physically secure).

Call me paranoid, but I don't like the idea of a Cadbury Cream Egg
security model (hard outer shell, soft gooey center). I won't say
more, 'cuz I feel like I've had this discussion before. Upgrade it,
man. Once, when I was bored, I installed apt-rpm on a RH8 machine to
see what dist-upgrade looked like in the land of the Red Hat.
Interesting experience, and it worked just fine.

> Similarly, I had nodes at Duke that ran RH 7.3 for something like
> four years, until they were finally reinstalled with FC 2 or
> thereabouts. Why not? 7.3 was stable and just plain "worked" on at
> least these nodes; the nodes ran just fine without crashing and
> supported near-continuous computation for that entire time. So one
> could also easily use FC-whatever by developing and fine tuning a
> reasonably bulletproof cluster node configuration for YOUR hardware
> within its supported year+, then just freeze it.
> Or freeze it until
> there is a strong REASON to upgrade it -- a miraculously improved
> libc, a new GSL that has routines and bugfixes you really need,
> superyum, bproc as a standard option, cernlib in extras (the latter a
> really good reason for at least SOME people to upgrade to FC6:-).

Or use a distro that backports security fixes into affected packages
while maintaining ABI and API stability. That gives you a frozen
target for your users and more peace of mind.

> Honestly, with a kickstart-based cluster, reinstalling a thousand
> nodes is a matter of preparing the (new) repo -- usually by rsync'ing
> one of the toplevel mirrors -- and debugging the old install on a
> single node until satisfied. One then has a choice between a yum
> upgrade or (I'd recommend instead) yum-distributing an "upgrade"
> package that sets up e.g. grub to do a new, clean, kickstart
> reinstall, and then triggers it. You could package the whole thing
> to go off automagically overnight and not even be present -- the next
> day you come in, your nodes are all upgraded.

Isn't automatic package management great? Like crack on gasoline.

> I used to include a "node install" in my standard dog and pony show
> for people come to visit our cluster -- I'd walk up to an idle node,
> reboot it into the PXE kickstart image, and talk about the fact that
> I was reinstalling it. We had a fast enough network and tight enough
> node image that usually the reinstall would finish about the same
> time that my spiel was finished. It was then immediately available
> for more work. Upgrades are just that easy. That's scalability.
>
> Warewulf makes it even easier -- build your new image, change a
> single pointer on the master/server, reboot the cluster.
>
> I wouldn't advise either running upgrades or freezes of FC for all
> cluster environments, but they certainly are reasonable alternatives
> for at least some. FC is far from laughable as a cluster distro.
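The "upgrade package that sets up grub and triggers a reinstall" trick
rgb describes can be sketched roughly as follows: a %post-style script
that appends a kickstart boot entry to grub.conf and makes it the
default. Every path, the entry number, and the install-server URL here
are illustrative assumptions, not anyone's production setup:

```shell
#!/bin/sh
# Sketch of a reinstall trigger, suitable for the %post of an "upgrade" RPM.
# All paths, the kernel/initrd names, and the ks= URL are hypothetical.
trigger_reinstall() {
    conf="$1"    # e.g. /boot/grub/grub.conf on a real node
    # Append a kickstart reinstall entry to the grub menu.
    cat >> "$conf" <<'EOF'
title Kickstart reinstall
    root (hd0,0)
    kernel /vmlinuz-install ks=http://install-server/ks.cfg ksdevice=eth0
    initrd /initrd-install.img
EOF
    # Make the new entry (assumed to be menu entry 1) the default.
    sed -i 's/^default=.*/default=1/' "$conf"
    # A real package would now schedule the reboot, e.g.:
    # shutdown -r +5 "rebooting into kickstart reinstall"
}

# Dry run against a scratch copy rather than the live grub.conf:
demo=$(mktemp)
printf 'default=0\ntimeout=5\n' > "$demo"
trigger_reinstall "$demo"
grep '^default=' "$demo"    # now reads: default=1
```

Packaged as an RPM and pushed through yum, this is exactly the
"go off automagically overnight" workflow: the next boot lands in
the PXE/kickstart installer instead of the old system.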
What I'd like to see is an interested party implement a good,
long-term security management program for FC(2n+b) releases. RH
obviously won't do this.

> Yeah, I dunno about SuSE. I tend to include it in any list because
> it is a serious player and (as has been pointed out already in this
> thread e.g. deleted below) only the serious players tend to attract
> commercial/supported software companies. Still, as long as it and RH
> maintain ridiculously high prices (IMHO) for non-commercial
> environments I have a hard time pushing either one native anywhere
> but in a corporate environment or a non-commercial environment where
> their line of support or a piece of software that "only" runs on e.g.
> RHEL or SuSE is a critical issue. Banks need super conservatism and
> can afford to pay for it. Cluster nodes can afford to be agile and
> change, or not, as required by their function and environment, and
> cluster builders in academe tend to be poor and highly cost
> sensitive. Most of them don't need to pay for either one.

<snip>

> Not to argue, but Scientific Linux is (like Centos) recompiled RHEL
> and also has a large set of these tools including some
> physics/astronomy related tools that were, at least, hard to find
> other places. However, FC 6 is pretty insane. There are something
> like 6500 packages total in the repo list I have selected in yumex on
> my FC 6 laptop (FC itself, livna, extras, some Duke stuff, no
> freshrpms). This number seems to have increased by around 500 in the
> last four weeks IIRC -- I'm guessing people keep adding stuff to
> extras and maybe livna. At this point FC 6 has e.g. cernlib,
> ganglia, and much more -- I'm guessing that anything that is in SL is
> now in FC 6 extras, as SL is too slow/conservative for a lot of
> people (as is the RHEL/Centos that is its base).

Do _not_ start a contest like this with the Debian people. You _will_
lose.
> Debian may well have more stuff, or better stuff for doing numerical
> work -- I personally haven't done a detailed package-by-package
> comparison and don't know. I do know that only a tiny fraction of
> all of the packages available in either one are likely to be relevant
> to most cluster builders, and that it is VERY likely that anything
> that is missing from either one can easily be packaged and added to
> your "local" repo with far less work than what is involved in
> learning a "new" distro if you're already used to one.

Agreed, and security is not as much of a concern with such user-level
programs, so these packages don't necessarily have to follow any
security patching regime.

> The bottom line is that I think that most people will find it easiest
> to install the linux distro they are most used to and will find that
> nearly any of them are adequate to the task, EXCEPT (as noted)
> non-packaged or poorly packaged distros -- gentoo and slackware e.g.
> Scaling is everything. Scripted installs (ideally FAST scripted
> installs) and fully automated maintenance from a common and
> user-modifiable repo base are a necessity. There is no question that
> Debian has this. There is also no question that most of the
> RPM-based distros have it as well, and at this point with yum they
> are pretty much AS easy to install and update and upgrade as Debian
> ever has been. So it ends up being a religious issue, not a
> substantive one, except where economics or task specific
> functionality kick in (which can necessitate a very specific distro
> choice even if it is quite expensive).

I haven't used an RH-based machine that regularly synced against a
fast-moving package repository, so I can't really compare. :)

<snip>

> Excellent advice. Warewulf in particular will help you learn some of
> the solutions that make a cluster scalable even if you opt for some
> other paradigm in the end.
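Adding missing software to a "local" repo, as rgb suggests, is mostly
a matter of indexing a directory of RPMs and pointing yum at it. A
minimal sketch, in which the paths, the repo name "local", and the
install-server URL are all assumptions for illustration:

```shell
#!/bin/sh
# Publish locally built RPMs through the same yum machinery as everything else.
# Paths and the repo name "local" are illustrative, not prescriptive.
make_local_repo() {
    repo_dir="$1"     # e.g. /var/www/html/local, exported over HTTP
    repo_file="$2"    # e.g. /etc/yum.repos.d/local.repo on each node
    mkdir -p "$repo_dir"
    # ... copy your locally built RPMs into $repo_dir, then index them:
    # createrepo "$repo_dir"    # requires the createrepo package
    # Each node gets a .repo stanza (via kickstart %post or a tiny RPM):
    cat > "$repo_file" <<'EOF'
[local]
name=Site-local packages
baseurl=http://install-server/local
enabled=1
gpgcheck=0
EOF
}

# Dry run in scratch space rather than the live system paths:
tmp=$(mktemp -d)
make_local_repo "$tmp/repo" "$tmp/local.repo"
grep baseurl "$tmp/local.repo"    # baseurl=http://install-server/local
```

gpgcheck=0 is only sane behind the cluster firewall; if the repo is
reachable from anywhere else, sign the packages and turn it back on.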
> A "good" solution in all cases is one where you prototype with a
> server and ONE node initially, and can install the other six or seven
> by at most network booting them and going off to play with your wii
> and drink a beer for a while. Possibly a very short while. If, of
> course, you managed to nab a wii (we hypothesized that wii stands for
> "where is it?" and not "wireless interactive interface" while
> shopping before Christmas...;-).

And like beer. Prototyping is absolutely necessary for any
large-scale rollout. Better to learn how to do it right.

> Yeah, kickstart is lovely. It isn't quite perfect -- I personally
> wish it were a two-phase install, with a short "uninterruptible"
> installation of the basic package group and maybe X, followed by a
> yum-based overlay installation of everything else that is entirely
> interruptible and restartable. But then, I <sigh> install over DSL
> lines from home sometimes and get irritated if the install fails for
> any reason before finishing, which over a full day of installation
> isn't that unlikely...
>
> Otherwise, though, it is quite decent.

<snip>

> Oooo, that sounds a lot like using yum to do an RPM-based install
> from a "naked" list of packages and PXE/diskless root. Something
> that I'd do if my life depended on it, for sure, but way short of
> what kickstart does and something likely to be a world of
> fix-me-up-after-the-fact pain. kickstart manages e.g. network
> configuration, firewall setup, language setup, time setup, KVM setup
> (or not), disk and raid setup (and properly layered mounting),
> grub/boot setup, root account setup, more. The actual installation
> of packages from a list is the easy part, at least at this point,
> given dpkg and/or yum.

I personally believe Debian does more of its configuration in package
configuration than in the installer, compared with RH, but mainly I
agree with you. It's also way short of what FAI, replicator, and
SystemImager do.
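To make the comparison concrete, here is roughly what a minimal
compute-node kickstart file covers, one line per area rgb lists. Every
value (the install URL, timezone, package names) is an illustrative
guess, not anyone's production config:

```
install
url --url http://install-server/fc6/os
lang en_US.UTF-8
keyboard us
network --bootproto dhcp
rootpw --iscrypted <hash-goes-here>
firewall --disabled
timezone --utc America/New_York
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot

%packages
@base
lam

%post
# site-specific fixups (extra yum repo stanzas, ssh keys, ...) go here
```

Network, firewall, language, time, disk layout, bootloader, and root
account all get handled in one pass; the %packages list is, as rgb
says, the easy part.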
> Yes, one can (re)invent many wheels to make all this happen --
> package up stuff, rsync stuff, use cfengine (in FC6 extras:-), write
> bash or python scripts. Sheer torture. Been there, done that, long
> ago and never again.

Hey, some people like this. Some people compete in Japanese game
shows.

> rgb

-- 
Geoffrey D. Jacobs

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf