Brian Dobbins wrote:
I had at one point a simple script that would allow me to select a kernel type at job submit time, it would load that up, reboot the nodes with that kernel, and then run my job. Sometimes this was incredibly useful, as I found a difference of roughly 20-25% performance on one particular code running on the same hardware, one with an /old/ 2.4 series and libc, and another with a more modern kernel + libc. Even now, as we're looking at a larger system, I'll probably put (in a static fashion) one of the interactive nodes with a kernel supporting PAPI, and quite possibly will put most of the compute nodes on a kernel with some modifications for performance.

Thanks for your response. We're running a diskless environment as well (it's a pretty small cluster - 20 nodes running a customised Debian). Performance is certainly interesting to me -- but stability is starting to become so too. We've squeezed a good bit out on the performance front by tweaking various components in the system, including the MPI libraries and so on. So much so that the scientists I'm running the cluster for are largely happy with the performance (I suspect there could be another 5-10% lurking in there, but getting it out would probably involve a lot of my time and a lot of cluster downtime for testing/profiling .. so it feels like we're in the sweet spot at the moment).

So we're happy with performance, and now we'd like to run our models for weeks on end without any user intervention. As we start doing this, we have seen some stability problems that have not been consistently reproducible so far and have left no traces in the logs (I might send a separate mail about these just to generally pick people's brains) -- the key point here, though, is that I have no idea at the moment whether these are kernel-level problems or hardware-level problems.

We're running Debian's stable kernel 2.6.18-5-amd64 (for the diskless nodes, we're using the 2.6.18-5-amd64 kernel source, recompiled after stripping out all unnecessary drivers). My concern about rolling to 2.6.22, or something in between, is that we might get some performance benefits but we might also get more intermittent weird stability issues (the kind that may even be peculiar to our own hardware/software environment). I was just wondering what other people's take is -- clearly a lot depends on your own risk aversion level, how much time you have for testing and supporting what you deploy, and so on. Thanks to all who responded.
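For anyone curious what "recompiled after stripping out all unnecessary drivers" looks like in practice on Debian of that era, a rough sketch follows. This is illustrative, not our exact build -- the revision string and the set of options you disable will depend on your own hardware; the make-kpkg tool comes from the kernel-package package.

```shell
# Hypothetical outline of a stripped-down Debian kernel rebuild.
apt-get install linux-source-2.6.18 kernel-package libncurses5-dev
cd /usr/src
tar xjf linux-source-2.6.18.tar.bz2
cd linux-source-2.6.18

# Start from the stock Debian config so nothing essential is lost,
# then disable drivers/subsystems the diskless nodes don't use.
cp /boot/config-2.6.18-5-amd64 .config
make oldconfig
make menuconfig

# Build a .deb so the custom kernel is tracked like any other package.
make-kpkg --initrd --revision=diskless.1 kernel_image
```

Packaging the result with make-kpkg (rather than a bare `make install`) keeps the custom kernel manageable through dpkg, which matters when you're pushing it out to an image shared by all the diskless nodes.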

In case anyone is interested, I'm planning on bugging the National Labs + Cray guys a bit more soon, and if they can't release or document what they change, I'll set up a wiki about kernel stripping / tuning for HPC workloads, and maybe the community can put together a decent 'how-to' until the big guys can chime in. If/when I find the time, I'll also try to get some information on how much this can impact performance on some modern code suites, but it might take a few weeks at least before I'm able to do so.

I'm not sure how much of the stuff that's relevant to tuning really big clusters would percolate down to the likes of myself, but I would be interested in taking a look at it anyway.

Disclaimer to all of the above - I haven't done much system-level stuff in a long while now, so your mileage may vary considerably. :)

Oh, I understand that all suggestions on beowulf include the standard "But it depends" disclaimer :)

Thanks,

-stephen

--
Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland.  +353.91.751262  http://www.aplpi.com
Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway)
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf