On 4/19/13 12:31 PM, "Adam DeConinck" <ajde...@ajdecon.org> wrote:
> On Fri, Apr 19, 2013 at 05:10:37PM +0100, Tim Cutts wrote:
>> Anyone running a research computing setup has encountered both of
>> these issues. Virtualisation mitigates the damage that can be done,
>> without the expense of a separate toy cluster, but it doesn't address
>> these support and transition-to-production issues.
>
> Yeah, these are the problems you're going to hit with either toy
> clusters or VMs. The main reason I like VMs for this is that they
> eliminate a whole class of "futzing around with hardware" problems:
> I might have to help dig people out of their holes occasionally, but
> at least I don't need to go down to their labs and untangle a rat's
> nest of cables.
>
> I'm lucky enough to have a smallish and relatively good-humored user
> community, so when someone really craters their test system, they
> usually have a six-pack of my favorite beer on hand when they ask for
> help. ;-) But that doesn't really scale...

Aha... but beer *making* does scale. In fact, small batches are harder
to make than large ones, so it scales superlinearly. But I guess what
you're really getting at is that if your cluster were, say, 10 times
bigger, with 10 times as many users, your effectiveness in providing
support would be adversely affected by the 10-times-larger beer
consumption.

Joking aside, I think that's one of the key differences as system size
(and cost) scales up. Not only do million-dollar/euro clusters attract
more accounting interest, they are also harder to administer in a
casual way.

If you look back 20 years, there were great challenges in adapting to
the Beowulf cluster model, with nodes interconnected by a "relatively
slow" interconnect (compared, say, to multiport memory). By now,
though, there's an enormous amount of software that has either been
modified or designed from scratch to work well in a cluster
architecture, and a lot of it scales really well from 10 to 100 to
1000 nodes. Really smart people (who are on this list, of course) have
spent significant effort on this and succeeded.

But there are aspects of scalability totally unrelated to the
processing/memory/disk/interconnect architecture:

- Physical plant (look at the discussions on HVAC, liquid cooling, etc.)
- System administration (root access, sharing, batch queues, partitioning)
- Financial and business administration (chargebacks, accounting for
  time, amortization)

We discussed some of this 10 years or so ago in the context of "how
big a cluster can one person manage" and the granularity of staff: if
you have someone with the right skills who's not fully occupied, the
incremental admin cost to add nodes is small (a rough sketch at the
end of this message tries to make that concrete).

In some ways, I think those problems are actually harder, because they
have constraints and evaluation functions that are a lot less
tangible, less tractable, and less measurable. How do you deal with
the "when something is expensive, more levels of management get
involved" aspect? That's getting into sociology more than engineering.
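Since staffing comes in whole-person increments, admin headcount is a
step function of node count. Here's a minimal back-of-envelope sketch
of that idea (in Python); the 500-nodes-per-admin capacity is purely
an assumed, illustrative figure, not a measured one:

    import math

    # Assumed capacity: one admin can handle up to 500 nodes.
    # Purely illustrative; the real figure depends on hardware,
    # software stack, and user community.
    NODES_PER_ADMIN = 500

    def admin_headcount(nodes):
        """Staff needed for a cluster of this size: a step function,
        since you can't hire a fraction of a sysadmin."""
        return math.ceil(nodes / NODES_PER_ADMIN)

    def incremental_admins(nodes, added):
        """Extra staff needed to grow the cluster by 'added' nodes.
        Zero until the growth crosses a multiple of NODES_PER_ADMIN."""
        return admin_headcount(nodes + added) - admin_headcount(nodes)

    if __name__ == "__main__":
        for n in (100, 450, 900):
            print(f"{n:4d} nodes: {admin_headcount(n)} admin(s); "
                  f"+100 nodes needs {incremental_admins(n, 100)} more")

Adding 100 nodes to a 100-node cluster costs no extra staff, while the
same 100 nodes added at 450 crosses the threshold and requires a
second admin -- which is exactly the granularity problem: the marginal
cost is near zero right up until it suddenly isn't.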