On 10/04/2012 01:30 PM, Andrew Holway wrote:
>> bitter? sure. to me Canadian HPC is on the verge of extinction,
>> partly because of this issue.
>
> Is Canadian HPC a distinct entity from US HPC?
Quite distinct. The US has XSEDE and a number of other national/regional
and national lab initiatives. Canada has SharcNet and other things
(ComputeCanada).

[...]

> I wonder if there is a HPC 'critical mass'.

For business? Somewhat. For unis and research/edu in general? Looks to me
like lots of support.

Mark's point though was this:

> I think in some sense, the problem is that in academic HPC organizations,
> decisions are typically made by academics recruited to be management,
> and they have either a high fear/expectation of failure or a low expectation
> in being able to fix problems that do arise (or both). it's crippling,
> and being emotional, prevents such organizations from considering how to
> rationally estimate the risks, and to design the process to manage it.
>
> in a sense, beowulf has been corrupted by its own success.
> hacking (in the classic sense) is inherently risky

I don't know the actual state of HPC in Canada, and as Mark works in this,
I'd say his view is likely far more accurate than anything I could guess.

Researchers sometimes make good managers; sometimes they don't. Risk
aversion by way of brand name is one way to avoid making a careful risk
analysis, substituting for it something of lower value that may not even
be valid ... but hey, no one was ever fired for choosing
IBM/Microsoft/... (insert large brand name here).

I think the term Mark used was "sclerosis". I believe this is an apt and
correct description.

With respect to the "cutting out the middleman" point that Mark made,
there are costs and benefits to every decision. We've seen great designs
from good architects at various places. We've seen just awful designs at
many others.

Google designs to its needs, as does FB. They buy in enough quantity that
the costs associated with their efforts are lower if they can control the
BOM going into the parts. This isn't true of everyone. Moreover, their
failover model doesn't engineer "enterprise" features into their systems;
think of a large RAIN (Redundant Array of Inexpensive Nodes) scenario.
They are engineering for failure at a coarser-grained (extra-unit) level,
so they don't need to pay for failure avoidance at a fine-grained
(intra-unit) level beyond failure detection (see the sketch at the end of
this mail).

Google and FB are, to a degree, taking Beowulf (design what you need,
engineer at the software stack level to handle management and other
issues) to the next level. This isn't BYOC (build your own cluster); this
is BTPYN (build the platform you need). Paying for extra stuff they don't
need across N servers (where log10(N) >= 5, i.e. N >= 100,000) makes no
sense. Paying "middlemen" to do what they want makes no sense. Contracting
with Quanta et al. to design/build to their specs makes a great deal of
sense.

Regards,

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
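To make the extra-unit failover model concrete, here is a minimal Python
sketch, under stated assumptions: hypothetical node names, a bare TCP
probe standing in for the health check, and round-robin reassignment of
work. This is not Google's or FB's actual machinery, just the shape of it.

    # Coarse-grained (extra-unit) failure handling: detect dead nodes and
    # move their work to survivors, instead of paying for intra-unit
    # redundancy (dual PSUs, redundant RAID paths, etc.) inside every box.
    # Node names and the probe are hypothetical, for illustration only.

    import socket

    def is_alive(host, port=22, timeout=1.0):
        """Failure *detection* -- the one fine-grained thing still paid for."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def rebalance(shards, nodes, alive=is_alive):
        """Extra-unit failover: reassign shards from dead nodes to live ones."""
        live = sorted(n for n in nodes if alive(n))
        if not live:
            raise RuntimeError("no live nodes left")
        return {s: live[i % len(live)] for i, s in enumerate(sorted(shards))}

    if __name__ == "__main__":
        nodes = ["node%03d" % i for i in range(1, 5)]
        shards = ["shard-a", "shard-b", "shard-c", "shard-d"]
        # Simulate node002 failing its health check:
        print(rebalance(shards, nodes, alive=lambda n: n != "node002"))

The point of the sketch is that the only fine-grained machinery is the
probe; everything else is handled by moving work between whole boxes, so
nothing inside a box needs to be engineered not to fail.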