Joe Landman wrote:
Eric Thibodeau wrote:
Joe Landman wrote:
Gus Correa wrote:
Otherwise, your "newbie scientist" can put in his/her earbuds and pump
up the volume on his/her iPod,
while he/she navigates through the colorful Vista 3D menus.
Owie .... I can just imagine the folks squawking about this at SC08
"Yes folks, you need a Cray supercomputer to make Vista run at
acceptable performance ..."
Maybe they have a "tune options for performance" option ;)
The machine seems to run w2k8. My own experience with w2k8 is that,
frankly, it doesn't suck. This is the first time I have seen a
windows release that I can say that about.
A few questions (not necessarily expecting a response):
POSIX?
VERBS?
Kernel latency and scheduler control?
Don't mistake me for a w2k8 apologist. I reamed them pretty hard on
the lack of a real posix infrastructure (they claim SUA, but frankly
it doesn't build most of what we throw at it, so it really is a
non-starter and not worth considering IMO). They need to pull Cygwin
in close and tight to get a good POSIX infrastructure. It is in their
best interests. Sadly, I suspect the ego driven nature of this will
pretty much prevent them from doing this. Can't touch the "toxic" OSS
now, can they ...
Cygwin...yerk, that emulation bloat is slow as hell and can barely run
some of my basic scripts. A simple find would pin the CPU at 100% usage.
Now I don't blame Cygwin for this per se but rather the way that
windows (probably) runs this as a DOS(ish) app in a somewhat polled
mode. My lack of interest in that OS stopped me from digging deeper into
why Cygwin is so slow. Given it's a more than mature project, I'd have
expected such poor performance to have been addressed by now.
IB Verbs? Well, through OFED, yes. Through the windows stack? Who
knows. We were playing with it on JackRabbit for a customer
test/benchmark.
...and...the results? ;)
Kernel latency? Much better/more responsive than w2k3. Scheduler
control? Not sure how much you have. I don't like deep diving into
registries ... that is a pretty sure way to kill a windows machine.
Well, let's just say these are mechanisms I expect an HPC machine to
have when "squeezing the last drop of performance" is mentioned.
These are the real barriers IMHO. Without minimally supporting POSIX
(threads), there is very little incentive to use the machine for
development unless you're willing to accept the code will _only_ run
on your "desktop".
The low end economics probably won't work out for this machine
though, unless it is N times faster than some other agglomeration of
Intel-like products. Adding windows will add cost, not performance
in any noticeable way.
The question that Cray (and every other vendor building
non-commodity units) faces is how much better is this than a small cluster
someone can build/buy on their own? Better as in faster, able to
leap more tall buildings in a single bound, ... (Superman TV show
reference for those not in the know). And the hard part will be
justifying the additional cost. If the machine isn't 2x the
performance, would it be able to justify 2x the price? Since it
appears to be a somewhat well branded cluster, I am not sure that
argument will be easy to make.
I just rebuilt a 32 core cluster for ~5k$ (CAD) (8*Q6600 1Gig
RAM/node + GigE networking). Bang for the buck? I can't wait to see
the CX1's performance specs under _both_ windows and Linux.
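
For anyone who wants to put numbers on the "2x the performance for 2x the price" question, here is a minimal sketch of the arithmetic. The only real figures are the ~5k$ CAD and 8 quad-core Q6600 nodes quoted above; the function names and the example ratios are made up for illustration, and no CX1 price or benchmark is assumed.

```python
def cost_per_core(total_cost, nodes, cores_per_node):
    """Purchase cost divided by the total number of cores."""
    return total_cost / (nodes * cores_per_node)


def price_justified(perf_ratio, price_ratio):
    """True if the performance gain at least matches the price premium.

    Both arguments are ratios relative to the baseline cluster.
    """
    return perf_ratio >= price_ratio


# 8 quad-core Q6600 nodes for roughly 5000 CAD (figures from the post).
print(cost_per_core(5000, 8, 4))   # 156.25 CAD per core

# Hypothetical ratios: 1.5x the performance at 2x the price.
print(price_justified(1.5, 2.0))   # False
```

This only compares purchase price per core; it ignores power, support, and whatever the OS licence adds.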
The desktop CPUs/MBs will get you best bang per buck, as long as you
don't mind no ECC, and 8GB ram limits per node. For your
applications, this might be fine. For others, with large memory
footprint and long run times, I see people need/require ECC (as memory
density increases, ECC becomes important .... darned cosmic
rays/natural decays/noisy power supplies/...)
Well, the nodes I built have MBs that state that they can go as high as
8Gigs, but anyone with a little experience with the 8Gig mix + 800+MHz
RAM knows that very little hardware (MB) can actually do it in a stable
fashion. My recent experience with "fast" RAM (800-1066MHz) is that
it ends up costing you time ($$$), since most MBs claim to
support it but they all seem to have impedance problems of some
sort (totally unstable). And if one reads the fine print and the QVLs
(Qualified Vendor Lists), one notices these really high throughputs are
for low memory density of the banks (a _total_ of 2-4 Gigs max). I'd
personally say this is more of an issue than the ECCism of RAM.
I work on "Clustering Algorithms", and not to confuse people, I mean the
k-means type which we could call "data aggregation/mining" algorithms.
They are long running and applied to sizable databases (1.2Gigs) which
need to be loaded onto each node. This is where having multi-core nodes
comes in quite handy, as far less time is lost propagating and loading
the databases.
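
For readers who only know "clustering" in the Beowulf sense, here is a minimal sketch of the k-means idea in plain Python (Lloyd's algorithm). It is not the code discussed in this post; the function name, parameters, and toy data are illustrative assumptions, and a real run over a 1.2Gig database would use an optimized, parallel implementation.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centroids, assignment), where assignment[i] is the index of
    the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                    # leave an empty cluster where it is
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


# Toy data: two obvious groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(data, k=2)
print(centroids, assignment)
```

Every iteration scans the whole data set, which is why each node needs the data loaded locally and why load time matters for this kind of job.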
...which brings me to wonder how the I/O is managed under the CX1...is
it as basic as one I/O node and GFS, or do all nodes have their own I/O
paths? I mention this since I've too often seen the I/O (load times)
ignored in performance assessments ;)
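
To illustrate why leaving load time out skews a performance assessment, here is a rough sketch. It assumes the load phase does not get faster as nodes are added (e.g. a single I/O node feeding everyone) while the compute phase scales perfectly. The function names and timings are hypothetical, not measurements from any machine.

```python
def load_fraction(load_seconds, compute_seconds):
    """Share of total wall-clock time spent loading data."""
    total = load_seconds + compute_seconds
    return load_seconds / total if total else 0.0


def effective_speedup(load_seconds, compute_seconds, nodes):
    """Speedup over one node when only the compute phase scales.

    Assumes the load time stays constant and compute time divides
    evenly across nodes (an Amdahl-style estimate).
    """
    single_node = load_seconds + compute_seconds
    scaled = load_seconds + compute_seconds / nodes
    return single_node / scaled


# Hypothetical job: 30 s to load the database, 90 s of compute on one node.
print(load_fraction(30.0, 90.0))                    # 0.25
print(round(effective_speedup(30.0, 90.0, 8), 2))   # 2.91, not 8
```

Quoting only the compute-phase speedup here would claim 8x on 8 nodes, while the job as a whole finishes less than 3x sooner.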
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf