On Nov 20, 2008, at 5:39 PM, Jan Heichler wrote:

Hello Mark,

Thursday, 20 November 2008, you wrote:

>> [shameless plug]
>>
>> A project I have spent some time with is showing 117x on a 3-GPU machine over
>> a single core of a host machine (3.0 GHz Opteron 2222). The code is
>> mpihmmer, and the GPU version of it. See http://www.mpihmmer.org for more
>> details. Ping me offline if you need more info.
>>
>> [/shameless plug]

MH> I'm happy for you, but to me, you're stacking the deck by comparing to a
MH> quite old CPU. You could break out the prices directly, but comparing 3x
MH> GPU (modern? sounds like pci-express at least) to a current entry-level
MH> cluster node (8 core2/shanghai cores at 2.4-3.4 GHz) would be more appropriate.



Instead of benchmarking some CPU vs. some GPU, wouldn't it be fairer to

a) compare systems of similar cost (1k, 2k, 3k EUR/USD)
b) compare systems with a similar power footprint

?



What good is it that 3 GPUs are 1000x faster than an Asus Eee PC?




Exactly.

http://re.jrc.ec.europa.eu/energyefficiency/html/standby_initiative_data%20centers.htm

The correct comparison is power usage, as that is what is 'hot' these days. Comparing plain cash alone is not enough. Weird yet true. In third-world nations like China and India, power is not a concern at all, not even for government-related tasks.
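Just to make the point concrete, here is a minimal sketch of what such a normalized comparison looks like. All the numbers below (speedups, prices, wattages) are placeholders I made up for illustration, not measurements of any real system:

#include <stdio.h>

/* Hypothetical systems: the speedup, price, and power figures are
   placeholders for illustration only, not measurements. */
struct box {
    const char *name;
    double speedup;    /* speedup relative to a single reference core */
    double price_eur;  /* purchase price in EUR                       */
    double watts;      /* power draw under load in W                  */
};

int main(void) {
    struct box systems[] = {
        { "3x GPU workstation (hypothetical)", 117.0, 3000.0, 900.0 },
        { "8-core cluster node (hypothetical)",   8.0, 2500.0, 350.0 },
    };
    for (int i = 0; i < 2; i++) {
        const struct box *b = &systems[i];
        printf("%-36s %7.1f x/kEUR %7.1f x/kW\n",
               b->name,
               b->speedup / (b->price_eur / 1000.0),
               b->speedup / (b->watts / 1000.0));
    }
    return 0;
}

Normalizing by price and by watts like this is what makes two very different boxes comparable at all.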

The slow adoption of manycores, even for workloads that (at least in theory) would do well on them, is definitely limited by portability.

I had some ESA dude on the phone a few days ago. I heard the word "portability" a bit too often. That's why they do so much in ugly, slow Java code. Not fast enough on one PC?
Put another 100 next to it.

I was given exactly the same reasoning (the portability problem) for other projects where I tried to sneak in GPU computing (regardless of manufacturer). Portability was the KILLER there too.

If you write bureaucratic paper documents, then of course CUDA is not portable and never will be, as the hardware is simply different from a CPU.

Yet that code must be portable between old Sun/UNIX-type machines and modern quadcores, as well as new GPU hardware in case you want to introduce GPUs. Not realistic, of course.

Just enjoy the speedup if you can get it, I'd say.

They can spend millions on hardware, but not even a couple of hundred thousand on customized software that would solve the portability problem with a plugin that does the crunching just for GPUs.

Idiotic, yet that's the plain truth.

So to speak, manycores will only make it in there once NASA writes a big article online bragging about how fast their supercomputing codes run on today's GPUs, of which they own 100k for number crunching.

I would argue that for workloads favourable to GPUs, which are just a very few as of now, NVIDIA/AMD is up to 10x faster than a quadcore, if you know how to get it out of the card.

So for now, GPGPU is probably the cheap alternative for a few very specific tasks in third-world nations.

May they lead us on the path ahead...

It is in itself very funny that a bureaucratic reason (portability) is the biggest problem limiting progress.

When you speak to hardware designers about, say, 32-core CPUs, they laugh out loud. The only scalable hardware for now that packs a big punch in a single CPU seems to be manycores.

All those managers have simply put their minds in a big storage bunker where alternatives are not allowed in. Even an economic crisis will not help. They have to get bombarded with actual products that are interesting to them and that get a huge speedup on GPUs before they start to understand the advantage of it.

The few who do understand already keep their stuff secret, and usually the guys who are not exactly very good at parallelization are the ones who get to "try out" the GPU in question. That's another recipe for disaster, of course.

Logically, they never even get a speedup over a simple quadcore. If you compare assembler-level SSE2 (modified Intel SSE2 primitives, if you like) with a clumsy guy (not in his own estimation) who tries out the GPU for a few weeks, obviously it is going to fail.
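For reference, "assembler-level SSE2" means code roughly along these lines, written with Intel's intrinsics. This is just a minimal sketch of a hand-vectorized sum (two double lanes per instruction), made up for illustration and not taken from any of the codes mentioned here:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Sum an array of doubles two lanes at a time with SSE2.
   Assumes n is a multiple of 2 to keep the sketch short. */
double sse2_sum(const double *x, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(x + i));

    /* Horizontal reduction of the two partial sums. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    return lanes[0] + lanes[1];
}

Against a baseline written at that level, a naive few-week GPU port has little chance.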

Something that has been algorithmically optimized for PC-type hardware for some 20-30 years suddenly must get ported to a GPU within a few weeks. There are not many who can do that.

You need a completely different algorithmic approach for that. Something that is memory bound CAN get rewritten to be compute bound, sometimes even without losing speed. Just because they didn't have the luxury of such huge crunching power, they never tried!
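A toy illustration of that kind of rewrite, made up for this mail and not taken from any real code: replace a big precomputed table, whose scattered accesses are limited by memory, with recomputing the value on the fly, which is limited by arithmetic instead, and arithmetic is exactly what GPUs have in abundance.

#include <math.h>
#include <stddef.h>

#define TABLE_SIZE (1 << 22)            /* 4M doubles: far larger than any cache */
static double table[TABLE_SIZE];        /* precomputed values, filled elsewhere */

/* Memory-bound version: every scattered index is a likely cache miss,
   so throughput is limited by memory latency and bandwidth. */
double lookup_version(const unsigned *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += table[idx[i] & (TABLE_SIZE - 1)];
    return sum;
}

/* Compute-bound version: recompute the value instead of fetching it,
   trading memory traffic for arithmetic. */
double recompute_version(const unsigned *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x = (double)(idx[i] & (TABLE_SIZE - 1)) / TABLE_SIZE;
        sum += sin(x) * exp(-x);        /* whatever the table used to store */
    }
    return sum;
}

Whether the second version wins on a CPU depends on how expensive the recomputation is; on a GPU the extra arithmetic is usually the cheaper side of the trade.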

But those 20 years of optimization are a big barrier for GPUs.

Add to that that Intel is used to GIVING AWAY hardware to developers.
I have yet to see NVIDIA do that.

If those same guys who failed had that hardware at home for years, they MIGHT get some ideas and tell their boss.

Right now it's the reports from those guys that add to the storage-bunker thinking.

It is wrong to assume that experts can predict the future.

Vincent


_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
