Joseph,

I'm even a licensed CUDA developer with Nvidia for Tesla,
but even the documentation there is very, very poor. Knowing
the latencies is really important. Another big problem is that if you
write such a program measuring the latencies, nobody is going to run it.
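
For what it's worth, here is a bare-bones sketch of what such a measurement program could look like, timing kernel-launch latency. The CUDA runtime API calls are real, but the empty kernel and the iteration count are my own arbitrary choices, so treat the numbers as indicative only:

// launch_latency.cu -- rough sketch of a kernel-launch latency probe.
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void empty_kernel() { }     // does nothing; we time the launch itself

static double wall_seconds() {
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main() {
    const int iters = 10000;
    empty_kernel<<<1, 1>>>();          // warm-up: first launch pays one-time driver setup
    cudaThreadSynchronize();

    double t0 = wall_seconds();
    for (int i = 0; i < iters; ++i) {
        empty_kernel<<<1, 1>>>();
        cudaThreadSynchronize();       // force each launch to fully complete
    }
    double t1 = wall_seconds();
    printf("avg launch+sync latency: %.1f us\n", 1e6 * (t1 - t0) / iters);
    return 0;
}

Note this deliberately synchronizes after every launch, so it measures the worst case: launch plus the full round trip to completion.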

Nvidia's promises were about video cards released long ago,
all of them single precision. We know that for sure, as most of
those video cards were released years ago.

Not to mention the 8800.

If you look, however, at the latest supercomputer announced: 1 petaflop
@ 12960 new-generation CELL CPUs. That works out to a tad more
than 77 gflops double precision for each CELL (10^15 / 12960 ≈ 77.16 gflops).

It is a bit weird to claim to be NDA-bound when the news has it in
big capitals what the new IBM CELL can deliver.

See for example:

http://www.cbc.ca/cp/technology/080609/z060917A.html

Also, I calculated back that, all power costs together, it is about 410 watts per node, each node having a dual CELL CPU. That's network + hard drives and RAM together, of
course.

I calculated 2.66 MW for the entire machine based upon this article; with 12960 CPUs at two per node, that's 6480 nodes, and 2.66 MW / 6480 ≈ 410 watts per node.

Sounds reasonably good to me.

Now the interesting question is how they are going to use that CPU power effectively.

In any case it is a lot better than what the GPUs can deliver so far
in practical situations known to me; that is, according to claims by programmers other than the
Nvidia engineers.

Especially for stuff like matrix calculations, as the weak part of the GPU hardware is the latency to and from the machine RAM (not to be confused with device RAM).

From/to 8800 hardware, a reliable person I know measured it at 3000 messages
a second, which works out to several hundred microseconds of
communication latency (1 s / 3000 ≈ 333 microseconds per round trip).
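
A minimal sketch of how such a number could be reproduced; the methodology is my own guess, counting one tiny cudaMemcpy in each direction as a "message":

// msg_latency.cu -- sketch: round-trip latency of tiny host<->device copies.
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

static double wall_seconds() {
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main() {
    const int iters = 3000;
    int host_val = 42, *dev_val;
    cudaMalloc((void**)&dev_val, sizeof(int));

    // warm-up copy so one-time setup costs don't pollute the numbers
    cudaMemcpy(dev_val, &host_val, sizeof(int), cudaMemcpyHostToDevice);

    double t0 = wall_seconds();
    for (int i = 0; i < iters; ++i) {
        // one 4-byte "message" to the card and one back; blocking copies
        cudaMemcpy(dev_val, &host_val, sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(&host_val, dev_val, sizeof(int), cudaMemcpyDeviceToHost);
    }
    double t1 = wall_seconds();

    printf("round trips/s: %.0f, avg latency: %.1f us\n",
           iters / (t1 - t0), 1e6 * (t1 - t0) / iters);
    cudaFree(dev_val);
    return 0;
}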

So a very reasonable question to ask is what the latency is from the stream processors to the device RAM. An 8800 document I read says 600 cycles; it doesn't mention for how many stream processors that holds, though.

Also, surprising to learn: RAM lookups do not get cached. That means a lot of extra work when 128 stream processors hammer regularly on the device RAM for data that CPUs simply keep in their L1 or L2
caches, and these days even L3 caches.
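
A rough sketch of how one could probe that 600-cycle figure from inside a kernel: a single thread chasing dependent loads through device RAM, so each access has to wait for the previous one. The array size and stride below are my own arbitrary picks:

// dram_latency.cu -- sketch: dependent-load (pointer-chasing) latency probe.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N      (1 << 22)    // 4M entries (16 MB), far beyond any on-chip storage
#define STRIDE 1021         // odd stride, coprime with N, to defeat locality
#define CHASES 100000

__global__ void chase(int *ring, int *sink) {
    int idx = 0;
    for (int i = 0; i < CHASES; ++i)
        idx = ring[idx];            // each load's address depends on the last
    *sink = idx;                    // keep the compiler from removing the loop
}

int main() {
    int *h_ring = (int*)malloc(N * sizeof(int));
    for (int i = 0; i < N; ++i)
        h_ring[i] = (i + STRIDE) % N;   // one big cycle through the array

    int *d_ring, *d_sink;
    cudaMalloc((void**)&d_ring, N * sizeof(int));
    cudaMalloc((void**)&d_sink, sizeof(int));
    cudaMemcpy(d_ring, h_ring, N * sizeof(int), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    chase<<<1, 1>>>(d_ring, d_sink);    // one thread: pure latency, no overlap
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg latency per dependent load: %.1f ns\n", 1e6f * ms / CHASES);

    cudaFree(d_ring);
    cudaFree(d_sink);
    free(h_ring);
    return 0;
}

Divide the reported nanoseconds by the shader clock period to get cycles; on an 8800 GTX at roughly 1.35 GHz, 600 cycles would be about 440 ns.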

So knowing such technical data is totally crucial, as there is no way to escape the memory controller's latency in a lot of the different software that searches for the holy grail.

Thanks,
Vincent

On Jun 15, 2008, at 3:51 PM, Joe Landman wrote:



Vincent Diepeveen wrote:
Seems the next CELL is 100% confirmed double precision.
Yet if you look back in history, Nvidia's promises on this can be found years back.

[scratches head /]

Vincent, it may be possible that some of us on this list may in fact be bound by NDA (non-disclosure agreements), and cannot talk about hardware which has not been announced.


The only problem with hardware like Tesla is that it is rather hard to get technical information; for example, which instructions does Tesla support in hardware?

[scratches head /]

Hmmm .... www.nvidia.com/cuda is a good starting point.

I might suggest http://www.nvidia.com/object/cuda_what_is.html as a start on information. More to the point, you can look at http://www.nvidia.com/object/cuda_develop.html



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf