Joseph,

I'm even a licensed CUDA developer with Nvidia for Tesla,
but even the documentation there is very, very poor. Knowing
the latencies is really important. Another big problem is that if you
write such a program measuring the latencies, nobody is going to run it.
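
For what it's worth, here is a bare-bones sketch of what such a measurement program could look like, timing kernel-launch latency. The CUDA runtime API calls are real, but the empty kernel and the iteration count are my own arbitrary choices, so treat the numbers as indicative only:

// launch_latency.cu -- rough sketch of a kernel-launch latency probe.
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void empty_kernel() { }     // does nothing; we time the launch itself

static double wall_seconds() {
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main() {
    const int iters = 10000;
    empty_kernel<<<1, 1>>>();          // warm-up: first launch pays one-time driver setup
    cudaThreadSynchronize();

    double t0 = wall_seconds();
    for (int i = 0; i < iters; ++i) {
        empty_kernel<<<1, 1>>>();
        cudaThreadSynchronize();       // force each launch to fully complete
    }
    double t1 = wall_seconds();
    printf("avg launch+sync latency: %.1f us\n", 1e6 * (t1 - t0) / iters);
    return 0;
}

Note this deliberately synchronizes after every launch, so it measures the worst case: launch plus the full round trip to completion.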

Nvidia's promises were about video cards released long ago,
all of them single precision. We know that for sure, as most of
those video cards were released years ago.

Not to mention the 8800.

If you look, however, at the latest supercomputer announced: 1 petaflop
@ 12960 new-generation CELL CPUs. That works out to a tad more
than 77 gflops double precision for each CELL (10^15 / 12960 ≈ 77.16 gflops).

It is a bit weird to claim to be NDA-bound when the news has it in
big capitals what the new IBM CELL can deliver.

See for example:

http://www.cbc.ca/cp/technology/080609/z060917A.html

Also, I calculated back that, all power costs together, it is about 410 watts per node, each node having a dual CELL CPU. That's network + hard drives and RAM together, of
course.

I calculated 2.66 MW for the entire machine based upon this article; with 12960 CPUs at two per node, that's 6480 nodes, and 2.66 MW / 6480 ≈ 410 watts per node.

Sounds reasonably good to me.

Now the interesting question is how they are going to use that CPU power effectively.

In any case it is a lot better than what the GPUs can deliver so far
in practical situations known to me; that is, according to claims by programmers other than the
Nvidia engineers.

Especially for stuff like matrix calculations, as the weak part of the GPU hardware is the latency to and from the machine RAM (not to be confused with device RAM).

From/to 8800 hardware, a reliable person I know measured it at 3000 messages
a second, which works out to several hundred microseconds of
communication latency (1 s / 3000 ≈ 333 microseconds per round trip).
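
A minimal sketch of how such a number could be reproduced; the methodology is my own guess, counting one tiny cudaMemcpy in each direction as a "message":

// msg_latency.cu -- sketch: round-trip latency of tiny host<->device copies.
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

static double wall_seconds() {
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main() {
    const int iters = 3000;
    int host_val = 42, *dev_val;
    cudaMalloc((void**)&dev_val, sizeof(int));

    // warm-up copy so one-time setup costs don't pollute the numbers
    cudaMemcpy(dev_val, &host_val, sizeof(int), cudaMemcpyHostToDevice);

    double t0 = wall_seconds();
    for (int i = 0; i < iters; ++i) {
        // one 4-byte "message" to the card and one back; blocking copies
        cudaMemcpy(dev_val, &host_val, sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(&host_val, dev_val, sizeof(int), cudaMemcpyDeviceToHost);
    }
    double t1 = wall_seconds();

    printf("round trips/s: %.0f, avg latency: %.1f us\n",
           iters / (t1 - t0), 1e6 * (t1 - t0) / iters);
    cudaFree(dev_val);
    return 0;
}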

So a very reasonable question to ask is what the latency is from the stream processors to the device RAM. An 8800 document I read says 600 cycles; it doesn't mention for how many stream processors that holds, though.

Also, surprising to learn: RAM lookups do not get cached. That means a lot of extra work when 128 stream processors hammer regularly on the device RAM for data that CPUs simply keep in their L1 or L2
caches, and these days even L3 caches.
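
A rough sketch of how one could probe that 600-cycle figure from inside a kernel: a single thread chasing dependent loads through device RAM, so each access has to wait for the previous one. The array size and stride below are my own arbitrary picks:

// dram_latency.cu -- sketch: dependent-load (pointer-chasing) latency probe.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N      (1 << 22)    // 4M entries (16 MB), far beyond any on-chip storage
#define STRIDE 1021         // odd stride, coprime with N, to defeat locality
#define CHASES 100000

__global__ void chase(int *ring, int *sink) {
    int idx = 0;
    for (int i = 0; i < CHASES; ++i)
        idx = ring[idx];            // each load's address depends on the last
    *sink = idx;                    // keep the compiler from removing the loop
}

int main() {
    int *h_ring = (int*)malloc(N * sizeof(int));
    for (int i = 0; i < N; ++i)
        h_ring[i] = (i + STRIDE) % N;   // one big cycle through the array

    int *d_ring, *d_sink;
    cudaMalloc((void**)&d_ring, N * sizeof(int));
    cudaMalloc((void**)&d_sink, sizeof(int));
    cudaMemcpy(d_ring, h_ring, N * sizeof(int), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    chase<<<1, 1>>>(d_ring, d_sink);    // one thread: pure latency, no overlap
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg latency per dependent load: %.1f ns\n", 1e6f * ms / CHASES);

    cudaFree(d_ring);
    cudaFree(d_sink);
    free(h_ring);
    return 0;
}

Divide the reported nanoseconds by the shader clock period to get cycles; on an 8800 GTX at roughly 1.35 GHz, 600 cycles would be about 440 ns.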

So knowing such technical data is totally crucial, as there is no way to escape the memory controller's latency in a lot of the different software that searches for the holy grail.

Thanks,
Vincent

On Jun 15, 2008, at 3:51 PM, Joe Landman wrote:



Vincent Diepeveen wrote:
Seems the next CELL is 100% confirmed double precision.
Yet if you look back in history, Nvidia's promises on this can be found years back.

[scratches head /]

Vincent, it may be possible that some of us on this list may in fact be bound by NDA (non-disclosure agreements), and cannot talk about hardware which has not been announced.


The only problem with hardware like Tesla is that it is rather hard to get technical information; for example, which instructions does Tesla support in hardware?

[scratches head /]

Hmmm .... www.nvidia.com/cuda is a good starting point.

I might suggest http://www.nvidia.com/object/cuda_what_is.html as a start on information. More to the point, you can look at http://www.nvidia.com/object/cuda_develop.html



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf