Craig Tierney wrote:
> Where did you get the 1/12th number for NVIDIA?  Each streaming
> multiprocessor (SM) has 1 single precision FPU per thread (8 threads
> per SM), but only 1 double precision FPU on the SM.  So that ratio
> would be 1/8.

I just used the NVIDIA-provided information:
http://www.nvidia.com/object/product_tesla_c1060_us.html

Click on Specifications. It lists a peak of 933 GFLOPS single precision and
78 GFLOPS double precision, and 933/78 = 11.96.

> I have demonstrated this ratio on a simple
> code that required little to no memory transfers.

Maybe the ratios are different when the workload isn't optimal for getting the
peak rate.  Peak numbers often require very special situations, such as
interleaved adds and multiplies or a fused instruction that counts as 2 flops.
So maybe for pure multiplications you get 1/8th, but for the perfect workload
you get 1/12th.
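
As a rough sanity check, here is the arithmetic as a small sketch.  The SM
count (30), the shader clock (1.296 GHz) and the flops-per-cycle counts (3 for
single precision, from a multiply-add issued together with a multiply; 2 for a
double precision fused multiply-add) are my assumptions about how the peak
figures are derived.  They are not taken from the specifications page.

```python
# Assumed GT200 (Tesla C1060) parameters; not taken from the spec page.
SM_COUNT = 30          # streaming multiprocessors
SP_UNITS_PER_SM = 8    # single precision FPUs per SM
DP_UNITS_PER_SM = 1    # double precision FPUs per SM
CLOCK_GHZ = 1.296      # shader clock


def peak_gflops(units_per_sm, flops_per_cycle):
    """Peak GFLOPS = SMs * units per SM * flops per unit per cycle * GHz."""
    return SM_COUNT * units_per_sm * flops_per_cycle * CLOCK_GHZ


# Quoted peaks: SP counts multiply-add + multiply (3 flops),
# DP counts a fused multiply-add (2 flops).
sp_peak = peak_gflops(SP_UNITS_PER_SM, 3)
dp_peak = peak_gflops(DP_UNITS_PER_SM, 2)

# Same instruction mix on both sides, e.g. pure multiplies (1 flop each).
sp_mul = peak_gflops(SP_UNITS_PER_SM, 1)
dp_mul = peak_gflops(DP_UNITS_PER_SM, 1)

print(f"peak SP {sp_peak:.1f} GFLOPS, peak DP {dp_peak:.1f} GFLOPS, "
      f"ratio {sp_peak / dp_peak:.1f}")
print(f"multiply-only ratio {sp_mul / dp_mul:.1f}")
```

Under those assumptions the peak ratio comes out at 12 and the multiply-only
ratio at 8, which would reconcile the two numbers: the hardware ratio is 1/8,
and the extra factor of 3/2 comes from how many flops per cycle each peak
figure counts.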



_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
