Yes, the Firestream has great paper performance, but how do you actually get it? As for cost, if you don't mind using non-professional components, you can try the gaming cards, which are much cheaper. We bought NVIDIA's last flagship card, the 8800 Ultra, for 600 Euro, which was a crazy price, and now you can buy two GTX 280s for less. If you can live with single precision, you get 936 GFLOPS from each, and we have achieved 40% of that peak performance, which sounds good.
Regards,
Li, Bo
----- Original Message -----
From: "Mikhail Kuzminsky" <[EMAIL PROTECTED]>
To: "Li, Bo" <[EMAIL PROTECTED]>
Cc: "Vincent Diepeveen" <[EMAIL PROTECTED]>; <beowulf@beowulf.org>
Sent: Friday, August 29, 2008 1:52 AM
Subject: Re: [Beowulf] gpgpu
> In message from "Li, Bo" <[EMAIL PROTECTED]> (Thu, 28 Aug 2008 14:20:15
> +0800):
>> ...
>> Currently, the DP performance of GPUs is not as good as we expected:
>> only 1/8 to 1/10 of the SP FLOPS. It is also a problem.
>
> AMD's data: Firestream 9170 SP performance is 5 GFLOPS/W vs. 1 GFLOPS/W
> for DP, i.e. DP is 5 times slower than SP.
>
> The Firestream 9250 has 1 TFLOPS for SP; therefore 1/5 of that is about
> 200 GFLOPS DP. The price will, I suppose, be about $2000, as for the 9170.
>
> Compare that to a modern dual-socket quad-core Beowulf node priced at
> about $4000+, for example. For the Opteron 2350/2 GHz chips I use, peak
> DP performance is 64 GFLOPS (8 cores); for 3 GHz Xeon chips, about 100
> GFLOPS.
>
> Therefore GPGPU peak DP performance is only 1.5-2 times higher than the
> CPUs'. Is that enough for an essential calculation speedup, taking into
> account the time for data transmission to/from the GPU?
>
>> I would suggest hybrid computation platforms, with GPU, CPU, and
>> processors like Clearspeed. It may be a good topic for a programming
>> model.
>
> Clearspeed, if there is no newer hardware by now, does not have enough
> DP performance in comparison with typical modern servers on quad-core
> CPUs.
>
> Yours
> Mikhail
>
>> Regards,
>> Li, Bo
>> ----- Original Message -----
>> From: "Vincent Diepeveen" <[EMAIL PROTECTED]>
>> To: "Li, Bo" <[EMAIL PROTECTED]>
>> Cc: "Mikhail Kuzminsky" <[EMAIL PROTECTED]>; "Beowulf"
>> <beowulf@beowulf.org>
>> Sent: Thursday, August 28, 2008 12:22 AM
>> Subject: Re: [Beowulf] gpgpu
>>
>>
>>> Hi Bo,
>>>
>>> Thanks for your message.
>>>
>>> What library do I call to find primes?
>>>
>>> Currently it's searching here for primes (PRPs) of the form
>>> p = (2^n + 1) / 3,
>>> where n is about 1.5 million as we speak, so the numbers are roughly
>>> 1.5 million bits.
>>>
>>> For SSE2-type processors there is George Woltman's assembler code
>>> (MIT) to do the squaring plus the implicit modulo;
>>> how do you plan to beat that kind of really optimized number
>>> crunching on a GPU?
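[The squaring-plus-implicit-modulo loop Vincent describes is the heart of any Fermat-style PRP test. A minimal Python sketch for numbers of the form p = (2^n + 1)/3, using the built-in three-argument pow rather than Woltman's FFT-based assembler; at n around 1.5 million the FFT squaring is mandatory, and serious Wagstaff searches use stronger tests than this plain base-3 Fermat check:]

```python
# Fermat probable-prime (PRP) sketch for Wagstaff-form numbers
# p = (2**n + 1) / 3, n odd. Python's pow() performs the repeated
# modular squaring; production code (e.g. Woltman's gwnum library)
# replaces it with FFT-based squaring plus an implicit modulo.

def wagstaff_prp(n: int, base: int = 3) -> bool:
    """Return True if (2**n + 1) // 3 is a base-`base` Fermat PRP."""
    if n < 5 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 5")
    p = (2**n + 1) // 3
    return pow(base, p - 1, p) == 1

# Sanity check on tiny exponents: n = 5, 7, 11, 13 give the primes
# 11, 43, 683, 2731; n = 9 gives 171 = 9 * 19, correctly rejected.
print([n for n in range(5, 15, 2) if wagstaff_prp(n)])  # [5, 7, 11, 13]
```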
>>> You'll have to figure out a way to find an instruction-level
>>> parallelism of at least 32, which also doesn't write to the same
>>> cache line, I *guess* (no documentation to verify that in fact).
>>>
>>> So that's a range of 256 * 32 = 2^8 * 2^5 = 2^13 = 8192 bytes.
>>>
>>> In fact the first problem to solve is doing some sort of squaring
>>> really quickly.
>>>
>>> If you figure that out on a PC, experience teaches that you're still
>>> losing a potential factor of 8 to another zillion optimizations.
>>>
>>> You're not allowed to lose a factor of 8. The 52 GFLOPS a GPU can
>>> deliver on paper at 250 W TDP (you can bet it will consume that when
>>> you make it work that hard) means the GPU effectively delivers less
>>> than 7 GFLOPS double precision, thanks to inefficient code.
>>>
>>> Additionally, remember the P4. On paper the claim at its release was
>>> that it would be able to execute 4 integer instructions per cycle;
>>> the reality is that it was a processor with an IPC far below 1 for
>>> most integer codes. All kinds of stuff ran badly on it.
>>>
>>> Experience shows the same holds for today's GPUs: the scientists who
>>> have run codes on them so far and are really experienced CUDA
>>> programmers figured out that the speed delivered is a very big
>>> bummer.
>>>
>>> Additionally, 250 W TDP for massive number crunching is too much;
>>> it's well over a factor of 2 above the power consumption of a
>>> quad-core. I can take a look soon in China myself at what the power
>>> prices are over there, but I can assure you they will rise soon.
>>>
>>> And that effective throughput is a lot less than what a quad-core
>>> delivers with a TDP far under 100 W.
>>>
>>> Now I explicitly mention the n's I'm searching here, as they should
>>> fit within the caches. So the very secret bandwidth you can achieve
>>> in practice (as we know, Nvidia lobotomized the bandwidth in the GPU
>>> cards; only the Tesla type seems not to be lobotomized), I'm not
>>> even teasing you with that.
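[Vincent's efficiency arithmetic can be made explicit. A back-of-envelope sketch taking the thread's figures at face value; the 52 GFLOPS paper spec, the factor-8 loss, and the 95 W quad-core TDP are the assumptions being quoted, not measurements:]

```python
# Back-of-envelope check of the argument above, using the numbers
# quoted in the thread (assumptions, not measurements): 52 GFLOPS DP
# paper spec, a factor-8 loss to missing low-level optimizations,
# 250 W GPU TDP vs roughly 95 W for a quad-core CPU.

gpu_paper_dp_gflops = 52.0
optimization_loss = 8.0
gpu_tdp_w = 250.0
cpu_tdp_w = 95.0

gpu_effective = gpu_paper_dp_gflops / optimization_loss  # 6.5, i.e. < 7
gpu_per_watt = gpu_effective / gpu_tdp_w

print(f"effective GPU DP: {gpu_effective:.1f} GFLOPS")
print(f"that is {gpu_per_watt * 1000:.0f} MFLOPS/W at full TDP, "
      f"on {gpu_tdp_w / cpu_tdp_w:.1f}x the quad-core's power budget")
```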
>>> This is true for any type of code. You're losing it to the details.
>>>
>>> Only custom-tailored solutions will work, simply because they're
>>> factors faster.
>>>
>>> Thanks,
>>> Vincent
>>>
>>> On Aug 27, 2008, at 2:50 AM, Li, Bo wrote:
>>>
>>>> Hello,
>>>> IMHO, it is better to call BLAS or a similar library rather than
>>>> programming your own functions. And CUDA treats the GPU as a
>>>> cluster, so a .cu file does not work like our normal code. If you
>>>> have a lot of matrix or vector computation, it is better to use
>>>> Brook+/CAL, which can show the great power of the AMD GPUs.
>>>> Regards,
>>>> Li, Bo
>>>> ----- Original Message -----
>>>> From: "Mikhail Kuzminsky" <[EMAIL PROTECTED]>
>>>> To: "Vincent Diepeveen" <[EMAIL PROTECTED]>
>>>> Cc: "Beowulf" <beowulf@beowulf.org>
>>>> Sent: Wednesday, August 27, 2008 2:35 AM
>>>> Subject: Re: [Beowulf] gpgpu
>>>>
>>>>
>>>>> In message from Vincent Diepeveen <[EMAIL PROTECTED]> (Tue, 26 Aug 2008
>>>>> 00:30:30 +0200):
>>>>>> Hi Mikhail,
>>>>>>
>>>>>> I'd say they're OK for black-box 32-bit calculations that can do
>>>>>> with a GB or 2 of RAM; other than that they're just luxurious
>>>>>> electric heating.
>>>>>
>>>>> I also want to have a simple black box, but 64-bit (Tesla C1060 or
>>>>> Firestream 9170 or 9250). Unfortunately, life isn't restricted to
>>>>> BLAS/LAPACK/FFT :-)
>>>>>
>>>>> So I'll need to program something else. People say the best choice
>>>>> is CUDA for Nvidia. When I look at the sgemm source, it has about a
>>>>> thousand (or more) lines in the *.cu files. Therefore I think that
>>>>> a somewhat more difficult algorithm, such as some special matrix
>>>>> diagonalization, will require a lot of programming work :-(.
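[Some perspective on Mikhail's point about the sgemm source: the mathematical core of sgemm (C := alpha*A*B + beta*C) is only a few lines; the thousand-odd lines of .cu exist to tile the matrices, stage blocks through shared memory, and keep hundreds of threads coordinated. A plain-Python reference of the operation itself, illustrative only and hopelessly slow:]

```python
# Reference semantics of BLAS sgemm: C := alpha * A @ B + beta * C,
# with matrices as lists of rows. The math really is this small; the
# ~1000-line CUDA version is tiling, shared-memory staging, and
# thread coordination wrapped around the same triple loop.

def sgemm(alpha, A, B, beta, C):
    m, k = len(A), len(A[0])
    n = len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            a = alpha * A[i][p]
            for j in range(n):
                out[i][j] += a * B[p][j]
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(sgemm(1.0, A, B, 0.0, C))  # [[19.0, 22.0], [43.0, 50.0]]
```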
>>>>> It's interesting that when I read the Firestream Brook+ "kernel
>>>>> function" source example for the addition of 2 vectors ("Building
>>>>> a High Level Language Compiler For GPGPU",
>>>>> Bixia Zheng ([EMAIL PROTECTED]),
>>>>> Derek Gladding ([EMAIL PROTECTED]),
>>>>> Micah Villmow ([EMAIL PROTECTED]),
>>>>> June 8th, 2008), it looks SIMPLE. Maybe there are a lot of
>>>>> details/source lines which were omitted from this example?
>>>>>
>>>>>> Vincent
>>>>>> p.s. if you ask me, honestly, 250 watt or so for the latest GPU
>>>>>> is really too much.
>>>>>
>>>>> 250 W is the TDP; the average value declared is about 160 W. I
>>>>> don't remember which GPU, from AMD or Nvidia, has a lot of special
>>>>> functional units for sin/cos/exp/etc. If they are not used, maybe
>>>>> the power will be a bit lower.
>>>>>
>>>>> As for the Firestream 9250, AMD says about 150 W (although I'm not
>>>>> absolutely sure that it's TDP); that's comparable to the Intel
>>>>> Xeon quad-core chips with names beginning with X.
>>>>>
>>>>> Mikhail
>>>>>
>>>>>
>>>>>> On Aug 23, 2008, at 10:31 PM, Mikhail Kuzminsky wrote:
>>>>>>
>>>>>>> BTW, why are GPGPUs considered vector systems?
>>>>>>> Taking into account that GPGPUs contain many (equal) execution
>>>>>>> units, I think it might be not SIMD but an SPMD model. Or does
>>>>>>> it depend on the software tools used (CUDA etc.)?
>>>>>>> Mikhail Kuzminsky
>>>>>>> Computer Assistance to Chemical Research Center
>>>>>>> Zelinsky Institute of Organic Chemistry
>>>>>>> Moscow
>>>>>>> _______________________________________________
>>>>>>> Beowulf mailing list, Beowulf@beowulf.org
>>>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>>>> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf