All: This video may help clear things up:
http://www.youtube.com/watch?v=usGkq7tAhfc

Have a nice weekend.

--
Doug

>
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards stacked up
>>> inside such a box, will only add a couple of thousand.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
>
> Tesla (448 cores @ 1.15 GHz, 3 GB GDDR5): $2,200
> (note there is a 6 GB version; I'm not aware of the price, but it will be $$$$, I bet)
> or AMD 6990 (3072 PEs @ 0.83 GHz, 4 GB GDDR5): 519 euro
>
> VERSUS
>
> 8-socket Nehalem-EX, 512 GB DDR3 RAM, basic configuration: $205k.
>
> A factor of 100 difference compared to those cards.
>
> A couple of thousand versus a couple of hundred thousand.
> I hope I made my point clear.
>
>
>> Last I checked, an NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3 GB RAM per GPU, cost around $10k.
>> For the beefed-up version with C2070 (6 GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
>
> When Fermi was released it was a great GPU.
>
> Regrettably, as I understand it, they lobotomized the double precision
> of the gamers' cards, so those hardly have double-precision capabilities;
> if you go for Nvidia you certainly need a Tesla, no question about it.
>
> As a company I would buy 6990s though; they're a lot cheaper and
> roughly 3x faster than the Nvidias (more than 3x on some occasions,
> less than 3x on others; note the card has 2 GPUs and 2 x 2 GB == 4 GB
> of RAM on board, so 2 GB per GPU).
>
> 3072 cores @ 0.83 GHz, with 50% of them being 32-bit multiplication
> units, for AMD, versus 448 cores for Nvidia with 448 execution units
> for 32-bit multiplication.
>
> Especially because multiplication has improved a lot.
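
As a rough check on the core-count comparison just quoted, the sketch below turns the quoted unit counts and clock speeds into a peak 32-bit multiply rate. It assumes each multiplication unit retires one multiply per clock, which is not stated in the thread; the function name `peak_gmul_per_s` is made up for illustration.

```python
# Back-of-envelope peak 32-bit multiply throughput, using only the figures
# quoted in the message. ASSUMPTION (not from the thread): every
# multiplication unit retires one multiply per clock cycle.

def peak_gmul_per_s(units: int, clock_ghz: float, mul_fraction: float = 1.0) -> float:
    """Peak multiplies per second, in units of 10^9, under the assumption above."""
    return units * mul_fraction * clock_ghz

# AMD 6990: 3072 PEs at 0.83 GHz, 50% of them able to multiply (as quoted).
amd_6990 = peak_gmul_per_s(3072, 0.83, mul_fraction=0.5)

# Tesla: 448 cores at 1.15 GHz, all 448 able to multiply (as quoted).
tesla = peak_gmul_per_s(448, 1.15)

print(f"AMD 6990: {amd_6990:.1f} Gmul/s")
print(f"Tesla:    {tesla:.1f} Gmul/s")
print(f"ratio:    {amd_6990 / tesla:.2f}x")
```

Under that assumption the ratio comes out at about 2.5x, in the same ballpark as the "roughly 3x" in the message; the real figure depends on the workload and on how well the code keeps the units busy.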
>
> Having already written CUDA code some while ago, I wanted the cheap
> gamers' card with big horsepower at home, so I'm toying with a 6970
> now and will be able to report to you what is possible to achieve on
> that card with respect to prime numbers and such.
>
> I'm a bit amazed that so few public initiatives write code for the AMD
> GPUs.
>
> Note that GDDR5 RAM doesn't have ECC by default, but in the case of
> AMD it has a CRC calculation (if I understand it correctly). It's a bit
> more primitive than ECC, but works pretty OK and also shows you
> when problems occurred there, so figuring out what goes on
> is possible.
>
> Make no mistake: this isn't ECC.
> We know some HPC centers have ECC as a hard requirement; only Nvidia
> is an option then.
>
> In earlier posts, some time ago and some years ago, I already
> wrote that governments should adapt more to how hardware develops
> rather than demand that hardware follow them.
>
> HPC has too little cash to demand that from industry.
>
> OpenCL I cannot advise at this moment (for a number of reasons).
>
> AMD-CAL and CUDA are somewhat similar. Sure, there are differences, but
> the majority of codes can be ported quite well (there are exceptions),
> or have easy workarounds.
>
> Any company doing GPGPU I would advise to develop both branches of
> code at the same time, as that gives the company a lot of extra choices
> for really very little extra work. Maybe 1 coder, and it always allows
> you to have the fastest setup run your production code.
>
> That said, we can safely expect that in raw performance AMD will keep
> the leading edge in the coming years from a crunching viewpoint.
> Elsewhere I pointed out why.
>
> Even then I'd never bet on just 1 manufacturer. Go for both,
> considering how cheap it is.
>
> For a lot of HPC centers the choice of Nvidia will be an easy one, as
> the price of the Fermi cards is peanuts compared to the price of the
> rest of the system, and considering their other demands that's what
> they'll go for.
>
> That might change once you stick bunches of videocards into nodes.
>
> Please note that the GPU 'streamcores', or PEs, or whatever name you
> want to give them, are so bloody fast that your code has to work
> within the PEs themselves and hardly use the RAM.
>
> For both Nvidia and AMD, the streamcores are so fast that you
> simply don't want to lose time on the RAM when your software runs,
> let alone use huge amounts of RAM.
>
> Add to that that Nvidia (I still have to figure this out for AMD) can
> stream to and from the GPU's RAM from the CPU in the background, so if
> you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
>
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that
> would really amaze me, though I'm sure there are cases where that
> happens. However, if we look at what was ordered, it is mostly the
> 3 GB Teslas, at least going by what has been reported; I have no
> global statistics on that...
>
> Now all choices are valid there, but even then we are talking about
> peanuts compared to the price of a single 8-socket Nehalem-EX box,
> which fully configured will be maybe $300k-$400k or something?
>
> Whereas a set of 4x Nvidia will probably be under $15k, and 4x AMD
> 6990 is 2000 euro.
>
> There won't be any 2-GPU Nvidias soon because of the choice they
> have historically made for the memory controllers.
> See the explanation by Intel fanboy David Kanter at
> realworldtech, in a special article he wrote there.
>
> Please note I'm not judging AMD or Nvidia; they have made their
> choices based upon totally different business models, I suspect, and
> we should be happy we have this rich choice right now between CPUs
> from different manufacturers and GPUs from different manufacturers.
>
> Nvidia really seems to aim at supercomputers, offering their Tesla line
> without lobotomization and lobotomizing their gamers' cards, whereas
> AMD aims at gamers, and their gamers' cards have full functionality
> without lobotomization.
>
> Totally different business models. Both have their advantages and
> disadvantages.
>
> From a pure performance viewpoint it's easy to see what's faster though.
>
> Yet right now I realize all too well that too many still hesitate
> to offer GPU services in addition to CPU services, in which case
> having a GPU, whether Nvidia or AMD, kicks butt of course from a
> throughput viewpoint.
>
> To be really honest with you guys, I had expected that by 2011 we
> would have a GPU reaching far over 1 Teraflop double precision
> hands down. If we see that Nvidia delivers somewhere around 515 Gflop,
> and AMD needs 2 GPUs on a single card to get over that Teraflop in
> double precision (the claim is 1.27 Teraflop double precision),
> that really is below my expectations from a few years ago.
>
> Now of course I hope you realize I'm not coding double precision code
> at all; I'm writing everything in 32-bit integers for the AMD
> card, and the Nvidia equivalent also uses 32-bit integers. The
> ideal way to do calculations on those cards, including very big
> transforms, is using the 32 x 32 == 64 bits instructions (that's 2
> instructions in the case of AMD).
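
To make the 32 x 32 == 64 bits remark concrete, here is a minimal sketch of a widening multiply built from two 32-bit results. It assumes the "2 instructions" are a low-half multiply and a high-half multiply, which the message does not spell out; the names `mul_lo`, `mul_hi` and `mul_wide` are hypothetical and only model the arithmetic, not any vendor's instruction set.

```python
# Model of a 32 x 32 -> 64 bit unsigned multiply split into two 32-bit
# results. ASSUMPTION: the "2 instructions" in the message are a low-half
# and a high-half multiply; the helper names below are illustrative only.

MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits

def mul_lo(a: int, b: int) -> int:
    """Low 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return (a * b) & MASK32

def mul_hi(a: int, b: int) -> int:
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a * b) >> 32) & MASK32

def mul_wide(a: int, b: int) -> int:
    """Full 64-bit product, reassembled from the two 32-bit halves."""
    return (mul_hi(a, b) << 32) | mul_lo(a, b)

# Worst case for 32-bit operands: both inputs are 2^32 - 1.
a = b = 0xFFFFFFFF
assert mul_wide(a, b) == a * b
print(hex(mul_hi(a, b)), hex(mul_lo(a, b)))  # prints: 0xfffffffe 0x1
```

The point of the split is that every intermediate value fits in a 32-bit register, so integer-only code never needs the double-precision hardware discussed earlier in the message.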
>
> Regards,
> Vincent
>
>>
>> Gus Correa
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf

--
Doug