On Mar 12, 2013, at 5:45 AM, Mark Hahn wrote:

>>> I think HSA is potentially interesting for HPC, too.
>>> I really expect AMD and/or Intel to ship products this year that
>>> have a C/GPU chip mounted on the same interposer as some
>>> high-bandwidth ram.
>>
>> How can an integrated gpu outperform a gpgpu card?
>
> if you want dedicated gpu computation, a gpu card is ideal.
> obviously, integrated GPUs reduce the PCIe latency overhead,
> and/or have an advantage in directly accessing host memory.
>
> I'm merely pointing out that the market has already transitioned to
> putting integrated gpus - the vote on this is closed.
> the real question is what direction the onboard gpu takes:
> how integrated it becomes with the cpu, and how it will take
> advantage of upcoming 2.5d-stacked in-package dram.

Integrated GPUs will of course always have a very limited power budget.
So a gpgpu card with the same generation GPU from the same manufacturer,
but with a bigger power envelope, is always going to be 10x faster.

If you'd get 10 computers with 10 APUs, even for a small price, you
would still need an expensive network and switch to connect them, so
that's 10 ports. At roughly $1000 a port, that's $10k extra. And that
assumes your massive supercomputer doesn't get into trouble further up
in bandwidth; otherwise your network cost suddenly becomes $3000 a port
instead of $2k a port, with a factor of 10 more ports.

Moneywise that's always going to lose to a single gpgpu card that's 10x
faster. Whether that's Xeon Phi version X or Nvidia Kx0X, it's always
going to be 10x faster and 10x cheaper for massive supercomputing.

>> Something like what is it 25 watt versus 250 watt, what will be
>> faster?
>
> per-watt? per dollar? per transaction?
>
> the integrated gpu is, of course, merely a smaller number of cores
> than the separate card, so will perform the same, relative to a
> proportional slice of the appropriate-generation add-in-card.
>
> trinity a10-5700 has 384 radeon 69xx cores running at 760 MHz,
> delivering 584 SP gflops - 65W iirc. but only 30 GB/s for it and
> the CPU.
>
> let's compare that to a 6930 card: 1280 cores, 154 GB/s, 1920 Gflops.
> about 1/3 the cores, flops, and something less than 1/5 the bandwidth.
> no doubt the lower bandwidth will hurt some codes, and the lower
> host-gpu latency will help others. I don't know whether APUs have the
> same SP/DP ratio as comparable add-in cards.
>
>> I assume you will not build 10 nodes with 10 cpu's with integrated
>> gpu in order to rival a single card.
>
> no, as I said, the premise of my suggestion of in-package ram is
> that it would permit glueless tiling of these chips.
> the number you could tile in a 1U chassis would primarily be a
> question of power dissipation.
> 32x 40W units would be easy. perhaps 20 60W units. since I'm just
> making up numbers here, I'm going to claim that performance will be
> twice that of trinity (a nice round 1 Tflop apiece, or 20 Tflops/RU).
> I speculate that 4x 4Gb in-package gddr5 would deliver 64 GB/s,
> 2GB/socket - a total capacity of 40 GB/RU at 1280 GB/s.
>
> compare this to a 1U server hosting 2-3 K10 cards = 4.6 Tflops and
> 320 GB/s each. 13.8 Tflops, 960 GB/s. similar power dissipation.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
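
For anyone who wants to check the arithmetic traded back and forth in this thread, here is a rough Python sketch. It only replays the figures quoted above, several of which are openly made up in the discussion, so nothing here is a measurement. The function names are hypothetical. The 2 flops per core per cycle (one multiply-add) is an assumption on my part, and the K10 figure is taken as 4.6 Tflops single precision per card.

```python
# Sanity check of the figures quoted in this thread. Every input is a
# number from the discussion (some of them speculative), not a measurement.

def peak_sp_gflops(cores, clock_ghz, flops_per_cycle=2):
    # Assumption: each core retires one multiply-add (2 flops) per cycle.
    return cores * clock_ghz * flops_per_cycle

def per_rack_unit(units, tflops_each, gb_per_s_each):
    # Aggregate peak Tflops and memory bandwidth for one 1U chassis.
    return units * tflops_each, units * gb_per_s_each

# Trinity A10-5700: 384 cores at 760 MHz.
trinity_gflops = peak_sp_gflops(384, 0.760)

# Trinity relative to the 6930 card (1280 cores, 1920 Gflops, 154 GB/s).
core_ratio = 384 / 1280
flops_ratio = 584 / 1920
bandwidth_ratio = 30 / 154

# 20 tiled 60W units at 1 Tflop and 64 GB/s each, versus 3 K10 cards.
tiled_tflops, tiled_bw = per_rack_unit(20, 1.0, 64)
k10_tflops, k10_bw = per_rack_unit(3, 4.6, 320)

# Network cost for 10 APU nodes at roughly $1000 a port.
extra_network_cost = 10 * 1000

print(f"trinity peak: {trinity_gflops:.0f} SP Gflops")
print(f"trinity/6930 cores {core_ratio:.2f}, flops {flops_ratio:.2f}, "
      f"bandwidth {bandwidth_ratio:.2f}")
print(f"tiled APUs per RU: {tiled_tflops:.1f} Tflops, {tiled_bw} GB/s")
print(f"3x K10 per RU:     {k10_tflops:.1f} Tflops, {k10_bw} GB/s")
print(f"extra network cost for 10 nodes: ${extra_network_cost}")
```

Changing the unit count, the per-unit Tflops or the per-port price shows how sensitive either side of the argument is to the guessed numbers.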