Interesting article. Regrettably the writer is a technical noob, which is clearly readable in the German he writes.
He confuses MB with GB, so it's not entirely clear how accurate the rest of what he writes is. Well, what can you expect from Heise.de in that sense... Let's assume the majority of what he wrote down is OK. Then we are talking about 60 cores at 1.053 GHz using 512-bit vectors, so that's 8 doubles I assume, or AVX2. This is the horror architecture previously called Larrabee, just with more cores. I read nothing about cache coherency anymore, and the fact that they can 'turn off' 2 cores suggests it might not have it. If so, it no longer has the bottleneck that Larrabee had.

According to this article you have to run 4 threads per core simultaneously. That's a factor 2 more than today's top GPUs need: on both AMD and Nvidia you can get good performance running 2 'threads' 'at the same time' (they get alternated). I assume that's for the same reason, namely to hide the latency between the execution units executing an instruction and its result being released. From Larrabee we knew that several instructions that are pretty important to HPC had poor throughput, eating several cycles each. So it's difficult to calculate now what is possible to achieve. Let's assume for now that 1 instruction can get executed and retired each clock cycle. This is a dangerous assumption, as historically Intel hasn't had very good multiplication units in any of its architectures when compared to competitors. Historically the latency of their x86/x64 CPUs was also nearly a factor 2 worse than, for example, AMD's Opterons for 64-bit integer multiplication. The latest i7 should have improved there, though.
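The latency-hiding argument can be illustrated with a toy model. This is a minimal sketch under assumed conditions, not the real Xeon Phi or GPU pipeline: one issue slot per core per cycle, hardware threads picked round-robin, every instruction depending on the previous result of its own thread, and a hypothetical result latency of 4 cycles. The function name `simulate_issue` is made up for illustration.

```python
# Toy model of latency hiding with hardware threads (assumptions, not a real
# pipeline): one issue slot per cycle, round-robin thread selection, and each
# instruction depends on its own thread's previous result, so a thread must
# wait `latency` cycles before it can issue again.

def simulate_issue(threads: int, latency: int, cycles: int = 12_000) -> float:
    """Return the fraction of cycles in which the core issued an instruction."""
    if threads < 1 or latency < 1:
        raise ValueError("threads and latency must be >= 1")
    ready_at = [0] * threads  # first cycle at which each thread may issue again
    next_thread = 0           # round-robin starting point
    issued = 0
    for cycle in range(cycles):
        for k in range(threads):
            t = (next_thread + k) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # result not available until then
                issued += 1
                next_thread = (t + 1) % threads
                break  # at most one instruction per cycle
    return issued / cycles


if __name__ == "__main__":
    ASSUMED_LATENCY = 4  # hypothetical result latency in cycles
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {simulate_issue(n, ASSUMED_LATENCY):.0%} of issue slots used")
```

In this model the utilization works out to min(1, threads / latency): with a 4-cycle latency, 1 thread fills 25% of the issue slots, 2 threads 50%, and 4 threads keep the core busy. A shorter latency would need fewer threads, which is the intuition behind GPUs getting away with 2.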
Under this assumption, a throughput of 1 instruction per clock (even though a multiply-add has a latency of several clocks), that gives us:

1.053 GHz * 60 cores * 8 doubles = 505.44 Gflops

Everyone always "lies" a factor 2 on top of that by counting a multiply-add as two flops, even though I bet no one will manage to push one such instruction through every cycle in a nonstop manner. Also the big transforms using Fourier transforms cannot use multiply-add at all. Yet if we ignore that, like everyone ignores it, that gives bragging rights of:

2 * 505.44 Gflops = 1010.88 Gflops, so about 1.01 Tflops

This isn't bad at all considering the fact that the K20, which based upon Moore's Law (doubling of transistors translating into doubling of speed) would have landed near 2 Tflops, appears to be just above 1.0 Tflops right now. The fear was of course that the latest Larrabee incarnation, Xeon Phi, would cost $10k, yet it seems Intel wants to conquer the HPC market: Heise gives a price for it, the first time I've seen one, which is $2649. It is only available in 2013, though, which is a disadvantage.

Of course, be careful buying this chip if you don't know what AVX2 is. Many tried to write code for AVX2 and it took them years to get some prime number transforms to work a bit on it. We see that Intel has deviated from their original plan, yet they still tell reporters the nonsense story that it would be interesting to run Pentium code on it. A single i7 will of course beat it there, because to get the maximum throughput you need to put your data inside vectors of 8 doubles; otherwise it will perform horribly. Assuming the Larrabee instruction set survived, it is also possible to indirectly access each core using special instructions. Those had a 7-cycle latency on Larrabee, however, so it's not very encouraging to use them. So doing the same thing you can do reasonably simply on GPUs is pretty difficult here, yet not impossible. Of course the only bummer is that it's not yet available.
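The peak-throughput arithmetic can be written out as a small sketch. It takes the clock, core count and 512-bit vector width from the article and keeps the same optimistic assumption: one vector instruction retired per core per clock, with a multiply-add counted as 2 flops. The scalar case illustrates the "otherwise it will perform horribly" point by assuming only 1 of the 8 lanes does useful work. The helper name `peak_gflops` is hypothetical.

```python
# Back-of-the-envelope peak double-precision throughput, under the optimistic
# assumption that every core retires one vector instruction per clock, nonstop.

def peak_gflops(clock_ghz: float, cores: int, lanes: int, flops_per_lane: int = 1) -> float:
    """Peak GFLOPS = clock (GHz) * cores * vector lanes * flops per lane per cycle."""
    return clock_ghz * cores * lanes * flops_per_lane

CLOCK_GHZ = 1.053   # clock from the Heise article
CORES = 60          # core count from the Heise article
LANES = 512 // 64   # a 512-bit vector holds 8 doubles

plain = peak_gflops(CLOCK_GHZ, CORES, LANES)                  # one mul or add per lane
fma = peak_gflops(CLOCK_GHZ, CORES, LANES, flops_per_lane=2)  # multiply-add counted as 2 flops
scalar = peak_gflops(CLOCK_GHZ, CORES, lanes=1)               # only 1 of the 8 lanes used

for name, value in [("vector, no FMA", plain), ("vector, FMA", fma), ("scalar", scalar)]:
    print(f"{name:15s} {value:8.2f} GFLOPS")
```

It prints 505.44, 1010.88 and 63.18 GFLOPS respectively. Real code will land below all three, since one instruction per clock on every core is a best case.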
From a marketing viewpoint, though, it is a good idea for Intel to release it already now, as otherwise everyone would already sign a deal with Nvidia. We know from some years ago how Intel got several HPC organisations into big problems by simply not delivering the Itanium2 CPUs at the appointed time; that took another 6 months to a year. As they all talk with each other there, I am not sure what the impact of this will be. It's obvious, however, that Intel wants to compete right now by not pricing the chip too expensively. That's good for the HPC community. Now let's hope that none of the manufacturers gets a total monopoly, otherwise we'll be paying the $7500 that the Itanium2 1.5 GHz cost at introduction. Financially seen, these manufacturers can easily offer these CPUs for $1500-$2k, as that easily pays back all production and development costs.

On Nov 13, 2012, at 1:40 PM, Eugen Leitl wrote:

> http://www.heise.de/newsticker/meldung/SC12-Intel-bringt-Coprozessor-Xeon-Phi-offiziell-heraus-1747942.html
>
> http://translate.google.com/translate?sl=auto&tl=en&js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fwww.heise.de%2Fnewsticker%2Fmeldung%2FSC12-Intel-bringt-Coprozessor-Xeon-Phi-offiziell-heraus-1747942.html&act=url
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf