Let me email you a latency test that uses all cores at the same time. All those claims about single-core latency are not so relevant for HPC; if one core were enough, we wouldn't need multicore CPUs.
In a perfect world you'd be right; regrettably, that's not how software usually works. It usually hits a hundred other problems where a higher clocked CPU has a major advantage over a lower clocked one. The L3 cache has an important function, yet for most software it is also a very big bottleneck on the way to memory, as it adds considerable latency.

I'll mail you my test (a rough sketch of the idea appears after the quoted thread below). Most memory bandwidth tests only exercise one core; this one measures the latency of 8-byte reads on n cores at the same time. I typically take a gigabyte or so and spread it over the different cores. You typically see that the latency of the old Core 2 is pretty decent, around 60 ns. Two-socket i7 machines are considerably slower there; with higher-frequency CPUs (3.4 GHz Xeons @ 12 cores, hyperthreading turned off of course) latencies come down to around 90 ns. The fastest single-socket i7's get to around 70 ns. Could you run this test on differently clocked CPUs and tell me your conclusions?

This test was designed to measure latency on shared-memory supercomputers with all cores running at the same time. We typically see, also as you increase the number of bytes read, that the designers have built in a hundred tricks to fool all the single-core latency tests. When all cores run at the same time, which is realistic for many software loads, latencies are suddenly easily up to a factor of 12 worse than the manufacturers claim (the SGI Origin and Altix series being one example). Bandwidth gets totally overruled by other concerns.

As for the single-socket machines: you can see how AMD's Bulldozer @ 4 modules totally falls apart on this test. Its latency is already very bad when running the test on 4 cores, but when all 8 mini-cores are used at the same time, latencies climb towards 160-200 ns, over a factor of 2.5 worse than Intel. And we are speaking here of the highest clocked production CPUs.

You also typically see that latencies get worse as the memory buffer gets larger. In HPC, and especially for software that works the way this test simulates, one typically uses the maximum amount of RAM available, so using many gigabytes is not a theoretical example. This is a practical test that simulates very accurately how things work in, for example, game tree search. Older hardware typically has more problems there than newer hardware. Paul Hsieh later tried to redo this test using just pointer arithmetic, but he never coded it up for more than one core. In every single result, higher clocked CPUs tend to do better.

On Sep 14, 2012, at 5:16 PM, Steffen Persvold wrote:

> Vincent,
>
> Your statement only holds true for the cache bandwidth (which somewhat
> scales with the core frequency), not the DDR3 memory controller
> bandwidth (or latency, for that matter). The main limiting factor for
> the DDR3 memory bandwidth is the number of channels (i.e. how much data
> you can get in parallel) and how fast the DRAM is (i.e. the frequency
> the DDR3 interface runs at).
>
> cheers,
> --Steffen
>
> On 9/14/2012 17:08, Vincent Diepeveen wrote:
>> Yes,
>>
>> You can easily see this in the latency numbers of higher clocked
>> processors: they're faster than lower clocked i7's of the same kind.
>>
>> Let me email directly to you a test I wrote for that some years ago.
>>
>> On Sep 14, 2012, at 5:04 PM, Orion Poplawski wrote:
>>
>>> On 09/14/2012 08:54 AM, Vincent Diepeveen wrote:
>>>> The memory controller is on die, so the bandwidth that the CPU
>>>> itself delivers, independent of the number of channels, is
>>>> dependent on the CPU frequency.
>>>>
>>>> Higher frequency simply means more bandwidth with the given
>>>> memory channels available.
>>>>
>>>
>>> Really?
>>>
>>> http://ark.intel.com/compare/64590,64591,64587
>>>
>>> Clock Speed            2 GHz      2.5 GHz    3.3 GHz
>>> Max Turbo Frequency    2.8 GHz    3 GHz      3.5 GHz
>>> # of Memory Channels   4          4          4
>>> Max Memory Bandwidth   51.2 GB/s  42.6 GB/s  51.2 GB/s
>>>
>>>> On Sep 14, 2012, at 4:41 PM, Orion Poplawski wrote:
>>>>
>>>>> On 09/14/2012 05:00 AM, Igor Kozin wrote:
>>>>>> if memory bandwidth is your concern then there are models which
>>>>>> boost it quite significantly, e.g.
>>>>>> http://ark.intel.com/products/64584/Intel-Xeon-Processor-E5-2660-20M-Cache-2_20-GHz-8_00-GTs-Intel-QPI
>>>>>>
>>>>>> probably very few codes are going to benefit from AVX without
>>>>>> extra effort, but BW is a clear win. i'm seeing a good speedup
>>>>>> on some applications which can be attributed to higher BW.
>>>>>
>>>>> There are 6.4, 7.2, and 8 GT/s chips.
>>>>>
>>>>> This is an interesting puzzle at the mid-tier price point:
>>>>>
>>>>> DUAL INTEL XEON 6C E5-2640 (2.5GHz/7.2GT/s/15MB) CPU [+ $1,810.00]
>>>>> DUAL INTEL XEON 4C E5-2643 (3.3GHz/8GT/s/10MB) CPU [+ $1,798.00]
>>>>> DUAL INTEL XEON 8C E5-2650 (2GHz/8GT/s/20MB) CPU [+ $2,270.00]
>>>>>
>>>>> So for bandwidth-limited codes one would go with the latter two,
>>>>> but you have a big choice between fewer cores and less cache at a
>>>>> high clock versus more cores and more cache at a low clock.
>>>
>>> --
>>> Orion Poplawski
>>> Technical Manager       303-415-9701 x222
>>> NWRA, Boulder Office    FAX: 303-415-9702
>>> 3380 Mitchell Lane      or...@nwra.com
>>> Boulder, CO 80301       http://www.nwra.com
>
> --
> Steffen Persvold, Chief Architect NumaChip
> Numascale AS - www.numascale.com
> Tel: +47 92 49 25 54  Skype: spersvold
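
A minimal sketch of an all-cores latency test along the lines described above, in C with POSIX threads. This is not Vincent's actual program: the 1 GiB total buffer, the Sattolo-shuffle chain construction, and the step count are illustrative assumptions. Each thread chases a random pointer chain through its own slice of the buffer, so every 8-byte load depends on the previous one and hardware prefetchers cannot hide the memory latency:

    /* All-cores memory latency sketch (illustrative, not the original test).
     * Each thread walks a random pointer chain through its private slice of
     * a ~1 GiB buffer; every 8-byte load depends on the previous one, so
     * prefetchers cannot hide the latency.
     *
     * Build: gcc -O2 -std=c99 -pthread latency.c -o latency  (add -lrt on old glibc)
     * Run:   ./latency <num_threads>
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define TOTAL_BYTES (1ULL << 30)   /* ~1 GiB spread over all threads (assumption) */
    #define STEPS       (1L << 23)     /* dependent loads timed per thread */

    static volatile uint64_t sink;     /* keeps the chase loop from being optimized away */

    typedef struct { uint64_t *slice; double ns_per_load; } job_t;

    static void *worker(void *arg)
    {
        job_t *job = arg;
        uint64_t *s = job->slice;
        uint64_t idx = 0;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < STEPS; i++)
            idx = s[idx];              /* serialized: each load needs the last result */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        sink = idx;

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        job->ns_per_load = ns / STEPS;
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int n = (argc > 1) ? atoi(argv[1]) : 4;
        pthread_t tid[n];
        job_t job[n];

        for (int t = 0; t < n; t++) {
            uint64_t elems = TOTAL_BYTES / sizeof(uint64_t) / n;
            uint64_t *s = malloc(elems * sizeof *s);
            if (!s) { perror("malloc"); return 1; }
            /* Sattolo's algorithm: a uniformly random single-cycle permutation,
             * so the chase visits every slot before repeating (no short cycles
             * that would stay resident in cache). rand() is weak/biased but
             * good enough for a sketch. */
            for (uint64_t i = 0; i < elems; i++) s[i] = i;
            for (uint64_t i = elems - 1; i > 0; i--) {
                uint64_t j = (uint64_t)rand() % i;  /* j < i guarantees one cycle */
                uint64_t tmp = s[i]; s[i] = s[j]; s[j] = tmp;
            }
            job[t].slice = s;
        }
        for (int t = 0; t < n; t++)
            pthread_create(&tid[t], NULL, worker, &job[t]);
        for (int t = 0; t < n; t++) {
            pthread_join(tid[t], NULL);
            printf("thread %2d: %.1f ns per dependent 8-byte load\n",
                   t, job[t].ns_per_load);
        }
        return 0;
    }

Running it with the thread count stepped from 1 up to the core count, on machines at different clocks, should expose the single-core-versus-all-cores gap the mail describes.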
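As a cross-check on the ark.intel.com figures in the quoted table, which is Steffen's point about channels and DRAM speed: peak DDR3 bandwidth is just channels x transfer rate x 8 bytes per transfer, with no core clock term anywhere in the formula. A small illustration (the DDR3-1600 and DDR3-1333 speeds are inferred here from the 51.2 and 42.6 GB/s figures; they are not stated in the thread):

    /* Peak DDR3 bandwidth = channels * transfer rate (MT/s) * 8 bytes per
     * transfer. Note there is no CPU core clock in the formula. */
    #include <stdio.h>

    int main(void)
    {
        int channels = 4;             /* all three Xeons in the quoted table */
        int mts[] = { 1600, 1333 };   /* DDR3 speeds inferred from the table (assumption) */

        for (int i = 0; i < 2; i++)
            printf("%d channels x DDR3-%d = %.1f GB/s\n",
                   channels, mts[i], channels * mts[i] * 8 / 1000.0);
        /* Prints 51.2 GB/s and 42.7 GB/s, matching the table's 51.2 and
         * 42.6 GB/s (ark truncates rather than rounds). */
        return 0;
    }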