Correct me if I'm wrong, but it looks like all these benchmarks are for single-threaded applications. I don't see any references to MPI, OpenMP, or any other parallelization method in the compiler, optimization, and invocation notes. The only parallelism I see is the use of AVX2 in the 2016 results, and some references to SIMD in the 2012 results.

So regardless of the number of cores in each test system, what the benchmarks are really comparing is single-core performance between a 2.3 GHz AMD Opteron and a 2.4 GHz Intel Xeon. Is that correct?

I'm assuming the starting source code is exactly the same in each case, too.

If so, those results aren't surprising. Since systems started going multi-core, the performance gains have really come from adding parallelism to your programs using threads or message-passing, or from taking advantage of the larger vector processing capabilities that get added with each successive generation of processors. If these benchmarks were rewritten to optimize them for data parallelism and to make sure the data was properly aligned for the vector registers, I'm sure the newer processor would show better performance.
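To make "data parallelism" and "properly aligned" concrete, here is a minimal sketch in C. It is not taken from the SPEC sources; the function names (alloc_aligned_floats, saxpy) are made up for illustration, and the 32-byte alignment is an assumption chosen to match 256-bit AVX2 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 32 bytes, the width of a 256-bit AVX2 register. */
#define ALIGNMENT 32

/* Hypothetical helper: allocate n floats on an ALIGNMENT-byte boundary
   using C11 aligned_alloc. The byte count is rounded up to a multiple
   of ALIGNMENT, as aligned_alloc expects. Returns NULL on failure. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (rounded == 0) {
        rounded = ALIGNMENT;
    }
    float *p = aligned_alloc(ALIGNMENT, rounded);
    assert(p == NULL || (uintptr_t)p % ALIGNMENT == 0);
    return p;
}

/* y[i] = a * x[i] + y[i]. Every iteration is independent, so the loop
   can be split across threads and vectorized within each thread.
   The aligned clause is a promise to the compiler: x and y must really
   be 32-byte aligned, e.g. obtained from alloc_aligned_floats. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd aligned(x, y : 32)
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, the pragma only takes effect when OpenMP is enabled (for example with -fopenmp); without it the pragma is ignored and the loop runs as ordinary serial code, which is roughly the situation the benchmark results above appear to reflect.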

So glad that we have thousands of Phi's...

I wouldn't be so glad. You're still going to have to rewrite your code as mentioned above to get any meaningful performance.

When Intel first started marketing the Xeon Phi, they emphasized that you wouldn't need to rewrite your code to use it. This was a marketing move to differentiate the Xeon Phi from the NVIDIA CUDA processors. That may have been a true statement, but it said nothing about the performance of that existing code, and was, frankly, very misleading. The truth is, if you don't rewrite your code, you're not going to see much (relatively speaking) of a performance improvement, and when you do rewrite your code to optimize it for the Xeon Phi, you'll also see amazing speed ups on regular Xeon processors.

I've seen several presentations where speed ups of 5x, 10x, etc., were achieved on regular Xeons just through optimizing the code to be more thread- and vector-friendly. Some improvements were so significant, they make you ask if the Xeon Phi was even needed. These are the first gens I'm talking about; I imagine the KNL will make a more compelling argument for the Phi.

If you have paid attention to Intel's marketing and the industry news over the past couple of years, you will have noticed that Intel has been promoting "code modernization" efforts, saying all codes need to be modernized to take advantage of newer processors. While that is certainly true, "code modernization" is just a euphemism for "rewrite your code". This is Intel backpedaling on their earlier statements that you don't need to rewrite your code to take advantage of a Xeon Phi, without actually admitting it.

Prentice

On 08/16/2016 04:35 AM, Stu Midgley wrote:
https://www.spec.org/cpu2006/results/res2016q2/cpu2006-20160308-39354.html
https://www.spec.org/cpu2006/results/res2012q4/cpu2006-20121108-25077.html

It's like no progress has been made. So glad that we have thousands of Phi's...

--
Dr Stuart Midgley
sdm...@sdm900.com


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
