Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Chris Samuel
On Friday, 23 February 2018 2:52:54 AM AEDT John Hearns via Beowulf wrote:
> Oh, and use the Adaptive computing HPL calculator to get your input file.
> Thanks Adaptive guys!

I think you mean Advanced Clustering. :-)

http://www.advancedclustering.com/act_kb/tune-hpl-dat-file/

cheers,
Chris
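For readers who want to see roughly what such an HPL.dat calculator works out, here is a minimal Python sketch of a common rule of thumb: size the N x N double-precision matrix to about 80% of total RAM, round N down to a multiple of the block size NB, and pick a near-square P x Q process grid with P <= Q. The function names, the 80% fraction, the 64 GiB node, the 32 ranks and NB=192 are all illustrative assumptions, not values taken from the Advanced Clustering tool or from this thread.

```python
import math


def hpl_problem_size(mem_gib_per_node: float, nodes: int, nb: int,
                     mem_fraction: float = 0.8) -> int:
    """Rule-of-thumb HPL N (assumed heuristic): the largest multiple of NB
    whose N x N matrix of 8-byte doubles fits in mem_fraction of total RAM."""
    total_bytes = mem_gib_per_node * nodes * 1024**3
    n = int(math.sqrt(mem_fraction * total_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb  # round down to a whole number of NB blocks


def hpl_grid(procs: int) -> tuple[int, int]:
    """Most nearly square P x Q process grid with P <= Q and P * Q == procs."""
    p = math.isqrt(procs)
    while procs % p:  # step down until p divides procs (p == 1 always does)
        p -= 1
    return p, procs // p


if __name__ == "__main__":
    # Hypothetical single node: 64 GiB RAM, 32 MPI ranks, NB = 192.
    print("N   =", hpl_problem_size(64, 1, 192))  # N   = 82752
    print("PxQ =", hpl_grid(32))                   # PxQ = (4, 8)
```

The resulting values would go into the N, NB, P and Q lines of HPL.dat; a few trial runs around them are still the practical way to tune.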

2018-02-22 Chris Samuel
On Friday, 23 February 2018 1:45:00 AM AEDT Joe Landman wrote:
> 85% makes the assumption that you have the systems configured in an
> optimal manner, that the compiler doesn't do anything wonky, and that,
> to some degree, you isolate the OS portion of the workload off of most
> of the cores to r…

2018-02-22 Prentice Bisbal
Joe,

Thanks for the link. Based on that, they should be pretty close in performance, and mine are not, so I must be doing something wrong with my OpenBLAS build. Since ACML is dead, I was hoping I could use OpenBLAS moving forward.

Prentice

On 02/22/2018 06:01 PM, Joe Landman wrote:
> ACML is…

2018-02-22 Joe Landman
ACML is hand-coded assembly. Not likely that OpenBLAS will be much better. Could be similar. Cf. http://gcdart.blogspot.co.uk/2013/06/fast-matrix-multiply-and-ml.html

On 02/22/2018 05:48 PM, Prentice Bisbal wrote:
> Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC 6.1.0, an…

2018-02-22 Benson Muite
Consider trying
https://github.com/amd/blis
https://github.com/clMathLibraries/clBLAS
as well.

On 02/23/2018 12:48 AM, Prentice Bisbal wrote:
> Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC 6.1.0, and I'm only getting 91 GFLOPS. I'm pretty sure OpenBLAS performance should be…

2018-02-22 Prentice Bisbal
Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC 6.1.0, and I'm only getting 91 GFLOPS. I'm pretty sure OpenBLAS performance should be close to ACML performance, if not better. I'll have to dig into this later. For now, I'm going to continue my testing using the ACML-based build…

2018-02-22 Prentice Bisbal
So I just rebuilt HPL using the ACML 6.1.0 libraries with GCC 6.1.0, and I'm now getting 197 GFLOPS, so clearly there's a problem with my OpenBLAS build. I'm going to try building OpenBLAS without the dynamic arch support on the machine where I plan on running my tests, and see if that version…
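To put the HPL results reported in this thread on one scale, here is a small sketch that divides measured Rmax by the 282 GFLOPS theoretical figure from AMD's literature, treating that figure as the node's Rpeak (an assumption about the system under test). With that denominator, the 197 GFLOPS ACML build lands near 70% and the 91 GFLOPS OpenBLAS build near 32%, against the 85 to 90% figures mentioned elsewhere in the thread. The names RPEAK_GFLOPS and hpl_efficiency are illustrative.

```python
RPEAK_GFLOPS = 282.0  # AMD's theoretical figure from the thread (assumed = node Rpeak)


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float = RPEAK_GFLOPS) -> float:
    """Fraction of theoretical peak achieved by a measured HPL run."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    for label, rmax in [("ACML 6.1.0", 197.0), ("OpenBLAS 0.2.20", 91.0)]:
        print(f"{label:16s} {rmax:6.1f} GFLOPS -> {hpl_efficiency(rmax):.1%} of peak")
    # Rmax a run would need to reach the 85% figure discussed in the thread:
    print(f"85% of peak = {0.85 * RPEAK_GFLOPS:.1f} GFLOPS")
```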

2018-02-22 Prentice Bisbal
For OpenBLAS, or HPL? For HPL, I used GCC 6.1.0 with these flags:

    $ egrep -i "flags|defs" Make.gcc-6.1.0_openblas-0.2.19
    F2CDEFS  = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
    HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
    CCNOOPT  = $(HPL_DEFS)
    OMP_DEFS = -openmp
    CCFLAGS  = $(HPL_DE…

2018-02-22 Prentice Bisbal
This is my source for those theoretical numbers:

http://dewaele.org/~robbe/thesis/writing/references/49747D_HPC_Processor_Comparison_v3_July2012.pdf

If those numbers are off, that makes my job a bit easier. And it looks like you're right. In the text above the table, it does mention 2-socket…

2018-02-22 Dmitri Chubarov
Hi,

not sure if the 282 GFLOPS number is correct. We have 16 Bulldozer/Interlagos cores at 2.2 GHz. Each pair of cores forms a CMT module. The two cores in the module share an FPU with two 128-bit FMAC units. In terms of double-precision FLOPS it should make 16 * 2.2 GHz * 2 double-precision scalar…
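The usual module-level peak arithmetic for this part can be sketched as follows, assuming each of the two FMAC pipes in a Bulldozer module's shared FPU retires one 128-bit FMA (two doubles, two flops each) per cycle, and taking the 2.2 GHz base clock with turbo ignored. That gives 8 double-precision flops per cycle per module (4 per core), so a 16-core (8-module) Opteron 6274 peaks at about 140.8 GFLOPS and a 2-socket node at about 281.6 GFLOPS, which lines up with the 282 GFLOPS figure if that number is for a 2-socket system. The function and parameter names are illustrative.

```python
def peak_gflops(sockets: int, modules_per_socket: int, ghz: float,
                fmac_pipes_per_module: int = 2,
                doubles_per_pipe: int = 2,
                flops_per_fma: int = 2) -> float:
    """Theoretical double-precision peak in GFLOPS for a Bulldozer-style CPU.

    Assumes every FMAC pipe issues one 128-bit (2 x double) fused multiply-add
    per cycle, and counts an FMA as 2 flops (multiply + add).
    """
    flops_per_cycle_per_module = (fmac_pipes_per_module
                                  * doubles_per_pipe
                                  * flops_per_fma)  # 2 * 2 * 2 = 8
    return sockets * modules_per_socket * ghz * flops_per_cycle_per_module


if __name__ == "__main__":
    # Opteron 6274: 16 cores = 8 CMT modules, 2.2 GHz base clock.
    print(f"1 socket : {peak_gflops(1, 8, 2.2):.1f} GFLOPS")  # 140.8
    print(f"2 sockets: {peak_gflops(2, 8, 2.2):.1f} GFLOPS")  # 281.6
```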

2018-02-22 David Mathog
On Thu, 22 Feb 2018 09:37:54 -0500 Prentice Bisbal wrote:
> I found literature from AMD stating the theoretical performance of these processors is 282 GFLOPS, and my LINPACK performance isn't coming close to that (I get approximately 33% of that).

That does seem low. Check the usual culprit…

2018-02-22 Joe Landman
Which compiler are you using, and what options are you compiling it with?

On 02/22/2018 11:48 AM, Prentice Bisbal wrote:
> On 02/22/2018 10:44 AM, Michael Di Domenico wrote:
>> i can't speak to AMD, but using HPL 2.1 on Intel using the Intel compiler and the Intel MKL, i can hit 90% without issue.

2018-02-22 Prentice Bisbal
On 02/22/2018 10:44 AM, Michael Di Domenico wrote:
> i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
> compiler and the Intel MKL, i can hit 90% without issue. no major
> tuning either
>
> if you're at 33% i would be suspect of your math library

I'm using OpenBLAS 0.2.19 with dynamic arc…

2018-02-22 John Hearns via Beowulf
Oh, and use the Adaptive computing HPL calculator to get your input file. Thanks Adaptive guys!

On 22 February 2018 at 16:44, Michael Di Domenico wrote:
> i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
> compiler and the Intel MKL, i can hit 90% without issue. no major
> tunin…

2018-02-22 Michael Di Domenico
i can't speak to AMD, but using HPL 2.1 on Intel using the Intel compiler and the Intel MKL, i can hit 90% without issue. no major tuning either

if you're at 33% i would be suspect of your math library

On Thu, Feb 22, 2018 at 9:37 AM, Prentice Bisbal wrote:
> Beowulfers,
>
> In your experience,…

2018-02-22 Benson Muite
There is a very nice and simple max-FLOPS code that requires much less tuning than Linpack. It is described on p. 57 of:

Rahman, "Intel® Xeon Phi™ Coprocessor Architecture and Tools"
https://link.springer.com/book/10.1007%2F978-1-4302-5927-5

An example Fortran code is here: https://github.com/bk…

2018-02-22 John Hearns via Beowulf
Prentice,

I echo what Joe says. When doing benchmarking with HPL or SPEC benchmarks, I would optimise the BIOS settings to the highest degree I could. Switch off processor C-states. As Joe says, you need to look at what the OS is running in the background. I would disable the Bright Cluster Manager…

2018-02-22 Joe Landman
On 02/22/2018 09:37 AM, Prentice Bisbal wrote:
> Beowulfers,
>
> In your experience, how close does actual performance of your processors match up to their theoretical performance? I'm investigating a performance issue on some of my nodes. These are older systems using AMD Opteron 6274 processor…

[Beowulf] Theoretical vs. Actual Performance

2018-02-22 Prentice Bisbal
Beowulfers,

In your experience, how close does actual performance of your processors match up to their theoretical performance? I'm investigating a performance issue on some of my nodes. These are older systems using AMD Opteron 6274 processors. I found literature from AMD stating the theore…