On Friday, 23 February 2018 2:52:54 AM AEDT John Hearns via Beowulf wrote:
> Oh, and use the Adaptive computing HPL calculator to get your input file.
> Thanks Adaptive guys!
I think you mean Advanced Clustering.. :-)
http://www.advancedclustering.com/act_kb/tune-hpl-dat-file/
cheers,
Chris
--
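For reference, the sizing rule those HPL.dat calculators implement is simple: pick N so the N x N double-precision matrix fills a large fraction of total RAM, rounded to a multiple of the block size NB. A minimal sketch, where the 80% fraction and NB=192 are illustrative assumptions rather than values from this thread:

```python
import math

def hpl_problem_size(total_ram_bytes, mem_fraction=0.8, nb=192):
    """Estimate HPL's N: largest multiple of NB such that the
    N x N matrix of 8-byte doubles uses ~mem_fraction of RAM."""
    n = math.sqrt(mem_fraction * total_ram_bytes / 8)  # 8 bytes per double
    return int(n // nb) * nb                           # round down to a multiple of NB

# e.g. a node with 64 GB of RAM:
print(hpl_problem_size(64 * 1024**3))  # -> 82752
```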
On Friday, 23 February 2018 1:45:00 AM AEDT Joe Landman wrote:
> 85% makes the assumption that you have the systems configured in an
> optimal manner, that the compiler doesn't do anything wonky, and that,
to some degree, you isolate the OS portion of the workload off of most
of the cores.
Joe,
Thanks for the link. Based on that, they should be pretty close in
performance, and mine are not, so I must be doing something wrong with
my OpenBLAS build. Since ACML is dead, I was hoping I could use OpenBLAS
moving forward.
Prentice
On 02/22/2018 06:01 PM, Joe Landman wrote:
ACML is hand-coded assembly. Not likely that OpenBLAS will be much
better. Could be similar. c.f.
http://gcdart.blogspot.co.uk/2013/06/fast-matrix-multiply-and-ml.html
On 02/22/2018 05:48 PM, Prentice Bisbal wrote:
Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC
6.1.0, and I'm only getting 91 GFLOPS.
Consider trying:
https://github.com/amd/blis
https://github.com/clMathLibraries/clBLAS
as well.
On 02/23/2018 12:48 AM, Prentice Bisbal wrote:
Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC 6.1.0,
and I'm only getting 91 GFLOPS. I'm pretty sure OpenBLAS performance
should be close to ACML performance, if not better. I'll have to dig
into this later. For now, I'm going to continue my testing using the
ACML-based build.
So I just rebuilt HPL using the ACML 6.1.0 libraries with GCC 6.1.0, and
I'm now getting 197 GFLOPS, so clearly there's a problem with my
OpenBLAS build. I'm going to try building OpenBLAS without the dynamic
arch support on the machine where I plan on running my tests, and see if
that version performs better.
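For anyone following along, building OpenBLAS pinned to one microarchitecture instead of using DYNAMIC_ARCH looks roughly like this (TARGET=BULLDOZER is OpenBLAS's name for the Interlagos family; the thread counts and install prefix here are a sketch, not Prentice's actual command):

```shell
# Build OpenBLAS for the Bulldozer/Interlagos family only,
# skipping the runtime CPU-dispatch (DYNAMIC_ARCH) code paths.
make TARGET=BULLDOZER USE_OPENMP=1 NUM_THREADS=64 CC=gcc FC=gfortran
make TARGET=BULLDOZER PREFIX=$HOME/openblas-bulldozer install
```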
For OpenBlas, or hpl?
For hpl, I used GCC 6.1.0 with these flags:
$ egrep -i "flags|defs" Make.gcc-6.1.0_openblas-0.2.19
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CCNOOPT = $(HPL_DEFS)
OMP_DEFS = -openmp
CCFLAGS = $(HPL_DEFS)
This is my source for those theoretical numbers:
http://dewaele.org/~robbe/thesis/writing/references/49747D_HPC_Processor_Comparison_v3_July2012.pdf
If those numbers are off, that makes my job a bit easier. And it looks
like you're right. In the text above the table, it does mention 2-socket
systems.
Hi,
not sure if the 282 GFLOPS number is correct.
We have 16 Bulldozer/Interlagos cores at 2.2 GHz. Each pair of cores forms
a CMT module. The two cores in the module share an FPU with 2 128-bit FMAC
units.
In terms of double precision FLOPS it should make
16 cores * 2.2 GHz * 2 double precision FMAs per cycle * 2 FLOPs per FMA
= 140.8 GFLOPS per socket, so 282 GFLOPS would be the 2-socket figure.
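The module arithmetic in this message can be spelled out in a few lines (the per-unit factors come from the CMT description above; a 16-core socket is 8 modules):

```python
ghz = 2.2
modules = 8            # 16 cores = 8 CMT modules per socket
fmacs_per_module = 2   # two 128-bit FMAC units, shared by the core pair
dp_lanes = 2           # 128 bits holds two 64-bit doubles
flops_per_fma = 2      # fused multiply-add = 2 floating-point ops

per_socket = ghz * modules * fmacs_per_module * dp_lanes * flops_per_fma
print(per_socket)      # 140.8 GFLOPS per socket
print(2 * per_socket)  # 281.6 -- AMD's "282 GFLOPS" is a 2-socket number
```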
On Thu, 22 Feb 2018 09:37:54 -0500 Prentice Bisbal wrote:
I found literature from AMD stating the
theoretical performance of these processors is 282 GFLOPS, and my
LINPACK performance isn't coming close to that (I get approximately 33%
of that).
That does seem low. Check the usual culprits:
which compiler are you using, and what options are you compiling it with?
On 02/22/2018 11:48 AM, Prentice Bisbal wrote:
On 02/22/2018 10:44 AM, Michael Di Domenico wrote:
i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
compiler and the Intel MKL, i can hit 90% without issue. no major
tuning either
if you're at 33% i would be suspect of your math library
I'm using OpenBLAS 0.2.19 with dynamic arch support.
Oh, and use the Adaptive computing HPL calculator to get your input file.
Thanks Adaptive guys!
On 22 February 2018 at 16:44, Michael Di Domenico
wrote:
> i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
> compiler and the Intel MKL, i can hit 90% without issue. no major
> tuning either
i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
compiler and the Intel MKL, i can hit 90% without issue. no major
tuning either
if you're at 33% i would be suspect of your math library
On Thu, Feb 22, 2018 at 9:37 AM, Prentice Bisbal wrote:
> Beowulfers,
>
> In your experience,
There is a very nice and simple max-FLOPS code that requires much less
tuning than LINPACK. It is described on p. 57 of:
Rahman "Intel® Xeon Phi™ Coprocessor Architecture and Tools"
https://link.springer.com/book/10.1007%2F978-1-4302-5927-5
An example Fortran code is here:
https://github.com/bk
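In the same spirit, a crude measurement of achieved dense-matrix FLOPS takes only a few lines. Note that numpy calls whatever BLAS library it was built against, so this measures the library rather than hand-tuned peak; n=1000 is an arbitrary size:

```python
import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)
a @ b                            # warm-up run, excluded from timing

t0 = time.perf_counter()
c = a @ b
dt = time.perf_counter() - t0

gflops = 2 * n**3 / dt / 1e9     # dgemm performs ~2*n^3 floating-point ops
print(f"{gflops:.1f} GFLOPS")
```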
Prentice, I echo what Joe says.
When doing benchmarking with HPL or SPEC benchmarks, I would optimise the
BIOS settings to the highest degree I could.
Switch off processor C-states.
As Joe says, you need to look at what the OS is running in the background.
I would disable the Bright Cluster Manager.
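On Linux, two quick checks along these lines (the sysfs path is the standard cpufreq location; what counts as acceptable background load is up to you):

```shell
# CPU frequency governor -- for benchmarking you want "performance",
# not "ondemand" or "powersave"
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# See what else is eating cycles while the benchmark runs
top -b -n 1 | head -20
```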
On 02/22/2018 09:37 AM, Prentice Bisbal wrote:
Beowulfers,
In your experience, how close does actual performance of your processors
match up to their theoretical performance? I'm investigating a
performances issue on some of my nodes. These are older systems using
AMD Opteron 6274 processors. I found literature from AMD stating the
theoretical performance of these processors is 282 GFLOPS, and my
LINPACK performance isn't coming close to that (I get approximately 33%
of that).
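Putting the thread's numbers together: efficiency is just measured over peak. 282 GFLOPS is AMD's 2-socket figure quoted in this thread; 91 and 197 GFLOPS are the OpenBLAS and ACML results reported above:

```python
peak = 282.0  # AMD's theoretical 2-socket peak, GFLOPS
for label, measured in [("OpenBLAS build", 91.0), ("ACML build", 197.0)]:
    print(f"{label}: {measured / peak:.0%} of peak")
# OpenBLAS build: 32% of peak
# ACML build: 70% of peak
```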