There is a very nice, simple max-flops code that requires much less
tuning than Linpack. It is described on p. 57 of:
Rahman, "Intel® Xeon Phi™ Coprocessor Architecture and Tools"
https://link.springer.com/book/10.1007%2F978-1-4302-5927-5
An example Fortran code is here:
https://github.com/bkmgit/intel-xeon-phi-coprocessor-architecture-tools/tree/master/ch05
On 02/22/2018 05:16 PM, John Hearns via Beowulf wrote:
Prentice, I echo what Joe says.
When doing benchmarking with HPL or SPEC benchmarks, I would optimise
the BIOS settings to the highest degree I could.
Switch off processor C-states.
As Joe says, you need to look at what the OS is running in the
background. I would disable the Bright cluster manager daemon for instance.
85% of theoretical peak on an HPL run sounds reasonable to me and I
would get figures in that ballpark.
For your AMDs I would start by choosing one system, no interconnect to
cloud the waters. See what you can get out of that.
On 22 February 2018 at 15:45, Joe Landman <joe.land...@gmail.com> wrote:
On 02/22/2018 09:37 AM, Prentice Bisbal wrote:
Beowulfers,
In your experience, how closely does the actual performance of your
processors match their theoretical performance? I'm
investigating a performance issue on some of my nodes. These
are older systems using AMD Opteron 6274 processors. I found
literature from AMD stating the theoretical performance of these
processors is 282 GFLOPS, and my LINPACK performance isn't
coming close to that (I get approximately 33% of that). The
number I often hear mentioned is that actual performance should be
~85% of theoretical performance. Is that a realistic number in your
experience?
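To make the numbers in the question concrete, here is a minimal sketch of the arithmetic behind "percent of theoretical peak". The function names are hypothetical, and the node parameters in the example call (sockets, cores, clock, flops per cycle) are illustrative assumptions, not values stated in this thread; only the 282 GFLOPS, 33% and 85% figures come from the post.

```python
def theoretical_peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Peak rate if every core retired its maximum flops on every cycle."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, peak_gflops):
    """Fraction of theoretical peak that a measured run achieved."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return measured_gflops / peak_gflops


# ASSUMPTION: illustrative node parameters, not taken from the thread.
# They happen to give a peak close to the quoted 282 GFLOPS.
example_peak = theoretical_peak_gflops(
    sockets=2, cores_per_socket=16, clock_ghz=2.2, flops_per_cycle=4
)

QUOTED_PEAK = 282.0             # GFLOPS, the AMD figure quoted in the post
observed = 0.33 * QUOTED_PEAK   # roughly what the post reports for LINPACK
target = 0.85 * QUOTED_PEAK     # the commonly cited expectation

print(f"example peak  : {example_peak:.1f} GFLOPS")
print(f"observed (33%): {observed:.0f} GFLOPS")
print(f"target (85%)  : {target:.0f} GFLOPS")
print(f"gap           : {target / observed:.1f}x")
```

Worth checking first is whether the quoted peak refers to one socket or to the whole node, since that changes the denominator and therefore the efficiency.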
85% makes the assumption that you have the systems configured in an
optimal manner, that the compiler doesn't do anything wonky, and
that, to some degree, you isolate the OS portion of the workload off
of most of the cores to reduce jitter. Among other things.
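One rough way to approximate the core isolation described above, sketched under assumptions: the benchmark process is restricted to every CPU except the lowest-numbered one, leaving that CPU for OS activity. The helper names are hypothetical, `os.sched_setaffinity` is Linux-only, and this only moves the benchmark; it does not stop the kernel or daemons from running on the benchmark's cores.

```python
import os


def benchmark_cpus(available, reserved_for_os=1):
    """Return the CPUs left for the benchmark after setting aside the
    lowest-numbered ones for the OS (hypothetical helper)."""
    ordered = sorted(available)
    if reserved_for_os < 0 or reserved_for_os >= len(ordered):
        raise ValueError("must leave at least one CPU for the benchmark")
    return set(ordered[reserved_for_os:])


def pin_current_process(reserved_for_os=1):
    """Restrict this process to the non-reserved CPUs, where supported.

    Returns the CPU set that was applied, or None on platforms without
    os.sched_setaffinity (for example macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpus = benchmark_cpus(os.sched_getaffinity(0), reserved_for_os)
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling `pin_current_process()` before launching the benchmark from the same process would apply the mask; child processes inherit it.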
At Scalable, I'd regularly hit 60-90% of theoretical max computing
performance, with progressively more heroic tuning. Storage, I'd
typically hit 90-95% of theoretical max (good architectures almost
always beat bad ones). Networking, fairly similar, though tuning
per use case mattered significantly.
I don't want this to be a discussion of what could be wrong at
this point; we will get to that in future posts, I assure you!
--
Joe Landman
t: @hpcjoe
w: https://scalability.org
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf