There is a very nice and simple max-flops code that requires much less tuning than LINPACK. It is described on page 57 of:

Rahman "Intel® Xeon Phi™ Coprocessor Architecture and Tools"
https://link.springer.com/book/10.1007%2F978-1-4302-5927-5

An example Fortran code is here:
https://github.com/bkmgit/intel-xeon-phi-coprocessor-architecture-tools/tree/master/ch05
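
A microbenchmark of that kind just times a tight multiply-add loop and counts the flops itself. Below is a minimal single-core sketch along those lines; it is not the code from the book or the repository above, and the array size and repetition count are arbitrary illustrative choices:

program maxflops
  implicit none
  integer, parameter :: n = 1024, reps = 100000
  real(8) :: a(n), b(n), c(n)
  integer(8) :: t0, t1, rate
  real(8) :: seconds, gflops
  integer :: i, r

  a = 1.000001d0
  b = 0.999999d0
  c = 0.0d0

  call system_clock(t0, rate)
  do r = 1, reps
     do i = 1, n
        c(i) = c(i) + a(i) * b(i)   ! one multiply + one add = 2 flops
     end do
  end do
  call system_clock(t1)

  seconds = real(t1 - t0, 8) / real(rate, 8)
  gflops  = 2.0d0 * real(n, 8) * real(reps, 8) / seconds / 1.0d9
  print *, 'checksum =', sum(c)   ! keeps the compiler from discarding the loop
  print *, 'GFLOPS (one core) =', gflops
end program maxflops

Compile with optimisation (e.g. gfortran -O3); whether you get anywhere near peak depends on the compiler vectorising the loop and using FMA instructions. This only exercises one core, so scale by core count for a rough per-node ceiling, and use the code in the repository above for real measurements.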

On 02/22/2018 05:16 PM, John Hearns via Beowulf wrote:
Prentice, I echo what Joe says.
When benchmarking with HPL or SPEC, I would optimise the BIOS settings to the highest degree I could.
Switch off processor C-states.
As Joe says, you need to look at what the OS is running in the background. I would disable the Bright Cluster Manager daemon, for instance.


85% of theoretical peak on an HPL run sounds reasonable to me, and I would get figures in that ballpark.

For your AMDs, I would start by choosing one system, with no interconnect to muddy the waters. See what you can get out of that.

On 22 February 2018 at 15:45, Joe Landman <joe.land...@gmail.com> wrote:



    On 02/22/2018 09:37 AM, Prentice Bisbal wrote:

        Beowulfers,

        In your experience, how closely does the actual performance of
        your processors match their theoretical performance? I'm
        investigating a performance issue on some of my nodes. These
        are older systems using AMD Opteron 6274 processors. I found
        literature from AMD stating the theoretical performance of these
        processors is 282 GFLOPS, and my LINPACK performance isn't
        coming close to that (I get ~33% of that). The number I often
        hear mentioned is that actual performance should be ~85% of
        theoretical performance. Is that a realistic number, in your
        experience?
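
(For reference, 282 GFLOPS is consistent with the 6274's 16 cores at 2.2 GHz doing 8 double-precision flops per core per clock: 16 x 2.2 x 8 = 281.6 GFLOPS. On that basis, 85% of peak would be about 240 GFLOPS, and ~33% corresponds to roughly 93 GFLOPS.)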


    85% makes the assumption that you have the systems configured in an
    optimal manner, that the compiler doesn't do anything wonky, and
    that, to some degree, you isolate the OS portion of the workload off
    of most of the cores to reduce jitter. Among other things.

    At Scalable, I'd regularly hit 60-90% of theoretical max computing
    performance, with progressively more heroic tuning. Storage, I'd
    typically hit 90-95% of theoretical max (good architectures almost
    always beat bad ones).  Networking, fairly similar, though tuning
    per use case mattered significantly.


        I don't want this to be a discussion of what could be wrong at
        this point; we will get to that in future posts, I assure you!


-- Joe Landman
    t: @hpcjoe
    w: https://scalability.org


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
