In message from Scott Atchley <e.scott.atch...@gmail.com> (Sun, 20 Mar
2022 14:52:10 -0400):
> On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky <k...@free.net> wrote:
>> If so, it turns out that for the HPC user STREAM gives the more
>> important estimate: the application is translated by the compiler
>> (users do not write in assembler, except for modules from
>> mathematical libraries), so STREAM gives a realistic estimate of
>> what the application will actually achieve.
> When vendors advertise STREAM results, they compile the application
> with non-temporal loads and stores. This means that all memory
> accesses bypass the processor's caches. If your application of
> interest does a random walk through memory and there is neither
> temporal nor spatial locality, then using non-temporal loads and
> stores makes sense and STREAM is irrelevant.
STREAM was not originally designed for random access to memory. In
that case memory latency is what matters, and it makes more sense to
get a bandwidth estimate from mega-sweep
(https://github.com/UK-MAC/mega-stream).
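
To make the non-temporal point quoted above concrete, here is a rough sketch of the byte accounting behind a STREAM number. It is not code from STREAM itself. The function names, the array length and the timing are illustrative assumptions. STREAM counts one word moved for every array element read or written. With ordinary (write-allocate) stores the hardware typically also reads each destination cache line before writing it, and that extra traffic is not included in the reported figure. Non-temporal stores avoid it.

```python
# Sketch of STREAM-style byte accounting (illustrative, not the STREAM source).
# Each kernel is described as (arrays read, arrays written) per loop iteration.
KERNELS = {
    "copy":  (1, 1),   # c[i] = a[i]
    "scale": (1, 1),   # b[i] = q * c[i]
    "add":   (2, 1),   # c[i] = a[i] + b[i]
    "triad": (2, 1),   # a[i] = b[i] + q * c[i]
}


def bytes_per_iteration(kernel, word_bytes=8, write_allocate=False):
    """Bytes moved per loop iteration.

    write_allocate=False is the figure STREAM reports.
    write_allocate=True adds one extra read per written array, which is
    the additional traffic when stores go through the cache.
    """
    reads, writes = KERNELS[kernel]
    extra_reads = writes if write_allocate else 0
    return (reads + writes + extra_reads) * word_bytes


def bandwidth_gb_s(kernel, n, seconds, word_bytes=8, write_allocate=False):
    """Bandwidth in GB/s (10^9 bytes/s) for n iterations taking `seconds`."""
    total = bytes_per_iteration(kernel, word_bytes, write_allocate) * n
    return total / seconds / 1e9


if __name__ == "__main__":
    n = 80_000_000   # hypothetical array length
    t = 0.02         # hypothetical time for one triad pass, in seconds
    reported = bandwidth_gb_s("triad", n, t)
    on_the_bus = bandwidth_gb_s("triad", n, t, write_allocate=True)
    print(f"triad reported: {reported:.1f} GB/s")
    print(f"triad actual traffic with write-allocate: {on_the_bus:.1f} GB/s")
```

For the triad kernel with 8-byte words this gives 24 counted bytes per iteration against 32 bytes of actual traffic when stores allocate in cache, so the same hardware reports a lower STREAM figure unless the stores bypass the cache.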
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing