On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky <k...@free.net> wrote:

> If so, it turns out that for the HPC user, stream gives a more
> important estimate - the application is translated by the compiler
> (they do not write in assembler - except for modules from mathematical
> libraries), and stream will give a real estimate of what will be
> received in the application.
>

When vendors advertise STREAM results, they compile the benchmark with
non-temporal loads and stores. This means that all memory accesses bypass
the processor's caches. If your application of interest does a random walk
through memory and has neither temporal nor spatial locality, then
using non-temporal loads and stores makes sense, and STREAM is irrelevant.

If you want to know what memory bandwidth your application may
achieve, you can build and run STREAM without the compiler flags that
enable non-temporal loads and stores.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
