On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky <k...@free.net> wrote:
> If so, it turns out that for the HPC user, stream gives a more
> important estimate - the application is translated by the compiler
> (they do not write in assembler - except for modules from mathematical
> libraries), and stream will give a real estimate of what will be
> received in the application.

When vendors advertise STREAM results, they compile the benchmark with non-temporal loads and stores. This means that all memory accesses bypass the processor's caches. If your application of interest does a random walk through memory and there is neither temporal nor spatial locality, then using non-temporal loads and stores makes sense, and STREAM is irrelevant. If you want to know what memory bandwidth your application may achieve, you can use STREAM without the compiler flags that enable non-temporal loads and stores.
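
To make the distinction concrete, here is a minimal sketch of a STREAM-style triad kernel written two ways: once with ordinary stores, and once with explicit non-temporal stores. It is not taken from the STREAM source. The names triad_cached, triad_streaming, and triad_bytes are hypothetical, and the streaming variant assumes an x86 target with SSE2 (it falls back to ordinary stores elsewhere). Only the stores are non-temporal here; the loads are ordinary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Triad with ordinary (cached) stores: a[i] = b[i] + scalar * c[i]. */
void triad_cached(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Same arithmetic, but results are written with non-temporal stores
 * when SSE2 is available. Assumption: a is 16-byte aligned. */
void triad_streaming(double *a, const double *b, const double *c,
                     double scalar, size_t n)
{
#if defined(__SSE2__)
    assert(((uintptr_t)(const void *)a & 15u) == 0);
    const __m128d s = _mm_set1_pd(scalar);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_add_pd(_mm_loadu_pd(b + i),
                               _mm_mul_pd(s, _mm_loadu_pd(c + i)));
        _mm_stream_pd(a + i, v);   /* store that bypasses the caches */
    }
    for (; i < n; i++)             /* scalar tail for odd n */
        a[i] = b[i] + scalar * c[i];
    _mm_sfence();                  /* order the streaming stores */
#else
    triad_cached(a, b, c, scalar, n);  /* no SSE2: ordinary stores */
#endif
}

/* Bytes counted per triad pass in the STREAM convention:
 * two arrays read and one array written. */
size_t triad_bytes(size_t n)
{
    return 3 * n * sizeof(double);
}
```

To compare the two, time each function over arrays much larger than the last-level cache and divide triad_bytes(n) by the elapsed seconds. With ordinary stores on a write-allocate cache, the destination line is typically read before it is written, so the hardware moves more data than the benchmark counts, and the reported figure is usually lower than with streaming stores.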