In message from Bill Broadley <b...@cse.ucdavis.edu> (Thu, 13 Aug 2009
17:09:24 -0700):
Tom Elken wrote:
To add some details to what Christian says, the HPC Challenge version
of STREAM uses dynamic arrays and is hard to optimize. I don't know
what's best with current compiler versions, but you could try some of
these that were used in past HPCC submissions with your program, Bill:
Thanks for the heads up. I've checked the specbench.org compiler
options for hints on where to start with optimization flags, but I
didn't know about the dynamic stream.
Is the HPC Challenge code open source?
Yes, it is open.
PathScale 2.2.1 on Opteron:
Base OPT flags: -O3 -OPT:Ofast:fold_reassociate=0
STREAMFLAGS=-O3 -OPT:Ofast:fold_reassociate=0
-OPT:alias=restrict:align_unsafe=on -CG:movnti=1
Alas my PathScale license expired and I believe with SiCortex's death
(RIP) I can't renew it.
Now I understand that I was sage :-)
(we purchased a perpetual academic license). BTW, does anybody know
about the future of the PathScale compilers (if there is one)?
Mikhail
I tried open64-4.2.2 with those flags on a single-socket Nehalem:
$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
$ ./stream-open64
Total memory required = 457.8 MB.
Function Rate (MB/s) Avg time Min time Max time
Copy: 22061.4958 0.0145 0.0145 0.0146
Scale: 22228.4705 0.0144 0.0144 0.0145
Add: 20659.2638 0.0233 0.0232 0.0233
Triad: 20511.0888 0.0235 0.0234 0.0235
Dynamic:
$ ./stream-open64-malloc
Function Rate (MB/s) Avg time Min time Max time
Copy: 14436.5155 0.0222 0.0222 0.0222
Scale: 14667.4821 0.0218 0.0218 0.0219
Add: 15739.7070 0.0305 0.0305 0.0305
Triad: 15770.7775 0.0305 0.0304 0.0305
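
For anyone checking these tables by hand: STREAM derives each rate from the bytes a kernel moves and the minimum time. Below is a minimal sketch of that arithmetic. It assumes N = 20,000,000 elements, which is inferred from the 457.8 MB total for three double arrays and is not stated in the thread. The function and enum names are hypothetical, not taken from stream.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved by one pass of a kernel over n doubles.
   Copy and Scale read one array and write one (2 arrays);
   Add and Triad read two arrays and write one (3 arrays). */
double stream_bytes(enum stream_kernel k, size_t n)
{
    size_t arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2 : 3;
    return (double)arrays * (double)sizeof(double) * (double)n;
}

/* Rate in MB/s (1 MB = 1e6 bytes), given the time in seconds. */
double stream_rate_mbs(enum stream_kernel k, size_t n, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * stream_bytes(k, n) / seconds;
}
```

Under that assumption, stream_rate_mbs(STREAM_COPY, 20000000, 0.0145) comes out near 22069 MB/s, in line with the 22061 MB/s reported for the static Copy (the printed min time is rounded to four decimals).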
Intel C/C++ Compiler 10.1 on Harpertown CPUs:
Base OPT flags: -O2 -xT -ansi-alias -ip -i-static
Intel recently used
Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
-O2 -xSSE4.2 -ansi-alias -ip
and got good STREAM results in their HPCC submission on their
Endeavor cluster.
$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
$ ./stream-icc | grep ":"
STREAM version $Revision: 5.9 $
Copy: 14767.0512 0.0022 0.0022 0.0022
Scale: 14304.3513 0.0022 0.0022 0.0023
Add: 15503.3568 0.0031 0.0031 0.0031
Triad: 15613.9749 0.0031 0.0031 0.0031
$ ./stream-icc-malloc | grep ":"
STREAM version $Revision: 5.9 $
Copy: 14604.7582 0.0022 0.0022 0.0022
Scale: 14480.2814 0.0022 0.0022 0.0022
Add: 15414.3321 0.0031 0.0031 0.0031
Triad: 15738.4765 0.0031 0.0030 0.0031
So ICC shows no penalty for the dynamic arrays, but alas it is no
faster than open64 with the penalty.
I'll attempt to track down the HPCC STREAM source code to see if their
dynamic arrays are any friendlier than mine (I just use malloc).
In any case, many thanks for the pointer.
Oh, my dynamic tweak:
$ diff stream.c stream-malloc.c
43a44
> # include <stdlib.h>
97c98
< static double a[N+OFFSET],
---
> /* static double a[N+OFFSET],
99c100,102
< c[N+OFFSET];
---
> c[N+OFFSET]; */
> double *a, *b, *c;
134a138,142
> a=(double *)malloc(sizeof(double)*(N+OFFSET));
> b=(double *)malloc(sizeof(double)*(N+OFFSET));
> c=(double *)malloc(sizeof(double)*(N+OFFSET));
283c291,293
<
---
> free(a);
> free(b);
> free(c);
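
One plausible reason for the malloc penalty is that a compiler can see that static arrays are distinct and aligned, but has to be conservative about plain pointers. Flags such as -OPT:alias=restrict assert the no-overlap property globally. The sketch below states it in the source instead, with C99 restrict and an aligned allocation. The function names and the 64-byte alignment are assumptions for illustration, and nothing in this thread measures whether this closes the gap.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles on a 64-byte boundary (a common cache-line size;
   the value is an assumption). Returns NULL on failure. */
double *alloc_stream_array(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* Triad kernel: a[j] = b[j] + scalar * c[j].
   The restrict qualifiers promise the compiler that a, b and c do not
   overlap, which is what it can already see for static arrays. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

In stream-malloc.c this would mean replacing the three malloc calls with alloc_stream_array(N+OFFSET), checking each result for NULL, and moving the kernel loops into functions that take restrict-qualified pointers. free() still releases the memory.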
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf