With 8 processes, AMD-GNU is better than the others. Parallel 8-core job results:

                AMD-GNU      AMD-Pathscale   AMD-Intel10   Intel-Intel10
  8-core run    26.880 sec   33.746 sec      27.979 sec    30.371 sec

Thank you,
Sangamesh
Consultant, HPC

On Thu, Sep 18, 2008 at 2:08 PM, Bill Broadley <[EMAIL PROTECTED]> wrote:
> Sangamesh B wrote:
>> Hi Bill,
>>
>> I'm sorry. I composed the mail in the proper format, but it's not showing
>> up the way I wrote it.
>>
>> See, I've tested with three compilers only for AMD. For Intel, only Intel
>> ifort.
>
> Ah, so with 8 threads what was the Intel time? The AMD-GNU, AMD-Pathscale,
> and AMD-Intel times?
>
>> Also, the two results (OUTPUT file and time command) are there only for
>> some runs, not for all; I missed taking the time-command results for the
>> rest.
>>
>> I hope this helps,
>>
>> Thanks,
>> Sangamesh
>>
>> On Thu, Sep 18, 2008 at 11:59 AM, Bill Broadley <[EMAIL PROTECTED]> wrote:
>>
>>> I'm trying to understand your post, but failed. Can you post a link,
>>> publish a Google spreadsheet, or format it differently?
>>>
>>> You tried 3 compilers on both machines? Which times are for which
>>> CPU/compiler combos? I tried to match up the columns and rows, but
>>> sometimes there were 3 columns and sometimes 4. None of them lines up
>>> nicely under CPU or compiler headings.
>>>
>>> Mine (and many other folks') mail readers show email in ASCII/text, so a
>>> table should look like:
>>>
>>> Serial run:
>>>                  Compiler A   Compiler B   Compiler C
>>> =====================================================
>>> Intel 2.3 GHz        30           29           31
>>> AMD 2.3 GHz          28           32           32
>>>
>>> Note that I used spaces and not tabs so it appears clear to everyone
>>> regardless of their mail client, ascii/text, html, tab settings, etc.
>>>
>>> I've been testing these machines quite a bit lately and have been quite
>>> impressed with the Barcelona memory systems, for instance:
>>>
>>> http://cse.ucdavis.edu/bill/fat-node-numa3.png
>>>
>>> Sangamesh B wrote:
>>>
>>>> The scientific application used is DL-Poly 2.17.
>>>>
>>>> Tested with the Pathscale and Intel compilers on the AMD Opteron quad
>>>> core. The time figures mentioned were taken from the DL-Poly OUTPUT
>>>> file. I also used the time command. Here are the results:
>>>>
>>>>                     -------- AMD 2.3 GHz (32 GB RAM) --------    Intel 2.33 GHz (32 GB RAM)
>>>>                     GNU gfortran   Pathscale     Intel 10 ifort  Intel 10 ifort
>>>> ==============================================================================
>>>> 1. Serial
>>>>    OUTPUT file      147.719 sec    158.158 sec   135.729 sec     73.952 sec
>>>>    time command     2m27.791s      2m38.268s     -               1m13.972s
>>>>
>>>> 2. Parallel, 4 cores
>>>>    OUTPUT file       39.798 sec     44.717 sec    36.962 sec     32.317 sec
>>>>    time command     0m41.527s      0m46.571s     -               0m36.218s
>>>>
>>>> 3. Parallel, 8 cores
>>>>    OUTPUT file       26.880 sec     33.746 sec    27.979 sec     30.371 sec
>>>>    time command     (only one timing taken: 0m30.171s)
>>>>
>>>> The optimization flags used:
>>>>
>>>> Intel ifort 10:  -O3 -axW -funroll-loops  (I don't remember the exact
>>>>                  flag; something similar to loop unrolling)
>>>> Pathscale:       -O3 -OPT:Ofast -ffast-math -fno-math-errno
>>>> GNU gfortran:    -O3 -ffast-math -funroll-all-loops -ftree-vectorize
>>>>
>>>> I'll try to use this for further runs: http://directory.fsf.org/project/time/
>>>>
>>>> Thanks,
>>>> Sangamesh
>>>>
>>>> On Thu, Sep 18, 2008 at 6:07 AM, Vincent Diepeveen <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> How does all this change when you use a PGO-optimized executable on
>>>>> both sides?
>>>>>
>>>>> Vincent
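For context on the PGO question: profile-guided optimization is a two-pass
build, train then recompile. The lines below are only a sketch of what that
would look like with the GNU and Intel compilers used in this thread; the
source file name "dlpoly.f90" and the bare training run are placeholders, not
the actual DL-Poly build, and the PathScale equivalents are omitted.

  # GCC / gfortran: instrument, run a representative case, rebuild with the profile
  gfortran -O3 -fprofile-generate dlpoly.f90 -o dlpoly
  ./dlpoly                      # training run writes *.gcda profile files
  gfortran -O3 -fprofile-use dlpoly.f90 -o dlpoly

  # Intel ifort 10: same two-pass idea with -prof-gen / -prof-use
  ifort -O3 -prof-gen dlpoly.f90 -o dlpoly
  ./dlpoly                      # training run writes *.dyn profile files
  ifort -O3 -prof-use dlpoly.f90 -o dlpoly

The training input should resemble the benchmark case, otherwise the profile
steers the optimizer the wrong way.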
>>>>> On Sep 18, 2008, at 2:34 AM, Eric Thibodeau wrote:
>>>>>
>>>>>> Vincent Diepeveen wrote:
>>>>>>
>>>>>>> Nah,
>>>>>>>
>>>>>>> I guess he's referring to the fact that it sometimes uses
>>>>>>> single-precision floating point to get something done instead of
>>>>>>> double precision, and that it sometimes tends to keep stuff in
>>>>>>> registers.
>>>>>>>
>>>>>>> That isn't necessarily a problem, but if I remember well, the
>>>>>>> floating-point state could get wiped out when switching to SSE2.
>>>>>>> Sometimes you lose your FPU register set in that case.
>>>>>>>
>>>>>>> The main problem is that so many dangerous optimizations are possible
>>>>>>> to speed up test sets, because floating point is, from the hardware's
>>>>>>> point of view, really slow to do in itself.
>>>>>>>
>>>>>>> Yet in general, in the last generations of Intel compilers that has
>>>>>>> improved a lot.
>>>>>>
>>>>>> Well, running the same code, here is the result discrepancy I got:
>>>>>>
>>>>>> FLOPS my code has to do: 7,975,847,125,000 (~8 Tflop)... it takes 15
>>>>>> minutes on an 8 x 2-core Opteron with 32 GB of RAM (thank you, OpenMP ;)
>>>>>>
>>>>>> The running times (I ran it a _few_ times, but not the statistical
>>>>>> minimum of 30):
>>>>>>
>>>>>> ICC -> runtime == 689.249  ; summed error == 1651.78
>>>>>> GCC -> runtime == 1134.404 ; summed error == 0.883501
>>>>>>
>>>>>> Compiler flags:
>>>>>>
>>>>>> icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
>>>>>> gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC
>>>>>>
>>>>>> No trickery, no smoke and mirrors ;) Just a _huge_ kick-ass k-means
>>>>>> parallelized with OpenMP (thank gawd, otherwise it takes hours to run)
>>>>>> and a rather big database of 1.4 GB.
>>>>>>
>>>>>> ... So this is what I meant by floating-point errors. Yes, the runtime
>>>>>> was almost halved by ICC (and this is on an *Opteron*-based system, a
>>>>>> Tyan VX50). The running time wasn't actually what I was looking for,
>>>>>> though; it was the precision skew, and that's where I fell off my chair.
>>>>>>
>>>>>> For those itching for a few more specs:
>>>>>>
>>>>>> [EMAIL PROTECTED] ~ $ icc -V
>>>>>> Intel(R) C Compiler for applications running on Intel(R) 64, Version 10.1
>>>>>> Build 20080602
>>>>>> Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
>>>>>> FOR NON-COMMERCIAL USE ONLY
>>>>>>
>>>>>> [EMAIL PROTECTED] ~ $ gcc -v
>>>>>> Using built-in specs.
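Eric's "summed error" gap between the icc and gcc binaries is the classic
symptom of a floating-point reduction being reordered or kept in lower
precision. The toy program below is not taken from vqOpenMP.c; it just
accumulates an arbitrary constant N times to show how a naive
single-precision sum drifts while a Kahan-compensated sum stays close to the
reference, and why value-unsafe modes (icc's default fp model, gcc's
-ffast-math) can move the result.

  /*
   * Toy illustration only -- NOT Eric's vqOpenMP.c.  Accumulating many small
   * terms in single precision loses low-order bits that a Kahan (compensated)
   * sum keeps.  A compiler allowed to reassociate the reduction changes the
   * rounding in exactly this way, and may even delete the Kahan correction
   * below, since it is algebraically zero.
   */
  #include <stdio.h>

  #define N 10000000L             /* number of terms; arbitrary for the demo */

  int main(void)
  {
      const float term = 0.1f;    /* 0.1 is not exactly representable in binary */
      double exact = (double)N * (double)term;   /* reference sum in double */

      /* 1. Naive single-precision accumulation: error grows with the sum. */
      float naive = 0.0f;
      for (long i = 0; i < N; i++)
          naive += term;

      /* 2. Kahan compensated summation: carries the bits each add just lost. */
      float kahan = 0.0f, c = 0.0f;
      for (long i = 0; i < N; i++) {
          float y = term - c;
          float t = kahan + y;
          c = (t - kahan) - y;    /* the part of y that did not make it into t */
          kahan = t;
      }

      printf("reference : %.3f\n", exact);
      printf("naive sum : %.3f  (error %.3f)\n", naive, naive - exact);
      printf("kahan sum : %.3f  (error %.3f)\n", kahan, kahan - exact);
      return 0;
  }

Built with plain -O2 both loops behave as written; with something like
gcc -O3 -ffast-math the compensated loop may legally collapse back to the
naive one, which is the kind of value-unsafe transformation being discussed
here.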
>>>>>> Target: x86_64-pc-linux-gnu
>>>>>> Configured with:
>>>>>> /dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure
>>>>>> --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1
>>>>>> --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include
>>>>>> --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1
>>>>>> --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man
>>>>>> --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info
>>>>>> --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4
>>>>>> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec
>>>>>> --enable-nls --without-included-gettext --with-system-zlib
>>>>>> --disable-checking --disable-werror --enable-secureplt --enable-multilib
>>>>>> --enable-libmudflap --disable-libssp --enable-cld --disable-libgcj
>>>>>> --enable-languages=c,c++,treelang,fortran --enable-shared
>>>>>> --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
>>>>>> --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.1-r1 p1.1'
>>>>>> Thread model: posix
>>>>>> gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>> On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
>>>>>>>
>>>>>>>> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
>>>>>>>>
>>>>>>>>> Also, note that I've had issues with icc generating really fast but
>>>>>>>>> inaccurate code (the fp model is not IEEE *by default*; I am sure
>>>>>>>>> _everyone_ knows this and I am stating the obvious here).
>>>>>>>>
>>>>>>>> All modern, high-performance compilers default that way. It's
>>>>>>>> certainly the case that it sometimes goes more horribly wrong than
>>>>>>>> necessary, but I wouldn't ding icc for this default. Compare results
>>>>>>>> with IEEE mode.
>>>>>>>>
>>>>>>>> -- greg
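Following Greg's suggestion, an IEEE-mode comparison means rebuilding with
value-safe floating point. As a rough sketch only: newer icc/ifort releases
accept -fp-model precise (older ones use -mp), gcc/gfortran are value-safe as
long as -ffast-math is left out, and PathScale's -OPT:Ofast similarly relaxes
roundoff and would need to be dropped for a strict comparison. For Eric's
compile lines that would look roughly like:

  icc -xW -openmp -O3 -fp-model precise vqOpenMP.c -o vqOpenMP_ieee
  gcc -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC -lm   # already value-safe without -ffast-math

If the summed errors then agree, the remaining runtime gap is what the unsafe
optimizations were actually buying.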
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
