On Thu, Sep 8, 2011 at 12:31 AM, Steve White <stevan.wh...@googlemail.com> wrote:
> Hi,
>
> I run some tests of simple number-crunching loops whenever new
> architectures and compilers arise.
>
> These tests on recent Intel architectures show similar performance
> between gcc and icc compilers, at full optimization.
>
> However a recent test on x86_64 showed the open64 compiler
> outstripping gcc by a factor of 2 to 3. I tried all the obvious
> flags; nothing helped.

Like -funroll-loops?

> Versions: gcc 4.5.2, Open64 4.2.5. AMD Phenom(tm) II X4 840 Processor.
>
> A peek in the assembler makes it clear though. Even with -O3, gcc is
> not unrolling loops in this code, but opencc does, and profits.
>
> Attached find the C file. It's not pretty but the guts are in the
> small routine double_array_mults_by_const().
>
> For your convenience, also attached is the assembler for the innermost
> loop, generated by the two compilers with the -S flag.
> -----------------------------------------------------------------------
> Building and running:
>
> $ gcc --std=c99 -O3 -Wall -pedantic mults_by_const.c
> $ ./a.out
> double array mults by const 450 ms [ 1.013193]
>
> $ opencc -std=c99 -O3 -Wall mults_by_const.c
> $ ./a.out
> double array mults by const 170 ms [ 1.013193]
> -----------------------------------------------------------------------
> Now, the gcc -O3 should have turned on loop unrolling. I tried turning
> it on explicitly without success.
>
> By the way, I also tried -march=native and -ffast-math; neither
> affected the time at all.
>
> Cheers!
>