On Thu, Sep 8, 2011 at 12:31 AM, Steve White
<stevan.wh...@googlemail.com> wrote:
> Hi,
>
> I run some tests of simple number-crunching loops whenever new
> architectures and compilers arise.
>
> These tests on recent Intel architectures show similar performance
> between gcc and icc compilers, at full optimization.
>
> However a recent test on x86_64 showed the open64 compiler
> outstripping gcc by a factor of 2 to 3.  I tried all the obvious
> flags; nothing helped.

Like -funroll-loops?
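
For reference, here is a rough sketch of what unrolling does to a loop
of this kind. It assumes the kernel is a plain in-place multiply of a
double array by a constant; the function names below are hypothetical
and are not taken from the attached file.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel: scale every element of a[]
   by the constant c, one element per iteration. */
void scale_rolled(double *a, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= c;
}

/* The same operation unrolled by hand by a factor of 4.  Each trip
   through the main loop does four multiplies per branch and counter
   update; the second loop handles the 0..3 leftover elements. */
void scale_unrolled4(double *a, size_t n, double c)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     *= c;
        a[i + 1] *= c;
        a[i + 2] *= c;
        a[i + 3] *= c;
    }
    for (; i < n; i++)
        a[i] *= c;
}
```

Both functions compute the same result. As far as I know, gcc of this
vintage does not enable -funroll-loops at -O3, so the flag has to be
passed explicitly to get the second form from the first.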

> Versions: gcc 4.5.2, Open64 4.2.5.  AMD Phenom(tm) II X4 840 Processor.
>
> A peek in the assembler makes it clear though.  Even with -O3, gcc is
> not unrolling loops in this code, but opencc does, and profits.
>
> Attached find the C file. It's not pretty but the guts are in the
> small routine double_array_mults_by_const().
>
> For your convenience, also attached is the assembler for the innermost
> loop, generated by the two compilers with the -S flag.
> -----------------------------------------------------------------------
> Building and running:
>
> $ gcc --std=c99 -O3 -Wall -pedantic mults_by_const.c
> $ ./a.out
> double array mults by const             450 ms [  1.013193]
>
> $ opencc -std=c99 -O3 -Wall mults_by_const.c
> $ ./a.out
> double array mults by const             170 ms [  1.013193]
> -----------------------------------------------------------------------
> Now, gcc -O3 should have turned on loop unrolling. I tried turning
> it on explicitly without success.
>
> By the way, I also tried
>        -march=native
> and
>        -ffast-math
> Neither affected the time at all.
>
> Cheers!
>
