http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21182



--- Comment #6 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 00:48:23 UTC ---

Created attachment 29200

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29200

Updated testcase, build helper, and results of testing with different gcc versions



Tarball contains:



serpent.c:

the original testcase, only with "#ifdef NAIL_REGS" instead of "#if 0", which
allows test compiles without editing it. Basically, "gcc -DNAIL_REGS serpent.c"
will try to force gcc to use only registers instead of the stack.



gencode.sh:

builds serpent.c with -O2 and -O3, with and without -DNAIL_REGS. The object
file names contain the gcc version and the options used. The object files are
then objdump'ed and the output is saved. Tweakable by setting $PREFIX and/or $CC.
No -fomit-frame-pointer is used: the testcase can be compiled so that the stack
is not used even without that option.
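
For readers without the tarball, the build loop described above might look roughly like the following. This is a sketch and not the attached gencode.sh: the function names objname and build_all, the $OBJDUMP variable, and the reading of $PREFIX as a cross-toolchain prefix are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the gencode.sh from the attachment.
# Assumption: $PREFIX is a toolchain prefix such as "i686-linux-gnu-".
# Assumption: $OBJDUMP is a hypothetical override, not mentioned in the comment.
CC=${CC:-${PREFIX:-}gcc}
OBJDUMP=${OBJDUMP:-${PREFIX:-}objdump}

# Compose a base name such as serpent-O2-DNAIL_REGS-4.6.3.
# $1 = optimization flag, $2 = extra flag (may be empty), $3 = gcc version
objname() {
    printf 'serpent%s%s-%s\n' "$1" "$2" "$3"
}

# Build serpent.c in all four flag combinations and disassemble each object.
build_all() {
    ver=$("$CC" -dumpversion) || return 1
    for opt in -O2 -O3; do
        for nail in "" -DNAIL_REGS; do
            base=$(objname "$opt" "$nail" "$ver")
            # $opt and $nail are left unquoted on purpose so that an
            # empty $nail contributes no argument at all.
            "$CC" $opt $nail -c serpent.c -o "$base.o" || return 1
            "$OBJDUMP" -d "$base.o" > "$base.asm" || return 1
        done
    done
}
```

Running build_all in a directory that contains serpent.c, once per compiler (for example with CC pointing at each installed gcc in turn), would produce file names in the same pattern as the ones listed below.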



Disassembly:

serpent-O2-3.4.3.asm
serpent-O2-4.2.1.asm
serpent-O2-4.6.3.asm
serpent-O2-DNAIL_REGS-3.4.3.asm
serpent-O2-DNAIL_REGS-4.2.1.asm
serpent-O2-DNAIL_REGS-4.6.3.asm
serpent-O3-3.4.3.asm
serpent-O3-4.2.1.asm
serpent-O3-4.6.3.asm
serpent-O3-DNAIL_REGS-3.4.3.asm
serpent-O3-DNAIL_REGS-4.2.1.asm
serpent-O3-DNAIL_REGS-4.6.3.asm



Object files:

   text    data     bss     dec     hex filename
   3260       0       0    3260     cbc serpent-O2-DNAIL_REGS-3.4.3.o
   3260       0       0    3260     cbc serpent-O3-DNAIL_REGS-3.4.3.o
   3292       0       0    3292     cdc serpent-O3-3.4.3.o
   3536       0       0    3536     dd0 serpent-O2-4.6.3.o
   3536       0       0    3536     dd0 serpent-O3-4.6.3.o
   3845       0       0    3845     f05 serpent-O2-DNAIL_REGS-4.6.3.o
   3845       0       0    3845     f05 serpent-O3-DNAIL_REGS-4.6.3.o
   3877       0       0    3877     f25 serpent-O2-4.2.1.o
   3877       0       0    3877     f25 serpent-O3-4.2.1.o
   4302       0       0    4302    10ce serpent-O2-3.4.3.o
   4641       0       0    4641    1221 serpent-O2-DNAIL_REGS-4.2.1.o
   4641       0       0    4641    1221 serpent-O3-DNAIL_REGS-4.2.1.o
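
A listing ordered by text size, like the one above, can be reproduced from plain size(1) output with a small filter. The helper name sort_size_table is made up for this sketch; it only assumes the usual layout with one header line followed by one row per object file.

```shell
#!/bin/sh
# Hypothetical helper: reads size(1) output on stdin, keeps the header
# line first and sorts the remaining rows numerically by the first
# column (text size), smallest first.
sort_size_table() {
    {
        IFS= read -r header
        printf '%s\n' "$header"
        sort -n
    }
}
```

Typical use would be "size serpent-*.o | sort_size_table".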



Take a look inside serpent-O2-DNAIL_REGS-3.4.3.asm file.

This is what I want to get without asm hacks: the smallest code, uses no stack.



gcc-3.4.3 -O3 comes close: it does spill a few words to stack (search for
(%ebp)), but is generally good code (close to ideal?).



All other attempts fare worse:



gcc-3.4.3 -O2: code is significantly worse than -O3.

gcc-4.2.1 -O2/-O3: code is better than gcc-3.4.3 -O2, worse than gcc-4.6.3

gcc-4.6.3 -O2/-O3: five instances of spills to stack. Code is still not as good
as gcc-3.4.3 -O3. (-DNAIL_REGS only confuses it more, unlike 3.4.3).



Stack usage summary:



$ grep 'sub.*,%esp' *.asm | grep -v DNAIL_REGS
serpent-O2-3.4.3.asm:   6:  81 ec 00 01 00 00       sub    $0x100,%esp
serpent-O2-4.2.1.asm:   6:  83 ec 78                sub    $0x78,%esp
serpent-O2-4.6.3.asm:   4:  83 ec 04                sub    $0x4,%esp
serpent-O3-4.2.1.asm:   6:  83 ec 78                sub    $0x78,%esp
serpent-O3-4.6.3.asm:   4:  83 ec 04                sub    $0x4,%esp



(serpent-O3-3.4.3.asm is not listed, but it allocates and uses one word on the
stack via a push insn).





Modules with best (= minimal) stack usage:



$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O2-DNAIL_REGS-3.4.3.asm
   6:   8b 75 08                mov    0x8(%ebp),%esi
   9:   8b 7d 10                mov    0x10(%ebp),%edi
 ca9:   8b 75 0c                mov    0xc(%ebp),%esi



$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-3.4.3.asm
   7:   8b 7d 08                mov    0x8(%ebp),%edi
   a:   8b 4d 10                mov    0x10(%ebp),%ecx
 18c:   89 7d f0                mov    %edi,-0x10(%ebp)
 1dd:   8b 45 f0                mov    -0x10(%ebp),%eax
 23b:   8b 75 f0                mov    -0x10(%ebp),%esi
 299:   8b 7d f0                mov    -0x10(%ebp),%edi
 432:   8b 55 f0                mov    -0x10(%ebp),%edx
 4a0:   8b 4d f0                mov    -0x10(%ebp),%ecx
 50e:   8b 7d f0                mov    -0x10(%ebp),%edi
 84f:   8b 45 f0                mov    -0x10(%ebp),%eax
 8b9:   8b 75 f0                mov    -0x10(%ebp),%esi
 923:   8b 7d f0                mov    -0x10(%ebp),%edi
 cb6:   8b 55 0c                mov    0xc(%ebp),%edx



$ grep -F -e '(%esp)' -e '(%ebp)' serpent-O3-4.6.3.asm
   7:   8b 4c 24 20             mov    0x20(%esp),%ecx
   b:   8b 44 24 18             mov    0x18(%esp),%eax
 22e:   89 0c 24                mov    %ecx,(%esp)
 239:   23 3c 24                and    (%esp),%edi
 588:   89 0c 24                mov    %ecx,(%esp)
 58f:   23 3c 24                and    (%esp),%edi
 8f4:   89 0c 24                mov    %ecx,(%esp)
 8fd:   23 3c 24                and    (%esp),%edi
 c60:   89 0c 24                mov    %ecx,(%esp)
 c6b:   23 3c 24                and    (%esp),%edi
 d37:   89 14 24                mov    %edx,(%esp)
 d5a:   8b 44 24 1c             mov    0x1c(%esp),%eax
 d5e:   33 14 24                xor    (%esp),%edx
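
The listings above mix argument loads, spill stores and reloads. To count only the stores (instructions whose destination operand is a stack slot), something like the following could be used. The helper name count_stack_stores is made up for this sketch, and the pattern assumes objdump's default AT&T syntax with the destination as the last operand; it does not see words placed on the stack by push.

```shell
#!/bin/sh
# Hypothetical helper: counts lines whose last operand is a memory
# reference through %esp or %ebp, i.e. stores into the stack frame.
# Reads the files given as arguments, or stdin when there are none.
# Limitation: push instructions are not counted.
count_stack_stores() {
    grep -c -E ',-?(0x[0-9a-f]+)?\(%e(sp|bp)\)[[:space:]]*$' "$@"
}
```

For example, "count_stack_stores serpent-O3-4.6.3.asm" would be expected to match the five "mov %reg,(%esp)" lines in the listing above, and "count_stack_stores serpent-O3-3.4.3.asm" the single store at 18c.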





Conclusion:

gcc-3.4.3 -O3 was close to ideal.
gcc-4.2.1 is worse.
gcc-4.6.3 got a bit better, still not as good as gcc-3.4.3 -O3.
