This bug report is in fact a wishlist item for an optimization that is
described in the GCC manual but unfortunately not implemented (at least not for
x86). About the "internal" visibility of a symbol, the manual states: "By
indicating that a symbol cannot be called from outside the module, GCC may for
instance omit the load of a PIC register since it is known that the calling
function loaded the correct value."

This is a great idea, since loading the GOT register on x86 is a costly
operation. Even with "-march=pentium3" (which prevents the load from confusing
the processor's return address predictor), the PIC version of the testcase
still runs twice as slow as the non-PIC version. Although g() has already
loaded the GOT address into the callee-saved register %ebx, f() loads it once
again into %ecx.

The optimization described for the "internal" visibility would avoid this
reload, since GCC would have complete control over the callers. I did not find
anything in the psABI that would disallow such an optimization, hence this
wishlist item. This was tested with GCC 4.0.2, compiling with "gcc -O -fPIC"
(or -fpic).

Testcase:

        extern int a;

        __attribute__((visibility("internal")))
        void f(void) { ++a; }

        void g(void) { a = 0; f(); }


Excerpt from the generated assembly code:

080483c9 <g>:
 80483c9:       55                      push   %ebp
 80483ca:       89 e5                   mov    %esp,%ebp
 80483cc:       53                      push   %ebx
 80483cd:       e8 00 00 00 00          call   80483d2 <g+0x9>  \
 80483d2:       5b                      pop    %ebx              | first load
 80483d3:       81 c3 2e 12 00 00       add    $0x122e,%ebx     /
 80483d9:       8b 83 f8 ff ff ff       mov    0xfffffff8(%ebx),%eax
 80483df:       c7 00 00 00 00 00       movl   $0x0,(%eax)
 80483e5:       e8 c6 ff ff ff          call   80483b0 <f>
 ...
080483b0 <f>:
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       e8 00 00 00 00          call   80483b8 <f+0x8>  \
 80483b8:       59                      pop    %ecx              | second load
 80483b9:       81 c1 48 12 00 00       add    $0x1248,%ecx     /
 80483bf:       8b 81 f8 ff ff ff       mov    0xfffffff8(%ecx),%eax
 80483c5:       ff 00                   incl   (%eax)
 80483c7:       5d                      pop    %ebp
 80483c8:       c3                      ret


Note: it is impossible to combine the "internal" visibility with the "static"
qualifier (GCC complains), and using "static" alone does not help here either.
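
To make the slowdown mentioned above easier to reproduce, here is a minimal
sketch of a driver around the testcase. It is not part of the original
measurement: the definition of "a", the "noinline" attribute (added on the
assumption that a newer compiler might otherwise inline f() and hide the
reload) and the helper run_calls() are all hypothetical additions.

```c
/* Hypothetical driver around the testcase; not the code that was timed. */
#include <assert.h>

/* The testcase only declares "extern int a;". It is defined here
   (an assumption) so that the file links on its own. */
int a;

/* noinline is an addition: it keeps the call to f() in place so the
   second GOT load can still be observed in the generated code. */
__attribute__((visibility("internal"), noinline))
void f(void) { ++a; }

void g(void) { a = 0; f(); }

/* Call g() n times and return the final value of a.
   Each call resets a to 0 and then increments it once. */
int run_calls(long n)
{
    long i;
    for (i = 0; i < n; ++i) {
        g();
        assert(a == 1);   /* sanity check on the testcase semantics */
    }
    return a;
}
```

Calling run_calls() with a large count from a small main(), once built with
"gcc -O -fPIC" and once without -fPIC, and timing both runs should give a rough
comparison. The exact ratio will depend on the compiler version and the
processor.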

$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib --enable-nls
--without-included-gettext --enable-threads=posix --program-suffix=-4.0
--enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk
--enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre
--enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)

-- 
           Summary: Missed optimization for PIC code with internal
                    visibility
           Product: gcc
           Version: 4.0.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guillaume dot melquiond at ens-lyon dot fr
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23756
