This bug report is in fact a wishlist item for an optimization that is described in the GCC manual but unfortunately not implemented (at least not for x86). About the "internal" visibility of a symbol, the manual states: "By indicating that a symbol cannot be called from outside the module, GCC may for instance omit the load of a PIC register since it is known that the calling function loaded the correct value."
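For readers less familiar with the attribute, here is a minimal sketch of the three visibility levels involved. The names bump_default, bump_hidden, bump_internal and counter are hypothetical and are not part of the testcase; the comments paraphrase the GCC manual's description of each level.

```c
/* Hypothetical illustration only; these names are not from the testcase. */

/* A global with default visibility: in PIC code on x86 it is reached
   through the GOT, which is why a function touching it needs the GOT
   address in a register. */
int counter = 0;

/* default: the symbol is exported and may be called from other modules. */
__attribute__((visibility("default")))
int bump_default(void) { return ++counter; }

/* hidden: the symbol is not exported, but the function can still be
   reached from another module indirectly, through a function pointer. */
__attribute__((visibility("hidden")))
int bump_hidden(void) { return ++counter; }

/* internal: like hidden, plus the promise that the function is never
   called from another module, so every caller is known to the compiler. */
__attribute__((visibility("internal")))
int bump_internal(void) { return ++counter; }
```

Only the last form gives the compiler the guarantee that the manual's statement relies on.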
This is a great idea, since loading the GOT register on x86 is a costly operation. Even with "-march=pentium3" (which prevents the load from sending the processor's return address predictor into lalaland), the PIC version of the testcase still runs twice as slow as the non-PIC version.

Although g() has already loaded the GOT address into the callee-saved register %ebx, f() loads it once again into %ecx. The optimization described for the "internal" visibility would avoid this reload, since GCC would have complete control over the callers. I did not see anything in the psABI that would disallow such an optimization. Hence this wishlist item.

This was tested with GCC 4.0.2, compiling with "gcc -O -fPIC" (or -fpic).

Testcase:

extern int a;
__attribute__((visibility("internal"))) void f(void) { ++a; }
void g(void) { a = 0; f(); }

Excerpt from the generated assembly code:

080483c9 <g>:
 80483c9:  55                    push   %ebp
 80483ca:  89 e5                 mov    %esp,%ebp
 80483cc:  53                    push   %ebx
 80483cd:  e8 00 00 00 00        call   80483d2 <g+0x9>        \
 80483d2:  5b                    pop    %ebx                   | first load
 80483d3:  81 c3 2e 12 00 00     add    $0x122e,%ebx           /
 80483d9:  8b 83 f8 ff ff ff     mov    0xfffffff8(%ebx),%eax
 80483df:  c7 00 00 00 00 00     movl   $0x0,(%eax)
 80483e5:  e8 c6 ff ff ff        call   80483b0 <f>
 ...

080483b0 <f>:
 80483b0:  55                    push   %ebp
 80483b1:  89 e5                 mov    %esp,%ebp
 80483b3:  e8 00 00 00 00        call   80483b8 <f+0x8>        \
 80483b8:  59                    pop    %ecx                   | second load
 80483b9:  81 c1 48 12 00 00     add    $0x1248,%ecx           /
 80483bf:  8b 81 f8 ff ff ff     mov    0xfffffff8(%ecx),%eax
 80483c5:  ff 00                 incl   (%eax)
 80483c7:  5d                    pop    %ebp
 80483c8:  c3                    ret

Note: it is impossible to specify both the "internal" visibility and the "static" qualifier (GCC complains). Using only "static" does not help here either.

$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --enable-nls --without-included-gettext --enable-threads=posix --program-suffix=-4.0 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)

--
           Summary: Missed optimization for PIC code with internal visibility
           Product: gcc
           Version: 4.0.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guillaume dot melquiond at ens-lyon dot fr
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23756