http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-18 19:54:37 UTC ---
(In reply to comment #9)
> NOT-SO-BAD: -fPIC -shared -ftls-model=initial-exec
> % gcc x.c -O2 -fPIC -shared -o x.so -ftls-model=initial-exec ; objdump -d x.so | grep foo.: -A 5
> 0000000000000630 <foo>:
>  630:   48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax        # 200fe0 <_DYNAMIC+0x1b8>
>  637:   64 8b 00                mov    %fs:(%rax),%eax
>  63a:   c3                      retq
>
> GOOD: -fPIE
> % gcc -c x.c -O2 -fPIE -o x.o ; objdump -d x.o | grep foo.: -A 5
> 0000000000000000 <foo>:
>    0:   64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
>    7:   00
>    8:   c3                      retq
>
> So, while -ftls-model=initial-exec improves the TLS performance, it is still
> 2x slower than -fPIE.

Except obviously you can't use the last code sequence if you want to link it
into a shared library. The extra indirection is the standard cost of
relocatable code. Especially if there are just a few TLS vars in libtsan and
they are accessed a lot, that memory (the .got section entry) is most likely
in the caches, so the indirection can cost just a cycle or at most a few of
them.

No idea how you would plan to compile libtsan with the -fPIE flag. For
libtsan.so.0 you obviously can't, it would fail to link or load, and for
libtsan.a it would make the library only usable in executables or PIEs, not
from shared libraries.
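
For reference, a minimal sketch of the kind of source the quoted commands could be built from. The actual x.c is not shown in this comment, so everything here except the function name foo is an assumption: the variable names and the second accessor are hypothetical. The second variable uses GCC's tls_model attribute, which selects the model per variable instead of per translation unit via -ftls-model.

```c
/* x.c: hypothetical reconstruction; the real test file is not shown
 * in this thread. Only the name foo comes from the quoted objdump output. */

/* TLS model left to the compiler: it is derived from -fPIC / -fPIE and
 * from -ftls-model=... on the command line. */
__thread int tls_var;

/* Assumed variant: force initial-exec for this one variable, whatever
 * -ftls-model says for the rest of the file. */
__thread int tls_var_ie __attribute__((tls_model("initial-exec")));

/* Reads the thread-local variable; this is the access whose code
 * sequence differs between the builds compared above. */
int foo(void)
{
    return tls_var;
}

/* Same read, but against the variable pinned to initial-exec. */
int foo_ie(void)
{
    return tls_var_ie;
}
```

Building this with the two command lines quoted above and disassembling foo should show the same kind of difference: a load of the TLS offset from the GOT followed by a %fs-relative access for the -fPIC -shared -ftls-model=initial-exec build, versus a single %fs-relative access with a link-time offset for -fPIE. Exact addresses and offsets will differ from the listing in comment #9.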