http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354
--- Comment #23 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-23 07:27:27 UTC ---
(In reply to comment #21)
> (In reply to comment #20)
> > What I see is that it also affect code generation (register allocation). Do we
> > need to file a bug on that?
>
> If you see a code generation difference even with -ftls-model=local-exec -fPIC
> vs. -fPIE, then it must mean you don't have visibility attributes on the
> symbols used in the fast path. For initial-exec, the RA effects should be
> minimal, the TLS offset load from got is usually very close to the actual TLS
> memory load (or lea), and thus it will just pick up some short lived scratch
> register. Generally in GCC, -fPIE sets flag_pic and not flag_shlib, while
> -fPIC sets flag_pic and flag_shlib. flag_pic is about whether position
> independent code needs to be generated, flag_shlib is about whether locally
> defined symbols can be interposed (plus it affects TLS model default choice).

When I compile with -fvisibility=hidden, it does not affect the generated code.
It's not that we access a lot of symbols in the function; there is one
thread-local and one static global var.

Those "minimal" RA effects do have an effect in our case. We have no room to
squeeze in another register for TLS access:

// -fPIE
000000000009ca30 <__tsan_write2>:
   9ca30: 64 48 8b 04 25 40 1f    mov %fs:0xffffffffffeb1f40,%rax
   9ca37: eb ff
   9ca39: 48 8b 0c 24             mov (%rsp),%rcx
   9ca3d: a8 01                   test $0x1,%al
   9ca3f: 0f 85 d3 00 00 00       jne 9cb18 <__tsan_write2+0xe8>
   9ca45: 48 83 e8 80             sub $0xffffffffffffff80,%rax
   9ca49: 48 89 fe                mov %rdi,%rsi
   9ca4c: 48 89 c2                mov %rax,%rdx
   9ca4f: 64 48 89 04 25 40 1f    mov %rax,%fs:0xffffffffffeb1f40
   9ca56: eb ff

// -fPIC -ftls-model=initial-exec
00000000000969f0 <__tsan_write2>:
   969f0: 48 c7 c2 40 1f eb ff    mov $0xffffffffffeb1f40,%rdx
   969f7: 53                      push %rbx
   969f8: 48 8b 4c 24 08          mov 0x8(%rsp),%rcx
   969fd: 64 48 8b 02             mov %fs:(%rdx),%rax
   96a01: a8 01                   test $0x1,%al
   96a03: 0f 85 c7 00 00 00       jne 96ad0 <__tsan_write2+0xe0>
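
For anyone who wants to reproduce the comparison outside the tsan tree, here is a minimal sketch of the kind of source under discussion: one thread-local and one static global touched on a fast path. It is not the ThreadSanitizer code. Every name (demo_fast_state, demo_last_addr, demo_write2, demo_selfcheck) is hypothetical, and the body only loosely follows the first instructions of the listings above (load the TLS word, test bit 0, add 0x80, store it back). The tls_model and visibility attributes are the GCC attributes corresponding to -ftls-model= and -fvisibility=.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's per-thread state. The attributes
   pin the TLS model and visibility per symbol, independent of the
   -ftls-model= and -fvisibility= command-line defaults. */
__thread uint64_t demo_fast_state
    __attribute__((tls_model("initial-exec"), visibility("hidden")));

/* Hypothetical static global touched by the fast path. */
static uintptr_t demo_last_addr;

/* Returns 1 when the (omitted) slow path would be taken, 0 otherwise. */
__attribute__((visibility("hidden")))
int demo_write2(uintptr_t addr)
{
    uint64_t state = demo_fast_state;   /* TLS load */

    if (state & 1)                      /* cf. test $0x1,%al ; jne */
        return 1;

    state += 0x80;                      /* cf. sub $0xffffffffffffff80,%rax */
    demo_fast_state = state;            /* TLS store */
    demo_last_addr = addr;
    return 0;
}

/* Small usage check: two fast-path hits, then a forced slow path. */
void demo_selfcheck(void)
{
    demo_fast_state = 0;
    assert(demo_write2(0x1000) == 0);
    assert(demo_write2(0x2000) == 0);
    assert(demo_fast_state == 0x100);
    assert(demo_last_addr == 0x2000);

    demo_fast_state = 1;                /* low bit set: slow path requested */
    assert(demo_write2(0x3000) == 1);
    assert(demo_fast_state == 1);       /* state untouched on the slow path */
}
```

Compiling this file once with -O2 -fPIE and once with -O2 -fPIC -ftls-model=initial-exec, then diffing the objdump -d output, should show whether the extra register for the TLS offset appears in a given GCC version. A function this small has far less register pressure than __tsan_write2, so it may not reproduce the push %rbx seen above.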