https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96607
Bug ID: 96607
Summary: GCC feeds SPARC/Solaris linker with unrecognized TLS sequences
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vita.batrla at gmail dot com
Target Milestone: ---

GCC generates invalid TLS sequences that the SPARC/Solaris linker does not
understand, so the resulting binary dumps core. The following dump of a .o
file shows a TLS sequence that is not contiguous and uses register %i2 to
hold a temporary value:

  48:   35 00 00 00     sethi  %hi(0), %i2
                        48: R_SPARC_TLS_GD_HI22 _ZSt15__once_callable
  ...
  50:   b4 06 a0 00     add  %i2, 0, %i2
                        50: R_SPARC_TLS_GD_LO10 _ZSt15__once_callable
  ...
  68:   b4 05 c0 1a     add  %l7, %i2, %i2
                        68: R_SPARC_TLS_GD_ADD _ZSt15__once_callable
  ...
  398:  40 00 00 00     call  398 <_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4llvm10ThreadPoolC4EjEUlvE_EEEEE6_M_runEv+0x398>
                        398: R_SPARC_TLS_GD_CALL _ZSt15__once_callable
  39c:  90 10 00 1a     mov  %i2, %o0

Per the spec [1], the GD TLS sequence should be 4 instructions in this order:

  General Dynamic Model Code Sequence      Initial Relocation     Symbol
  0x00 sethi %hi(@dtlndx(x)), %o0          R_SPARC_TLS_GD_HI22    x
  0x04 add   %o0,%lo(@dtlndx(x)), %o0      R_SPARC_TLS_GD_LO10    x
  0x08 add   %l7,%o0, %o0                  R_SPARC_TLS_GD_ADD     x
  0x0c call  __tls_get_addr                R_SPARC_TLS_GD_CALL    x

With the comment:

> The code sequence must appear in the code as is. It is not possible to move
> the second add instruction in the delay slot of the call instruction since
> the linker would not recognize the instruction sequence.

The Solaris linker doesn't expect anything in the branch delay slot of the
above call instruction. An optimized sequence that uses the delay slot of the
call instruction carrying the R_SPARC_TLS_GD_CALL relocation confuses the
linker and causes it to produce invalid code. In the GD->IE translation, the
Solaris linker replaces the call instruction with an 'add'.
So the above example at offset 398 becomes:

> +0x398: call +0         ---->  add %g7, %o0, %o0
> +0x39c: mov %i2, %o0    ---->  mov %i2, %o0

Before linking, the 'mov' instruction executes before the 'call' instruction
takes effect (it is in the branch delay slot). After linking, the 'mov'
instruction executes in program order, so the effect of the 'add' instruction
is lost and the value in register %o0 becomes corrupted, leading to SIGSEGV on
dereference in the best case.

3952 /* Return nonzero if TRIAL can go into the call delay slot.  */
3953
3954 int
3955 eligible_for_call_delay (rtx_insn *trial)
3956 {
3957   rtx pat;
3958
3959   if (get_attr_in_branch_delay (trial) == IN_BRANCH_DELAY_FALSE)
3960     return 0;
3961
3962   /* The only problematic cases are TLS sequences with Sun as/ld.  */
3963   if ((TARGET_GNU_TLS && HAVE_GNU_LD) || !TARGET_TLS)
3964     return 1;
3965
3966   pat = PATTERN (trial);
3967
3968   /* We must reject tgd_add{32|64}, i.e.
3969        (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSGD)))
3970      and tldm_add{32|64}, i.e.
3971        (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSLDM)))
3972      for Sun as/ld.  */
3973   if (GET_CODE (pat) == SET
3974       && GET_CODE (SET_SRC (pat)) == PLUS)
3975     {
3976       rtx unspec = XEXP (SET_SRC (pat), 1);
3977
3978       if (GET_CODE (unspec) == UNSPEC
3979           && (XINT (unspec, 1) == UNSPEC_TLSGD
3980               || XINT (unspec, 1) == UNSPEC_TLSLDM))
3981         return 0;
3982     }
3983
3984   return 1;
3985 }

My hypothesis is that the check at line 3973 catches only the 'add'
instructions of the TLS sequence. It does not catch the case where the TLS
sequence computes a temporary value and keeps it in another register
(intermediate code):

  mov 'reg_with_temp_value', %o0    <-- eligible_for_call_delay() returns 1
  call R_SPARC_TLS_GD_CALL

The 'mov' passes the branch delay slot eligibility check and the compiler
produces this in the .o file:

  call R_SPARC_TLS_GD_CALL
  mov 'reg_with_temp_value', %o0

Then the Solaris linker comes and optimizes GD -> IE like this:

  add %g7, %o0, %o0
  mov 'reg_with_temp_value', %o0

And breaks the binary.
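The register corruption described above can be illustrated with a small toy model in C. This is only a sketch, not linker or GCC code: the `toy_regs` structure, the `toy_run_*` functions, and the register values used with them are hypothetical. It also assumes that the result the rewritten code is meant to produce is the thread pointer in %g7 plus the temporary held in %i2.

```c
#include <stdint.h>

/* Toy register file holding only the registers named in the report. */
struct toy_regs {
    uint32_t o0; /* %o0: argument and result register */
    uint32_t i2; /* %i2: temporary produced by the earlier TLS instructions */
    uint32_t g7; /* %g7: thread pointer used by the linker's 'add' */
};

/* Assumed intended order: the former delay-slot 'mov' takes effect first,
   then the 'add' that replaced the call. */
uint32_t toy_run_intended(struct toy_regs r)
{
    r.o0 = r.i2;         /* mov %i2, %o0 */
    r.o0 = r.g7 + r.o0;  /* add %g7, %o0, %o0 */
    return r.o0;
}

/* Order after the GD->IE rewrite: no delay slot any more, so the 'add'
   runs on a stale %o0 and the 'mov' then overwrites its result. */
uint32_t toy_run_as_linked(struct toy_regs r)
{
    r.o0 = r.g7 + r.o0;  /* add %g7, %o0, %o0 (stale %o0) */
    r.o0 = r.i2;         /* mov %i2, %o0 discards the sum */
    return r.o0;
}
```

With the made-up values %g7 = 0x10000 and %i2 = 0x40, the first function returns 0x10040 while the second returns 0x40, i.e. the thread pointer is never added in.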
The check in eligible_for_call_delay() on line 3963, updated in [2], falls
through because the gcc used in the example above is built to use GNU 'as'
but the Solaris linker. The decision whether the instruction is eligible for
the delay slot is then made by the check at line 3973, but that check
unfortunately handles only the simple cases.

[1] https://www.uclibc.org/docs/tls.pdf
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93704
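The fall-through in eligible_for_call_delay() can be modeled with a standalone C sketch. This is not GCC code: the `toy_*` types and names are hypothetical simplifications of the RTL pattern tests, and the three configuration flags merely stand in for TARGET_TLS, TARGET_GNU_TLS and HAVE_GNU_LD.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the instruction patterns in question. */
enum toy_kind { TOY_MOV, TOY_TGD_ADD, TOY_TLDM_ADD };

struct toy_config {
    bool target_tls; /* models TARGET_TLS */
    bool gnu_tls;    /* models TARGET_GNU_TLS */
    bool gnu_ld;     /* models HAVE_GNU_LD */
};

struct toy_insn {
    enum toy_kind kind;
    bool in_branch_delay; /* models the in_branch_delay attribute */
};

/* Mirrors the structure of the three checks in the quoted function. */
bool toy_eligible_for_call_delay(const struct toy_config *cfg,
                                 const struct toy_insn *trial)
{
    if (!trial->in_branch_delay)
        return false;

    /* Models line 3963: only TLS sequences with Sun as/ld are suspect. */
    if ((cfg->gnu_tls && cfg->gnu_ld) || !cfg->target_tls)
        return true;

    /* Models line 3973: only the TLS 'add' patterns are rejected. */
    if (trial->kind == TOY_TGD_ADD || trial->kind == TOY_TLDM_ADD)
        return false;

    /* Anything else, including a 'mov' into %o0, is accepted. */
    return true;
}
```

In this model, a configuration with `gnu_tls` set and `gnu_ld` clear rejects the `TOY_TGD_ADD` instruction but accepts `TOY_MOV`, which is the gap the hypothesis describes.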