Btw, for those interested in testing this on 64-bit as well, here is a
small port of the testcase:
.section .text.startup,"x"
.globl main
.def main; .scl 2; .type 32; .endef
main:
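subq $40, %rsp          # 32 bytes shadow space + 8 for stack alignment
movl _tls_index, %eax   # this module's TLS index (zero-extends into %rax)
movq %gs:88, %rcx       # TEB ThreadLocalStoragePointer (offset 0x58 on x64)
shlq $3, %rax           # index * sizeof (void *)
addq %rax, %rcx         # address of this module's slot in the TLS array
movq (%rcx), %rcx       # base of this thread's TLS block for the module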
movl _tlsVar
So, yes, we need to remove the +1 for tls_start. The issue here -
besides some nits in the .sc files in binutils - is that the initialization
code treats those fields as given, so later accesses would actually
need an additional sizeof (void *) offset from the .tls$AAA field.
That the .tls$ZZZ is also part of