Hi!

While the issue is Hurd-specific, non-Hurd people might nevertheless be
able to help here with their glibc/TLS expertise.

I'm working on a patch to move the Hurd's errno from the Hurd-specific
threadvar (in short, a mechanism somewhat equivalent to TLS, using a
portion of space at the beginning of a thread's stack for storing
thread-specific data) to TLS proper.

The specific glibc tree is
<http://git.savannah.gnu.org/cgit/hurd/glibc.git/tree/?id=cba1c83ad62a11347684a9daf349e659237a1741>,
but apart from Hurd-specific patches this is equivalent to mainline commit
fc56c5bbc1a0d56b9b49171dd377c73c268ebcfd.

On Thu, 10 May 2012 17:25:59 +0800, I wrote:
>     $ gdb -q --args ./ld.so 
>     Reading symbols from /home/tschwinge/tmp/ld.so...done.
>     (gdb) r
>     Starting program: /home/tschwinge/tmp/ld.so 
>     
>     Program received signal EXC_BAD_ACCESS, Could not access memory.
>     0x00015797 in __strerror_r (errnum=0, buf=0x0, buflen=2) at dl-minimal.c:173
>     173     dl-minimal.c: No such file or directory.
>             in dl-minimal.c
>     (gdb) bt
>     #0  0x00015797 in __strerror_r (errnum=0, buf=0x0, buflen=2) at dl-minimal.c:173
>     #1  0x00000000 in ?? ()
>     (gdb) info registers
>     eax            0x0      0
>     ecx            0xa      10
>     edx            0x2      2
>     ebx            0x26ff4  159732
>     esp            0x1028c60        0x1028c60
>     ebp            0x1028cb8        0x1028cb8
>     esi            0xa      10
>     edi            0x21b4c  138060
>     eip            0x15797  0x15797 <__strerror_r+167>
>     eflags         0x10202  [ IF RF ]
>     cs             0x17     23
>     ss             0x1f     31
>     ds             0x1f     31
>     es             0x1f     31
>     fs             0x1f     31
>     gs             0x1f     31
> 
> 0x15797 is bogus: it's not even an instruction boundary.
> 
> Apparently I forgot how to debug ld.so from the very beginning...
> 
> It seems that gs is not set up, but even if that were an invalid TLS gs:X
> access, that doesn't explain to me how the PC would be badly affected by
> that?

It turns out that GDB's understanding of addresses (.text only?) is off
by 0x1000 (it has been relocated, I assume), so after hitting a
breakpoint you have to »set $pc = $pc - 0x1000« to be able to make sense
of backtraces, etc.  (For posterity, in case this is useful to someone
who then remembers these words: I eventually figured this out by
sprinkling a few »__asm __volatile ("hlt");« (to transfer control to GDB)
before the places in ld.so code where TLS data (errno, specifically) is
accessed, and then comparing the disassembly and looking for magic
constants.  I found »movl $0x40000009,%gs:(%eax)« (»errno = EBADF«), and
that constant is only used in two places, one of them being __writev --
oh, it's trying to print something? -- etc., etc.)  Manually offsetting
each frame's PC by -0x1000, I then got a backtrace, which included:

    #3  0x00013fb6 in __assert_fail (assertion=0x1e114 "info[30] == ((void *)0) || (info[30]->d_un.d_val & ~0x00000008) == 0", file=0x1f4e3 "dynamic-link.h", line=207, function=0x1f6ec "elf_get_dynamic_info") at dl-minimal.c:208
    #4  0x00003f69 in elf_get_dynamic_info (temp=0x0, l=0x24604) at dynamic-link.h:206
    #5  _dl_start (arg=0x1027000) at rtld.c:416

In my understanding of x86 TLS (and that understanding is not too
detailed), »movl $0x40000009,%gs:(%eax)« is local-exec TLS, which causes
the linker to set the DF_STATIC_TLS flag, and thus the assertion in
elf/dynamic-link.h, line 206 to fail:

       202  #ifdef RTLD_BOOTSTRAP
       203    /* Only the bind now flags are allowed.  */
       204    assert (info[VERSYMIDX (DT_FLAGS_1)] == NULL
       205            || (info[VERSYMIDX (DT_FLAGS_1)]->d_un.d_val & ~DF_1_NOW) == 0);
       206    assert (info[DT_FLAGS] == NULL
       207            || (info[DT_FLAGS]->d_un.d_val & ~DF_BIND_NOW) == 0);
       208    /* Flags must not be set for ld.so.  */
       209    assert (info[DT_RUNPATH] == NULL);
       210    assert (info[DT_RPATH] == NULL);
       211  #else

(Again for posterity, and as GDB would not access the variable properly,
I confirmed this by putting »volatile Elf32_Word tmp =
info[DT_FLAGS]->d_un.d_val; __asm __volatile ("hlt");« before the assert,
and then GDB could »print tmp« to confirm it was 0x10 (DF_STATIC_TLS).)
(At this time, _hurd_init_dtablesize is zero, so it can't print anything
yet, and errno is set to EBADF, triggering the faulting TLS access.)

Not knowing what this assert is good for, I simply made it allow the
DF_STATIC_TLS case, too, and this allowed ld.so to progress a little bit
further: if invoked without arguments, it is now able to print its usage
information, elf/rtld.c:dl_main, line 1017.
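
To illustrate the mask arithmetic behind that assertion, here is a
minimal standalone sketch.  It is not the actual change to
elf/dynamic-link.h: the SKETCH_* constants and the flags_ok_* helpers are
hypothetical names, and the two flag values are simply the ones visible
in this mail (~0x00000008 in the assertion text, 0x10 as printed by GDB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in this mail: the assertion text shows
   ~0x00000008 for DF_BIND_NOW, and GDB printed 0x10 for DF_STATIC_TLS.
   The SKETCH_ names are hypothetical stand-ins, not glibc macros.  */
#define SKETCH_DF_BIND_NOW   0x00000008u
#define SKETCH_DF_STATIC_TLS 0x00000010u

/* Original bootstrap check: no bit other than DF_BIND_NOW may be set.  */
static bool
flags_ok_original (uint32_t dt_flags)
{
  return (dt_flags & ~SKETCH_DF_BIND_NOW) == 0;
}

/* Relaxed check: additionally tolerate DF_STATIC_TLS.  */
static bool
flags_ok_relaxed (uint32_t dt_flags)
{
  return (dt_flags & ~(SKETCH_DF_BIND_NOW | SKETCH_DF_STATIC_TLS)) == 0;
}
```

With the observed value 0x10, flags_ok_original rejects the flags while
flags_ok_relaxed accepts them; any other stray bit is still rejected by
both.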

Yet, something like »./ld.so --library-path $PWD ./libc.so« still fails,
and I (again manually with 0x1000 offset) obtained the following
backtrace:

    #0  0x00004a69 in open_verify (name=0x25ae0 "/home/thomas/libc.so", fbp=0x1026a28, loader=0x0, whatcode=0, found_other_class=0x1026a27, free_name=true) at dl-load.c:1722
    #1  0x00007915 in _dl_map_object (loader=0x0, name=0x102703b "/home/thomas/libc.so", type=1, trace_mode=0, mode=536870912, nsid=0) at dl-load.c:2285
    #2  0x00002078 in dl_main (phdr=0x1034, phnum=7, user_entry=0x1026eac, auxv=0x0) at rtld.c:1084
    #3  0x00012d25 in go (argdata=0x1026d90) at ../sysdeps/mach/hurd/dl-sysdep.c:213
    #4  0x00015f46 in _hurd_startup (argptr=0x1027000, main=0x1026f94) at hurdstartup.c:188
    #5  0x00013be3 in _dl_sysdep_start (start_argptr=0x1027000, dl_main=0x275a <dl_main+4096>) at ../sysdeps/mach/hurd/dl-sysdep.c:281
    #6  0x0000421b in _dl_start_final (arg=0x1027000) at rtld.c:338
    #7  _dl_start (arg=0x1027000) at rtld.c:564

dl-load.c:1722 again is an errno access, and the processor's segment
register setup tells me TLS has not yet been initialized at that point.
Now what is important is that glibc's Hurd-specific code, contrary to the
Linux kernel-specific code, does not have a private errno for ld.so:

sysdeps/mach/hurd/dl-sysdep.h:

    /* The private errno doesn't make sense on the Hurd.  errno is always the
       thread-local slot shared with libc, and it matters to share the cell
       with libc because after startup we use libc functions that set errno
       (open, mmap, etc).  */

    #define RTLD_PRIVATE_ERRNO 0

And thus in the GNU Hurd configuration, ld.so code uses the TLS errno.
In sysdeps/generic/dl-sysdep.h, this is explained/defined as follows:

    /* This macro must be defined to either 0 or 1.

       If 1, then an errno global variable hidden in ld.so will work right with
       all the errno-using libc code compiled for ld.so, and there is never a
       need to share the errno location with libc.  This is appropriate only if
       all the libc functions that ld.so uses are called without PLT and always
       get the versions linked into ld.so rather than the libc ones.  */

    #ifdef IS_IN_rtld
    # define RTLD_PRIVATE_ERRNO 1
    #else
    # define RTLD_PRIVATE_ERRNO 0
    #endif
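
To make the difference between the two settings concrete, here is a
rough sketch of the two storage choices.  It is an illustration only, not
glibc's actual errno definition: SKETCH_PRIVATE_ERRNO, sketch_errno_cell,
sketch_errno and sketch_fail are all hypothetical names, and __thread is
assumed to be available (GCC extension).

```c
/* SKETCH_PRIVATE_ERRNO is a hypothetical stand-in for
   RTLD_PRIVATE_ERRNO; 0 models the Hurd setting quoted above.  */
#ifndef SKETCH_PRIVATE_ERRNO
# define SKETCH_PRIVATE_ERRNO 0
#endif

#if SKETCH_PRIVATE_ERRNO
/* Private cell: an ordinary variable, so accessing it does not depend
   on any thread pointer having been set up.  */
static int sketch_errno_cell;
#else
/* Shared cell: a TLS variable.  On i386 an access such as
   »movl ...,%gs:(%eax)« goes through the thread pointer, so it only
   works once TLS has been initialized.  */
static __thread int sketch_errno_cell;
#endif

#define sketch_errno (sketch_errno_cell)

/* Models a failing call: record the error code and return -1.  */
static int
sketch_fail (int code)
{
  sketch_errno = code;
  return -1;
}
```

In an ordinary program both variants behave the same, because TLS is
already set up by the time main runs; the difference only matters in
early ld.so code such as open_verify, which runs before the thread
pointer is valid.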

Now, in elf/rtld.c:dl_main, TLS will eventually be initialized (at
earliest when »we have auditing DSOs to load«), but this is after
mapping in objects (_dl_map_object, which then invokes open_verify, which
contains the errno access).

My naïve attempt to simply move »tcbp = init_tls ();« before mapping
objects did not work out -- any suggestions to help me back onto firm
ground?


And what, by the way, is the story behind elf/rtld.c still containing
code conditioned by USE___THREAD (and that code looks somewhat relevant
for my case), while USE___THREAD is not defined anywhere?


Regards,
 Thomas
