Will Newton <will.new...@linaro.org> writes:

> On 12 December 2013 21:02, Michael Hudson-Doyle
> <michael.hud...@linaro.org> wrote:
>> Hi,
>>
>> Thanks for the response.
>>
>> Will Newton <will.new...@linaro.org> writes:
>>
>>> On 12 December 2013 08:00, Michael Hudson-Doyle
>>> <michael.hud...@linaro.org> wrote:
>>>> Hi all,
>>>>
>>>> I have a bit of a strange one.  I'm not after a full solution, just any
>>>> hints that quickly come to mind :)
>>>>
>>>> After a few simple patches I have a build of mongodb for aarch64 (built
>>>> with gcc-4.8).  However, all of the test binaries that the build spits
>>>> out immediately segfault.  gdb-ing shows that they segfault inside this
>>>> macro:
>>>>
>>>> TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
>>>>
>>>> This expands to:
>>>>
>>>> #  define TSP_DECLARE(T,p) \
>>>>     extern __thread T* _ ## p; \
>>>>     template<> inline T* TSP<T>::get() const { return _ ## p; } \
>>>>     extern TSP<T> p;
>>>>
>>>> And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get()
>>>> const that we're segfaulting in.  This is the disassembly of this
>>>> function (at -O0) with the faulting instruction marked:
>>>>
>>>>    0x00000000004b4b6c <+0>:     stp     x29, x30, [sp,#-32]!
>>>>    0x00000000004b4b70 <+4>:     mov     x29, sp
>>>>    0x00000000004b4b74 <+8>:     str     x0, [x29,#16]
>>>>    0x00000000004b4b78 <+12>:    adrp    x0, 0x64c000
>>>>    0x00000000004b4b7c <+16>:    ldr     x0, [x0,#776]
>>>>    0x00000000004b4b80 <+20>:    nop
>>>>    0x00000000004b4b84 <+24>:    nop
>>>>    0x00000000004b4b88 <+28>:    mrs     x1, tpidr_el0
>>>>    0x00000000004b4b8c <+32>:    add     x0, x1, x0
>>>> => 0x00000000004b4b90 <+36>:    ldr     x0, [x0]
>>>>    0x00000000004b4b94 <+40>:    ldp     x29, x30, [sp],#32
>>>>    0x00000000004b4b98 <+44>:    ret
>>>>
>>>> And the registers:
>>>>
>>>> (gdb) info registers
>>>> x0             0x7fb863fd70     548554407280
>>>
>>> This value looks surprisingly large if it is an offset from TP (x1).
>>
>> Yeah, it does a bit, doesn't it.
>>
>> (gdb) p/x $x0 - $x1
>> $9 = 0x648680
>>
>> (not really a suspicious number)
>>
>> I guess I don't understand the adrp code.  My understanding is that:
>>
>> 0x00000000004b4b78 <+12>:    adrp    x0, 0x64c000
>>
>> would result in 0x4b4000 + 0x64c000 in x0 and then
>
> The disassembler may have done this for you, would 0x64c000 make more sense?

Yes, indeed.

>> 0x00000000004b4b7c <+16>:    ldr     x0, [x0,#776]
>>
>> reads from 0x4b4000 + 0x64c000 + 776 but
>>
>> (gdb) x 0x4b4000 + 0x64c000 + 776
>> 0xb00308:       Cannot access memory at address 0xb00308
>>
>> (I'm not sure if the disassembly for adrp has the immediate shifted or
>> not, but anyway:
>>
>> (gdb) x 0x4b4000 + (0x64c000<<12) + 776
>> 0x4c4b4308:     Cannot access memory at address 0x4c4b4308)
>>
>> So I'm clearly missing something here...
>>
>>>> x1             0x7fb7ff76f0     548547819248
>>>
>>> Have you tried printing the memory at this address? It looks like it
>>> is probably ok...
>>
>> Yeah, it's fine.
>
> I guess that means that the thread pointer is probably correct.

It's plausible, at least :)

>> (gdb) x/20g $x1
>> 0x7fb7ff76f0:   0x0000007fb7ff7e28      0x0000000000000000
>> 0x7fb7ff7700:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7710:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7720:   0x0000000000000000      0x0000007fb7e5ce50
>> 0x7fb7ff7730:   0x0000007fb7e5fff8      0x0000000000000000
>> 0x7fb7ff7740:   0x0000007fb7e1bab8      0x0000007fb7e1b4b8
>> 0x7fb7ff7750:   0x0000007fb7e1c3b8      0x0000007fb7e5c550
>> 0x7fb7ff7760:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7770:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7780:   0x0000000000000000      0x0000000000000000
>>
>>>> I guess I see three things that could be wrong:
>>>>
>>>>  1) The operand to "adrp    x0, 0x64c000"[1]
>>>>  2) The operand to "ldr     x0, [x0,#776]"
>>>
>>> Is there a dynamic reloc for this GOT slot?
>>
>> How would I tell? :)
>
> Generally the TLS code will load the TP then load an offset from the
> GOT that the dynamic linker has fixed up based on a dynamic relocation
> which should reference the correct symbol etc.
>
> I would guess that 0x64c000 is the base of the GOT and 776 is the
> offset into it (but I could be wrong). objdump -h will give you the
> layout of the sections, objdump -R will dump the relocations.

So I get this:

$ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2
 23 .dynamic      00000220  000000000064b160  000000000064b160  0023b160  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 24 .got          00001c78  000000000064b380  000000000064b380  0023b380  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 25 .data         00000130  000000000064d000  000000000064d000  0023d000  2**4

And objdump -C -R gives this: http://paste.ubuntu.com/6563640/

This would seem to be the relevant entry:

000000000064ccb8 R_AARCH64_TLS_TPREL64  mongo::_threadOstreamCache

But I don't know what the offset means here and how it relates to the
776 in "ldr     x0, [x0,#776]".  0x64c000 + 776 is 0x64c308 which is

000000000064c308 R_AARCH64_GLOB_DAT  vtable for boost::program_options::typed_value<unsigned int, char>

which is just random, but I don't know if that's a valid thing to be
looking at :-)  That said, if we examine the memory at 0x64ccb8 and
interpret it as an offset against tpidr_el0 things *seem* to make sense:

(gdb) x 0x64ccb8
0x64ccb8:       0x00000010
(gdb) x/g $x1 + 0x10
0x7fb7ff7700:   0x0000000000000000

The correct value for this TLS pointer at this point in time _is_ in
fact NULL, but obviously this could happen just by chance :-)

Still, looks a bit like a toolchain bug to me.  This is with g++ 4.8
from trusty fwiw.

Cheers,
mwh

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
