Will Newton <will.new...@linaro.org> writes:

> On 12 December 2013 21:02, Michael Hudson-Doyle
> <michael.hud...@linaro.org> wrote:
>> Hi,
>>
>> Thanks for the response.
>>
>> Will Newton <will.new...@linaro.org> writes:
>>
>>> On 12 December 2013 08:00, Michael Hudson-Doyle
>>> <michael.hud...@linaro.org> wrote:
>>>> Hi all,
>>>>
>>>> I have a bit of a strange one.  I'm not after a full solution, just any
>>>> hints that quickly come to mind :)
>>>>
>>>> After a few simple patches I have a build of mongodb for aarch64 (built
>>>> with gcc-4.8).  However, all of the test binaries that the build spits
>>>> out immediately segfault.  gdb-ing shows that they segfault inside this
>>>> macro:
>>>>
>>>>     TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
>>>>
>>>> This expands to:
>>>>
>>>> # define TSP_DECLARE(T,p) \
>>>>     extern __thread T* _ ## p; \
>>>>     template<> inline T* TSP<T>::get() const { return _ ## p; } \
>>>>     extern TSP<T> p;
>>>>
>>>> And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get()
>>>> const that we're segfaulting in.  This is the disassembly of this
>>>> function (at -O0) with the faulting instruction marked:
>>>>
>>>>    0x00000000004b4b6c <+0>:     stp     x29, x30, [sp,#-32]!
>>>>    0x00000000004b4b70 <+4>:     mov     x29, sp
>>>>    0x00000000004b4b74 <+8>:     str     x0, [x29,#16]
>>>>    0x00000000004b4b78 <+12>:    adrp    x0, 0x64c000
>>>>    0x00000000004b4b7c <+16>:    ldr     x0, [x0,#776]
>>>>    0x00000000004b4b80 <+20>:    nop
>>>>    0x00000000004b4b84 <+24>:    nop
>>>>    0x00000000004b4b88 <+28>:    mrs     x1, tpidr_el0
>>>>    0x00000000004b4b8c <+32>:    add     x0, x1, x0
>>>> => 0x00000000004b4b90 <+36>:    ldr     x0, [x0]
>>>>    0x00000000004b4b94 <+40>:    ldp     x29, x30, [sp],#32
>>>>    0x00000000004b4b98 <+44>:    ret
>>>>
>>>> And the registers:
>>>>
>>>> (gdb) info registers
>>>> x0             0x7fb863fd70     548554407280
>>>
>>> This value looks surprisingly large if it is an offset from TP (x1).
>>
>> Yeah, it does a bit doesn't it.
>>
>> (gdb) p/x $x0 - $x1
>> $9 = 0x648680
>>
>> (not really a suspicious number)
>>
>> I guess I don't understand the adrp code.  My understanding is that:
>>
>>    0x00000000004b4b78 <+12>:    adrp    x0, 0x64c000
>>
>> would result in 0x4b4000 + 0x64c000 in x0 and then
>
> The disassembler may have done this for you, would 0x64c000 make more sense?
Yes, indeed.

>>    0x00000000004b4b7c <+16>:    ldr     x0, [x0,#776]
>>
>> reads from 0x4b4000 + 0x64c000 + 776 but
>>
>> (gdb) x 0x4b4000 + 0x64c000 + 776
>> 0xb00308:       Cannot access memory at address 0xb00308
>>
>> (I'm not sure if the disassembly for adrp has the immediate shifted or
>> not, but anyway:
>>
>> (gdb) x 0x4b4000 + (0x64c000<<12) + 776
>> 0x4c4b4308:     Cannot access memory at address 0x4c4b4308)
>>
>> So I'm clearly missing something here...
>>
>>>> x1             0x7fb7ff76f0     548547819248
>>>
>>> Have you tried printing the memory at this address? It looks like it
>>> is probably ok...
>>
>> Yeah, it's fine.
>
> I guess that means that the thread pointer is probably correct.

It's plausible, at least :)

>> (gdb) x/20g $x1
>> 0x7fb7ff76f0:   0x0000007fb7ff7e28      0x0000000000000000
>> 0x7fb7ff7700:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7710:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7720:   0x0000000000000000      0x0000007fb7e5ce50
>> 0x7fb7ff7730:   0x0000007fb7e5fff8      0x0000000000000000
>> 0x7fb7ff7740:   0x0000007fb7e1bab8      0x0000007fb7e1b4b8
>> 0x7fb7ff7750:   0x0000007fb7e1c3b8      0x0000007fb7e5c550
>> 0x7fb7ff7760:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7770:   0x0000000000000000      0x0000000000000000
>> 0x7fb7ff7780:   0x0000000000000000      0x0000000000000000
>>
>>>> I guess I see three things that could be wrong:
>>>>
>>>> 1) The operand to "adrp x0, 0x64c000"[1]
>>>> 2) The operand to "ldr x0, [x0,#776]"
>>>
>>> Is there a dynamic reloc for this GOT slot?
>>
>> How would I tell? :)
>
> Generally the TLS code will load the TP then load an offset from the
> GOT that the dynamic linker has fixed up based on a dynamic relocation
> which should reference the correct symbol etc.
>
> I would guess that 0x64c000 is the base of the GOT and 776 is the
> offset into it (but I could be wrong). objdump -h will give you the
> layout of the sections, objdump -R will dump the relocations.
So I get this:

$ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2
 23 .dynamic      00000220  000000000064b160  000000000064b160  0023b160  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 24 .got          00001c78  000000000064b380  000000000064b380  0023b380  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 25 .data         00000130  000000000064d000  000000000064d000  0023d000  2**4

And objdump -C -R gives this:

http://paste.ubuntu.com/6563640/

This would seem to be the relevant entry:

000000000064ccb8 R_AARCH64_TLS_TPREL64  mongo::_threadOstreamCache

But I don't know what the offset means here and how it relates to the
776 in "ldr x0, [x0,#776]".  0x64c000 + 776 is 0x64c308, which is

000000000064c308 R_AARCH64_GLOB_DAT  vtable for boost::program_options::typed_value<unsigned int, char>

which is just random, but I don't know if that's a valid thing to be
looking at :-)

That said, if we examine the memory at 0x64ccb8 and interpret it as an
offset against tpidr_el0 things *seem* to make sense:

(gdb) x 0x64ccb8
0x64ccb8:       0x00000010
(gdb) x/g $x1 + 0x10
0x7fb7ff7700:   0x0000000000000000

The correct value for this tls pointer at this point in time _is_ in
fact NULL, but obviously this could happen just by chance :-)

Still, looks a bit like a toolchain bug to me.  This is with g++ 4.8
from trusty fwiw.

Cheers,
mwh
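
For anyone who wants to re-check the address arithmetic in this thread,
here is a rough Python sketch. It assumes that adrp clears the low 12
bits of the PC and adds a page count shifted left by 12, and that the
disassembler prints the already-resolved page address (as Will
suggested). The helper names adrp_result and in_got and all the
variable names are made up for illustration; the numbers are copied
from the gdb and objdump output above.

```python
# Values copied from the gdb and objdump output in this thread.
ADRP_PC = 0x4B4B78      # address of "adrp x0, 0x64c000"
ADRP_SHOWN = 0x64C000   # operand as printed by the disassembler
LDR_IMM = 776           # immediate in "ldr x0, [x0,#776]"
GOT_START = 0x64B380    # .got VMA from objdump -h
GOT_SIZE = 0x1C78       # .got size from objdump -h
TPREL_SLOT = 0x64CCB8   # R_AARCH64_TLS_TPREL64 entry from objdump -R
X0 = 0x7FB863FD70       # faulting address held in x0
X1 = 0x7FB7FF76F0       # tpidr_el0 value held in x1


def adrp_result(pc, page_delta):
    """Assumed ADRP semantics: PC with low 12 bits cleared, plus page_delta pages."""
    return (pc & ~0xFFF) + (page_delta << 12)


def in_got(addr):
    """True if addr lies inside the .got section reported by objdump -h."""
    return GOT_START <= addr < GOT_START + GOT_SIZE


# Page count that makes adrp produce the operand the disassembler printed.
page_delta = (ADRP_SHOWN - (ADRP_PC & ~0xFFF)) >> 12

# GOT slot the code actually reads, and where the TLS relocation lives
# relative to the same page base.
loaded_slot = ADRP_SHOWN + LDR_IMM
tprel_offset = TPREL_SLOT - ADRP_SHOWN

print("page delta      :", hex(page_delta))
print("adrp result     :", hex(adrp_result(ADRP_PC, page_delta)))
print("slot loaded     :", hex(loaded_slot), "in .got:", in_got(loaded_slot))
print("TPREL64 slot    :", hex(TPREL_SLOT), "in .got:", in_got(TPREL_SLOT))
print("TPREL64 offset  :", tprel_offset, "vs ldr immediate", LDR_IMM)
print("x0 - x1         :", hex(X0 - X1))
```

If those assumptions hold, both slots fall inside .got, but the ldr
reads the slot at page + 776 while the TLS relocation sits at
page + 3256, which fits the suspicion that the wrong GOT entry is being
loaded.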