On 12 December 2013 21:59, Michael Hudson-Doyle
<michael.hud...@linaro.org> wrote:
> Will Newton <will.new...@linaro.org> writes:
>
>> On 12 December 2013 21:02, Michael Hudson-Doyle
>> <michael.hud...@linaro.org> wrote:
>>> Hi,
>>>
>>> Thanks for the response.
>>>
>>> Will Newton <will.new...@linaro.org> writes:
>>>
>>>> On 12 December 2013 08:00, Michael Hudson-Doyle
>>>> <michael.hud...@linaro.org> wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a bit of a strange one. I'm not after a full solution, just any
>>>>> hints that quickly come to mind :)
>>>>>
>>>>> After a few simple patches I have a build of mongodb for aarch64 (built
>>>>> with gcc-4.8). However, all of the test binaries that the build spits
>>>>> out immediately segfault. gdb-ing shows that they segfault inside this
>>>>> macro:
>>>>>
>>>>>     TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
>>>>>
>>>>> This expands to:
>>>>>
>>>>>     # define TSP_DECLARE(T,p) \
>>>>>         extern __thread T* _ ## p; \
>>>>>         template<> inline T* TSP<T>::get() const { return _ ## p; } \
>>>>>         extern TSP<T> p;
>>>>>
>>>>> And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get()
>>>>> const that we're segfaulting in. This is the disassembly of this
>>>>> function (at -O0) with the faulting instruction marked:
>>>>>
>>>>>    0x00000000004b4b6c <+0>:  stp x29, x30, [sp,#-32]!
>>>>>    0x00000000004b4b70 <+4>:  mov x29, sp
>>>>>    0x00000000004b4b74 <+8>:  str x0, [x29,#16]
>>>>>    0x00000000004b4b78 <+12>: adrp x0, 0x64c000
>>>>>    0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
>>>>>    0x00000000004b4b80 <+20>: nop
>>>>>    0x00000000004b4b84 <+24>: nop
>>>>>    0x00000000004b4b88 <+28>: mrs x1, tpidr_el0
>>>>>    0x00000000004b4b8c <+32>: add x0, x1, x0
>>>>> => 0x00000000004b4b90 <+36>: ldr x0, [x0]
>>>>>    0x00000000004b4b94 <+40>: ldp x29, x30, [sp],#32
>>>>>    0x00000000004b4b98 <+44>: ret
>>>>>
>>>>> And the registers:
>>>>>
>>>>> (gdb) info registers
>>>>> x0 0x7fb863fd70 548554407280
>>>>
>>>> This value looks surprisingly large if it is an offset from TP (x1).
>>>
>>> Yeah, it does a bit doesn't it.
>>>
>>> (gdb) p/x $x0 - $x1
>>> $9 = 0x648680
>>>
>>> (not really a suspicious number)
>>>
>>> I guess I don't understand the adrp code. My understanding is that:
>>>
>>>    0x00000000004b4b78 <+12>: adrp x0, 0x64c000
>>>
>>> would result in 0x4b4000 + 0x64c000 in x0 and then
>>
>> The disassembler may have done this for you, would 0x64c000 make more sense?
>
> Yes, indeed.
>
>>>    0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
>>>
>>> reads from 0x4b4000 + 0x64c000 + 776 but
>>>
>>> (gdb) x 0x4b4000 + 0x64c000 + 776
>>> 0xb00308: Cannot access memory at address 0xb00308
>>>
>>> (I'm not sure if the disassembly for adrp has the immediate shifted or
>>> not, but anyway:
>>>
>>> (gdb) x 0x4b4000 + (0x64c000<<12) + 776
>>> 0x4c4b4308: Cannot access memory at address 0x4c4b4308)
>>>
>>> So I'm clearly missing something here...
>>>
>>>>> x1 0x7fb7ff76f0 548547819248
>>>>
>>>> Have you tried printing the memory at this address? It looks like it
>>>> is probably ok...
>>>
>>> Yeah, it's fine.
>>
>> I guess that means that the thread pointer is probably correct.
>
> It's plausible, at least :)
>
>>> (gdb) x/20g $x1
>>> 0x7fb7ff76f0: 0x0000007fb7ff7e28 0x0000000000000000
>>> 0x7fb7ff7700: 0x0000000000000000 0x0000000000000000
>>> 0x7fb7ff7710: 0x0000000000000000 0x0000000000000000
>>> 0x7fb7ff7720: 0x0000000000000000 0x0000007fb7e5ce50
>>> 0x7fb7ff7730: 0x0000007fb7e5fff8 0x0000000000000000
>>> 0x7fb7ff7740: 0x0000007fb7e1bab8 0x0000007fb7e1b4b8
>>> 0x7fb7ff7750: 0x0000007fb7e1c3b8 0x0000007fb7e5c550
>>> 0x7fb7ff7760: 0x0000000000000000 0x0000000000000000
>>> 0x7fb7ff7770: 0x0000000000000000 0x0000000000000000
>>> 0x7fb7ff7780: 0x0000000000000000 0x0000000000000000
>>>
>>>>> I guess I see three things that could be wrong:
>>>>>
>>>>> 1) The operand to "adrp x0, 0x64c000"[1]
>>>>> 2) The operand to "ldr x0, [x0,#776]"
>>>>
>>>> Is there a dynamic reloc for this GOT slot?
>>>
>>> How would I tell? :)
>>
>> Generally the TLS code will load the TP then load an offset from the
>> GOT that the dynamic linker has fixed up based on a dynamic relocation
>> which should reference the correct symbol etc.
>>
>> I would guess that 0x64c000 is the base of the GOT and 776 is the
>> offset into it (but I could be wrong). objdump -h will give you the
>> layout of the sections, objdump -R will dump the relocations.
>
> So I get this:
>
> $ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2
>  23 .dynamic      00000220  000000000064b160  000000000064b160  0023b160  2**3
>                   CONTENTS, ALLOC, LOAD, DATA
>  24 .got          00001c78  000000000064b380  000000000064b380  0023b380  2**3
>                   CONTENTS, ALLOC, LOAD, DATA
>  25 .data         00000130  000000000064d000  000000000064d000  0023d000  2**4
>
> And objdump -C -R gives this: http://paste.ubuntu.com/6563640/
>
> This would seem to be the relevant entry:
>
>   000000000064ccb8 R_AARCH64_TLS_TPREL64  mongo::_threadOstreamCache
>
> But I don't know what the offset means here and how it relates to the
> 776 in "ldr x0, [x0,#776]".
> 0x64c000 + 776 is 0x64c308 which is
>
>   000000000064c308 R_AARCH64_GLOB_DAT  vtable for boost::program_options::typed_value<unsigned int, char>

This looks wrong.

> which is just random, but I don't know if that's a valid thing to be
> looking at :-) That said, if we examine the memory at 0x64ccb8 and
> interpret it as an offset against tpidr_el0 things *seem* to make sense:
>
>   (gdb) x 0x64ccb8
>   0x64ccb8: 0x00000010
>   (gdb) x/g $x1 + 0x10
>   0x7fb7ff7700: 0x0000000000000000
>
> The correct value for this tls pointer at this point in time _is_ in
> fact NULL, but obviously this could happen just by chance :-)
>
> Still, looks a bit like a toolchain bug to me. This is with g++ 4.8
> from trusty fwiw.

I would be inclined to agree. Is there a simple way to reproduce the
build? (although I don't think I will have time to look at it until
the new year)

--
Will Newton
Toolchain Working Group, Linaro
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
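
For anyone following along, the address arithmetic in this thread can be cross-checked with a few lines of Python. This is only a sketch: every constant is copied from the objdump and gdb output quoted in the thread, the variable names are made up for illustration, and the idea that the load "should" have hit the R_AARCH64_TLS_TPREL64 slot is an assumption consistent with the discussion, not a confirmed diagnosis.

```python
# All values are copied from the thread; names are illustrative only.
GOT_START = 0x64B380        # .got VMA from `objdump -h`
GOT_SIZE = 0x1C78           # .got size from `objdump -h`
PAGE = 0x64C000             # page address shown in `adrp x0, 0x64c000`
LDR_OFFSET = 776            # immediate in `ldr x0, [x0,#776]`
TPREL_SLOT = 0x64CCB8       # R_AARCH64_TLS_TPREL64 mongo::_threadOstreamCache
GLOB_DAT_SLOT = 0x64C308    # R_AARCH64_GLOB_DAT vtable for typed_value<...>
X0_AT_FAULT = 0x7FB863FD70  # x0 from `info registers`
X1_TP = 0x7FB7FF76F0        # x1 (tpidr_el0) from `info registers`


def in_got(addr):
    """Return True if addr lies inside the .got section."""
    return GOT_START <= addr < GOT_START + GOT_SIZE


# Slot the generated code actually reads.
slot_read = PAGE + LDR_OFFSET
# Offset from the same page that would reach the TPREL slot instead
# (assumption: that slot is the one the code was meant to load).
expected_offset = TPREL_SLOT - PAGE
# Value that was added to the thread pointer before the faulting load.
value_added_to_tp = X0_AT_FAULT - X1_TP

print(hex(slot_read), in_got(slot_read), slot_read == GLOB_DAT_SLOT)
print(hex(expected_offset), expected_offset, in_got(TPREL_SLOT))
print(hex(value_added_to_tp))
```

Both slots fall inside .got, so the adrp page looks plausible; under the assumption above, it is the ldr immediate (776 rather than 3256, i.e. 0xcb8) that selects the wrong entry. The last line reproduces the 0x648680 that gdb printed for $x0 - $x1.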