[ACTIVITY] Week 50
== Issues ==

* 1.5 days off due to car issue. (3/10)
* Calxedas are down after lab maintenance.

== Progress ==

* LRA on AArch32:
  o TCWG-343 : Make LRA the default for the ARM backend (5/10)
    - Turn LRA on by default committed as rev205887
      http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01088.html
    - New Thumb regressions reported (Cortex-M0 and bootstrap),
      analysis ongoing.
    - Analysed last week's regressions and reported them upstream,
      Vladimir fixed them at rev205974.
    - iWMMXT issue : work ongoing.
  o TCWG-345 : Analyse performance of LRA for ARM. (0/10)
    - No progress this week.
* Reviewed some merge requests. (1/10)
* Various meetings. (1/10)

== Next ==

* Continue LRA, merge and patch reviews.

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: segfault using __thread variable
On 16 December 2013 03:36, Michael Hudson-Doyle wrote:
> Michael Hudson-Doyle writes:
>
>> Aaah, you might be onto something there.  I built myself a cross gcc-4.8
>> today and it appeared to compile things correctly (I didn't actually get
>> to run it, but the objdump poking looked right) and I got a bit worried
>> that this was all down to some cosmic ray / corruption when I first
>> compiled it.  But, the scripts I cargo culted just use compile binutils
>> from git tip, so if the bug is in binutils...
>
> So I still don't know what's going on, exactly, but I have a debug build
> of binutils now and some clues.  It still only happens on real hardware,
> not cross compiling on my laptop, but I think I have an idea as to why.
> This might be complete crack, but anyway.
>
> I think it's to do with the order of things within the GOT.
>
> When I cross compile, sort the relocations by address, then count up the
> number of relocations of each type, it looks like this:
>
> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>       4
>     496 R_AARCH64_GLOB_DAT
>       1 R_AARCH64_TLS_TPREL64
>     103 R_AARCH64_GLOB_DAT
>     305 R_AARCH64_JUMP_SLOT
>      12 R_AARCH64_COPY
>       1 RELOCATION
>       2
>
> In this case, the code and the relocation agree on where the thread
> local variable is.
>
> When I compile natively, it looks like this:
>
> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>       4
>     295 R_AARCH64_JUMP_SLOT
>     496 R_AARCH64_GLOB_DAT
>       1 R_AARCH64_TLS_TPREL64
>     104 R_AARCH64_GLOB_DAT
>      12 R_AARCH64_COPY
>       1 RELOCATION
>       2
>
> And the code and the relocation disagree on where the thread local
> variable is -- by 298 * sizeof(void*).  Which is almost (but I admit,
> not exactly) the number of JUMP_SLOTs that are, in this case, before the
> TLS variable in the GOT.  When I compiled in a different way, there were
> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation
> disagreed by 163 slots.
>
> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
> into it after the relocation for the TLS has been written out?  I don't
> really see how but maybe this rings a bell...

Indeed it does. ;-)

A similar issue was caused by commit
692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
the aarch64 ld backend) but was intended to be fixed by the rework of
the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
was never actually able to reproduce the failure case (I saw binaries
that were broken so I know it could happen) so the fix was somewhat
speculative. Hence I am very interested in finding a reproducible case
where this GOT entry misordering happens!

--
Will Newton
Toolchain Working Group, Linaro
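The counting method used in the message above (sort the relocations by address, then count consecutive relocations of the same type) can be sketched in C as follows. This is only an illustration of what the `sort | cut | uniq -c` pipeline computes; `struct reloc`, `struct run` and `count_runs` are hypothetical names, not part of binutils, and the input is assumed to have been parsed already.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, already-parsed view of one `objdump -R` line. */
struct reloc {
    uint64_t offset;   /* address the relocation applies to */
    const char *type;  /* e.g. "R_AARCH64_GLOB_DAT" */
};

/* One run of consecutive relocations of the same type. */
struct run {
    const char *type;
    size_t count;
};

/* qsort comparator: ascending by address. */
static int by_offset(const void *a, const void *b)
{
    const struct reloc *ra = a, *rb = b;
    return (ra->offset > rb->offset) - (ra->offset < rb->offset);
}

/* Sort relocs by address, then collapse neighbours of the same type
   into runs, as `uniq -c` does.  `out` must have room for n entries.
   Returns the number of runs written. */
size_t count_runs(struct reloc *relocs, size_t n, struct run *out)
{
    size_t nruns = 0;

    assert(relocs != NULL || n == 0);
    qsort(relocs, n, sizeof relocs[0], by_offset);

    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 && strcmp(out[nruns - 1].type, relocs[i].type) == 0) {
            out[nruns - 1].count++;          /* extend the current run */
        } else {
            out[nruns].type = relocs[i].type; /* start a new run */
            out[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

Because runs are counted in address order, a type that appears in two separate regions of the GOT shows up twice, which is why R_AARCH64_GLOB_DAT is listed twice in the output quoted above.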
[ACTIVITY] 9-13 December 2013
== Progress ==

- 2013.12 releases (4/10):
  * stalled due to lab unavailability.
  * A couple of backports are waiting for approval, another one is
    being debugged.
- cross-validation (4/10): fixed arneb+qemu validations.
- misc (2/10): misc conf-calls and meetings

== Next ==

- Make 2013.12 releases
- cbuild2: continue testing, try to make 4.7 source release
- libsanitizer on AArch64: resume work

== Future ==

Next 2 weeks off (Dec 23rd-Jan 3rd)
[ACTIVITY] Week 50
== Progress ==

TCWG-293 (9/10)
- wrote and tested 64-bit division code
- it seems to work
- still need to do performance testing

TCWG-347 Fix PR59142 (1/10)
- split into series of 3 patches
- patch almost ready, was held up by non-availability of the lab
- need to bootstrap on Thumb-1 to prove change made in response to
  review comments

TCWG-346 AArch64 Benchmarking: CoreMark & Dhrystone
- no significant progress, no access to the lab

== Next ==

Pick up AArch64 benchmarking when the board becomes accessible again
Submit PR59142
Re: MMU Off / Strict Alignment
Hi,

On 11/20/2013 03:45 PM, Matthew Gretton-Dann wrote:
> On 20 November 2013 17:57, Christopher Covington wrote:
>> Hi,
>>
>> We've noticed an issue trying to use the Linaro AArch64 binary bare metal
>> toolchain release with the MMU turned off for some low-level tests.
>>
>> Anytime puts, sprintf, etc. gets called, a reent structure gets created with
>> references to STDIN, STDOUT, STDERR FILE types. A member in the __sFile
>> struct, _mbstate, is an 8 byte struct, but is not aligned on an 8 byte
>> boundary. This means that when memset (or a similar function) gets called on
>> this struct, and doesn't operate one byte at a time, a data alignment fault
>> will be generated when operating out of device memory, such as on a system
>> where the MMU has not been turned on yet.

We believe we have narrowed the issue down to the AArch64 optimized
memcpy/memset implementations, which assume unaligned accesses will not
fault. While the current AArch64 libgloss startup code turns the MMU on
so such accesses will succeed, I don't think turning on the MMU should
be required of all startup code. Would it be possible to modify these
routines to make only size-aligned accesses without degrading
performance? If a single implementation can't make everyone happy,
should the ifdefs around them perhaps be expanded to include something
about requiring the MMU to be on?

Thanks,
Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.
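To illustrate what "only size-aligned accesses" could look like, here is a minimal C sketch of a memset that uses byte stores until the pointer is 8-byte aligned, then naturally aligned 64-bit stores, then byte stores for the tail. It is not the newlib routine and makes no claim about performance; `aligned_memset` and `word64` are hypothetical names, and the `may_alias` attribute is a GCC/Clang extension assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension (an assumption of this sketch): allows stores
   through a 64-bit lvalue without breaking strict-aliasing rules. */
typedef uint64_t __attribute__((__may_alias__)) word64;

/* Fill n bytes at dst with c, never issuing a store wider than the
   alignment of its address. */
void *aligned_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Head: byte stores until p is 8-byte aligned (or n runs out). */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = byte;
        n--;
    }

    /* Replicate the fill byte into all eight byte lanes. */
    uint64_t pattern = (uint64_t)byte * UINT64_C(0x0101010101010101);

    /* Body: every 64-bit store here is naturally aligned. */
    while (n >= 8) {
        assert(((uintptr_t)p & 7) == 0);
        *(word64 *)p = pattern;
        p += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, one at a time. */
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

The cost compared with a routine that tolerates unaligned accesses is the extra head and tail loops. Whether that is acceptable for the optimized routines is exactly the open question in the message above.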
Re: segfault using __thread variable
Will Newton writes:

> On 16 December 2013 03:36, Michael Hudson-Doyle wrote:
>> [...]
>> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
>> into it after the relocation for the TLS has been written out?  I don't
>> really see how but maybe this rings a bell...
>
> Indeed it does. ;-)
>
> A similar issue was caused by commit
> 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
> the aarch64 ld backend) but was intended to be fixed by the rework of
> the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
> was never actually able to reproduce the failure case (I saw binaries
> that were broken so I know it could happen) so the fix was somewhat
> speculative. Hence I am very interested in finding a reproducible case
> where this GOT entry misordering happens!

I'm possibly doing something wrong, but I've tried compiling the
suspect binary with both binutils git tip and the commit before
692e2b8bc but both had the problem.  So I guess it's something else, or
I wasn't testing what I thought I was testing.

Cheers,
mwh
Re: segfault using __thread variable
Michael Hudson-Doyle writes:

> Will Newton writes:
>
>> [...]
>> Hence I am very interested in finding a reproducible case
>> where this GOT entry misordering happens!
>
> I'm possibly doing something wrong, but I've tried compiling the
> suspect binary with both binutils git tip and the commit before
> 692e2b8bc but both had the problem.  So I guess it's something else, or
> I wasn't testing what I thought I was testing.

Argh, I wasn't testing what I thought I was testing... trying again.

Cheers,
mwh
Re: segfault using __thread variable
Michael Hudson-Doyle writes:

> Michael Hudson-Doyle writes:
>
>> [...]
>
> Argh, I wasn't testing what I thought I was testing... trying again.

Ah... found it!

This is the code that determines the offset to patch into the code
(elfnn-aarch64.c line 3845):

      value = (symbol_got_offset (input_bfd, h, r_symndx)
               + globals->root.sgot->output_section->vma
               + globals->root.sgot->output_section->output_offset);

and this is the code that determines the offset as written into the
relocation (elfnn-aarch64.c line 4248):

      off = symbol_got_offset (input_bfd, h, r_symndx);
      ...
      rela.r_offset = globals->root.sgot->output_section->vma +
        globals->root.sgot->output_offset + off;

Can you see the difference?  The former is
"root.sgot->output_section->output_offset", the latter is
"root.sgot->output_offset".

This suggests the rather obvious attached patch.
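To make the difference between the two expressions concrete, here is a small C model of them. It is not BFD code: `struct section` is a hypothetical, cut-down stand-in carrying only the three fields involved, and the model assumes that an output section's own `output_offset` is 0 while the `.got` input section sits some number of slots into its output section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GOT_ENTRY_SIZE 8u  /* sizeof(void*) on AArch64 LP64 */

/* Hypothetical stand-in for the section fields the two expressions use. */
struct section {
    uint64_t vma;                    /* address (meaningful for the output section) */
    uint64_t output_offset;          /* offset of this section inside its output section */
    struct section *output_section;  /* containing output section */
};

/* Address patched into the code, as quoted (before the patch). */
uint64_t got_addr_in_code(const struct section *sgot, uint64_t off)
{
    return off + sgot->output_section->vma
               + sgot->output_section->output_offset;
}

/* Address written to the dynamic relocation's r_offset, as quoted. */
uint64_t got_addr_in_reloc(const struct section *sgot, uint64_t off)
{
    return sgot->output_section->vma + sgot->output_offset + off;
}

/* How many GOT slots apart the two answers are when `slots_before_got`
   entries precede .got inside the output section (model assumption). */
uint64_t mismatch_in_slots(uint64_t slots_before_got)
{
    struct section out = { .vma = 0x10000, .output_offset = 0 };
    out.output_section = &out;

    struct section sgot = {
        .vma = 0,
        .output_offset = slots_before_got * GOT_ENTRY_SIZE,
        .output_section = &out,
    };

    uint64_t off = 5 * GOT_ENTRY_SIZE;  /* arbitrary entry within .got */
    uint64_t in_code = got_addr_in_code(&sgot, off);
    uint64_t in_reloc = got_addr_in_reloc(&sgot, off);

    assert(in_reloc >= in_code);
    return (in_reloc - in_code) / GOT_ENTRY_SIZE;
}
```

In this model the two addresses agree only when nothing precedes `.got` in the output section, and otherwise differ by exactly the number of preceding slots. That fits the reported discrepancies of 298 and 163 slots against 295 and 160 JUMP_SLOTs. The remaining three slots in both cases might be the reserved entries at the start of `.got.plt`, but the thread does not confirm this.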
I haven't tested this exact patch, but it's an obvious translation from
a patch to 692e2b8bcdd8325ebfbe1daace87100d53d15ad6^ which does work.  I
also haven't tested the second hunk at all, but it seems plausible...

Cheers,
mwh

diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index 6a42bc5..f44b97b 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -3844,7 +3844,7 @@ elfNN_aarch64_final_link_relocate (reloc_howto_type *howto,
       value = (symbol_got_offset (input_bfd, h, r_symndx)
                + globals->root.sgot->output_section->vma
-               + globals->root.sgot->output_section->output_offset);
+               + globals->root.sgot->output_offset);
       value =