[ACTIVITY] Week 50

2013-12-16 Thread Yvan Roux
==  Issues ==
* 1.5 days off due to a car issue. (3/10)
* Calxedas are down after lab maintenance.

== Progress ==
* LRA on AArch32:
  o TCWG-343 : Make LRA the default for the ARM backend (5/10)
    - Turn LRA on by default, committed as rev205887
      http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01088.html
    - New Thumb regressions reported (Cortex-M0 and bootstrap),
      analysis ongoing.
    - Analysed last week's regressions and reported them upstream;
      Vladimir fixed them at rev205974.
    - iWMMXT issue : work ongoing.
  o TCWG-345 : Analyse performance of LRA for ARM. (0/10)
    - No progress this week.

* Reviewed some merge requests. (1/10)

* Various meetings. (1/10)

== Next ==
* Continue LRA, merge and patch reviews.

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: segfault using __thread variable

2013-12-16 Thread Will Newton
On 16 December 2013 03:36, Michael Hudson-Doyle
 wrote:
> Michael Hudson-Doyle  writes:
>
>> Aaah, you might be onto something there.  I built myself a cross gcc-4.8
>> today and it appeared to compile things correctly (I didn't actually get
>> to run it, but the objdump poking looked right) and I got a bit worried
>> that this was all down to some cosmic ray / corruption when I first
>> compiled it.  But, the scripts I cargo culted just use compile binutils
>> from git tip, so if the bug is in binutils...
>
> So I still don't know what's going on, exactly, but I have a debug build
> of binutils now and some clues.  It still only happens on real hardware,
> not cross compiling on my laptop, but I think I have an idea as to why.
> This might be complete crack, but anyway.
>
> I think it's to do with the order of things within the GOT.
>
> When I cross compile, sort the relocations by address, then count up the
> number of relocations of each type, it looks like this:
>
> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>   4
> 496 R_AARCH64_GLOB_DAT
>   1 R_AARCH64_TLS_TPREL64
> 103 R_AARCH64_GLOB_DAT
> 305 R_AARCH64_JUMP_SLOT
>  12 R_AARCH64_COPY
>   1 RELOCATION
>   2
>
> In this case, the code and the relocation agree on where the thread
> local variable is.
>
> When I compile natively, it looks like this:
>
> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>   4
> 295 R_AARCH64_JUMP_SLOT
> 496 R_AARCH64_GLOB_DAT
>   1 R_AARCH64_TLS_TPREL64
> 104 R_AARCH64_GLOB_DAT
>  12 R_AARCH64_COPY
>   1 RELOCATION
>   2
>
> And the code and the relocation disagree on where the thread local
> variable is -- by 298 * sizeof(void*).  Which is almost (but I admit,
> not exactly) the number of JUMP_SLOTs that are, in this case, before the
> TLS variable in the GOT.  When I compiled in a different way, there were
> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation
> disagreed by 163 slots.
>
> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
> into it after the relocation for the TLS has been written out?  I don't
> really see how but maybe this rings a bell...

Indeed it does. ;-)

A similar issue was caused by commit
692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
the aarch64 ld backend) but was intended to be fixed by the rework of
the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
was never actually able to reproduce the failure case (I saw binaries
that were broken so I know it could happen) so the fix was somewhat
speculative. Hence I am very interested in finding a reproducible case
where this GOT entry misordering happens!
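
For readers of the archive: the pipeline quoted above (sort the relocations by address, keep only the type column, then `uniq -c`) amounts to a run-length count of relocation types in GOT order. The sketch below shows the same counting step in C. The names `reloc_run` and `count_reloc_runs` are made up for illustration and are not part of binutils; the input is assumed to be already sorted by address.

```c
#include <stddef.h>
#include <string.h>

/* One run of identical, adjacent relocation types (hypothetical helper type). */
struct reloc_run {
    const char *type;   /* e.g. "R_AARCH64_GLOB_DAT" */
    size_t count;       /* number of consecutive entries of that type */
};

/*
 * Collapse an address-sorted list of relocation type names into runs,
 * the way `uniq -c` does.  At most max_runs entries are written to out;
 * the return value is the total number of runs found, which may be
 * larger than max_runs.
 */
size_t count_reloc_runs(const char *const *types, size_t n,
                        struct reloc_run *out, size_t max_runs)
{
    size_t runs = 0;
    const char *prev = NULL;

    for (size_t i = 0; i < n; i++) {
        if (prev != NULL && strcmp(prev, types[i]) == 0) {
            /* Same type as the previous entry: extend the current run. */
            if (runs <= max_runs)
                out[runs - 1].count++;
        } else {
            /* Type changed: start a new run. */
            if (runs < max_runs) {
                out[runs].type = types[i];
                out[runs].count = 1;
            }
            runs++;
        }
        prev = types[i];
    }
    return runs;
}
```

Because only adjacent entries are merged, a type such as R_AARCH64_GLOB_DAT shows up as two separate runs when the TLS relocation sits between them, which is what makes the position of the JUMP_SLOT run visible in the two listings.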

-- 
Will Newton
Toolchain Working Group, Linaro



[ACTIVITY] 9-13 December 2013

2013-12-16 Thread Christophe Lyon
== Progress ==
- 2013.12 releases (4/10):
  * stalled due to lab unavailability.
  * A couple of backports are waiting for approval, another one is
    being debugged.

- cross-validation (4/10): fixed arneb+qemu validations.

- misc (2/10): misc conf-calls and meetings

== Next ==
- Make 2013.12 releases
- cbuild2: continue testing, try to make 4.7 source release
- libsanitizer on AArch64: resume work


== Future ==
Next 2 weeks off (Dec 23rd-Jan 3rd)



[ACTIVITY] Week 50

2013-12-16 Thread Charles Baylis
== Progress ==

TCWG-293 (9/10)
- wrote and tested 64bit division code
  - it seems to work
  - still need to do performance testing

TCWG-347 Fix PR59142 (1/10)
- split into series of 3 patches
- patch almost ready, was held up by non-availability of the lab
- need to bootstrap on Thumb-1 to prove change made in response to
  review comments

TCWG-346 AArch64 Benchmarking: CoreMark & Dhrystone
- no significant progress, no access to the lab

== Next ==

- Pick up AArch64 benchmarking when the board becomes accessible again
- Submit PR59142



Re: MMU Off / Strict Alignment

2013-12-16 Thread Christopher Covington
Hi,

On 11/20/2013 03:45 PM, Matthew Gretton-Dann wrote:
> On 20 November 2013 17:57, Christopher Covington  wrote:
>> Hi,
>>
>> We've noticed an issue trying to use the Linaro AArch64 binary bare metal
>> toolchain release with the MMU turned off for some low-level tests.
>>
>> Anytime puts, sprintf, etc. gets called, a reent structure gets created with
>> references to STDIN, STDOUT, STDERR FILE types. A member in the __sFile
>> struct, _mbstate, is an 8 byte struct, but is not aligned on an 8 byte
>> boundary. This means that when memset (or a similar function) gets called on
>> this struct, and doesn't operate one byte at a time, a data alignment fault
>> will be generated when operating out of device memory, such as on a system
>> where the MMU has not yet been turned on yet.

We believe we have narrowed the issue down to the AArch64 optimized
memcpy/memset implementations, which assume unaligned accesses will not fault.
While the current AArch64 libgloss startup code turns the MMU on so that such
accesses succeed, I don't think turning on the MMU should be required of
all startup code. Would it be possible to modify these routines to make only
size-aligned accesses without degrading performance? If a single
implementation can't make everyone happy, should the ifdefs around them
perhaps be expanded to include something about requiring the MMU to be on?
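
To make the question concrete, here is a minimal sketch of a fill routine that issues only naturally aligned stores: single bytes until the pointer is 8-byte aligned, then 64-bit words, then single bytes for the tail. It is not the newlib implementation, and the name `memset_aligned_only` is made up for illustration. It also assumes the compiler does not rewrite the loops into a call to memset or into wider unaligned stores; in practice that probably means building with options such as -ffreestanding and -mstrict-align.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill n bytes at dst with the byte value c using only naturally
 * aligned stores.  Illustrative sketch, not tuned for performance.
 */
void *memset_aligned_only(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Head: byte stores are always aligned. */
    while (n > 0 && ((uintptr_t)p & 7u) != 0) {
        *p++ = byte;
        n--;
    }

    /* Body: p is now 8-byte aligned, so every 64-bit store is aligned. */
    uint64_t pattern = (uint64_t)byte * UINT64_C(0x0101010101010101);
    uint64_t *w = (uint64_t *)p;
    while (n >= 8) {
        *w++ = pattern;
        n -= 8;
    }

    /* Tail: fewer than 8 bytes remain, finish with byte stores. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

The cost relative to an implementation that tolerates unaligned accesses is the byte-wise head and tail, at most 7 stores each, so the difference should matter mainly for small or misaligned buffers such as the _mbstate member described above.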

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.



Re: segfault using __thread variable

2013-12-16 Thread Michael Hudson-Doyle
Will Newton  writes:

> On 16 December 2013 03:36, Michael Hudson-Doyle
>  wrote:
>> Michael Hudson-Doyle  writes:
>>
>>> Aaah, you might be onto something there.  I built myself a cross gcc-4.8
>>> today and it appeared to compile things correctly (I didn't actually get
>>> to run it, but the objdump poking looked right) and I got a bit worried
>>> that this was all down to some cosmic ray / corruption when I first
>>> compiled it.  But, the scripts I cargo culted just use compile binutils
>>> from git tip, so if the bug is in binutils...
>>
>> So I still don't know what's going on, exactly, but I have a debug build
>> of binutils now and some clues.  It still only happens on real hardware,
>> not cross compiling on my laptop, but I think I have an idea as to why.
>> This might be complete crack, but anyway.
>>
>> I think it's to do with the order of things within the GOT.
>>
>> When I cross compile, sort the relocations by address, then count up the
>> number of relocations of each type, it looks like this:
>>
>> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>   4
>> 496 R_AARCH64_GLOB_DAT
>>   1 R_AARCH64_TLS_TPREL64
>> 103 R_AARCH64_GLOB_DAT
>> 305 R_AARCH64_JUMP_SLOT
>>  12 R_AARCH64_COPY
>>   1 RELOCATION
>>   2
>>
>> In this case, the code and the relocation agree on where the thread
>> local variable is.
>>
>> When I compile natively, it looks like this:
>>
>> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>   4
>> 295 R_AARCH64_JUMP_SLOT
>> 496 R_AARCH64_GLOB_DAT
>>   1 R_AARCH64_TLS_TPREL64
>> 104 R_AARCH64_GLOB_DAT
>>  12 R_AARCH64_COPY
>>   1 RELOCATION
>>   2
>>
>> And the code and the relocation disagree on where the thread local
>> variable is -- by 298 * sizeof(void*).  Which is almost (but I admit,
>> not exactly) the number of JUMP_SLOTs that are, in this case, before the
>> TLS variable in the GOT.  When I compiled in a different way, there were
>> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation
>> disagreed by 163 slots.
>>
>> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
>> into it after the relocation for the TLS has been written out?  I don't
>> really see how but maybe this rings a bell...
>
> Indeed it does. ;-)
>
> A similar issue was caused by commit
> 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
> the aarch64 ld backend) but was intended to be fixed by the rework of
> the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
> was never actually able to reproduce the failure case (I saw binaries
> that were broken so I know it could happen) so the fix was somewhat
> speculative. Hence I am very interested in finding a reproducible case
> where this GOT entry misordering happens!

I'm possibly doing something wrong, but I've tried compiling the
suspect binary with both binutils git tip and the commit before
692e2b8bc, and both had the problem.  So I guess it's something else, or
I wasn't testing what I thought I was testing.

Cheers,
mwh



Re: segfault using __thread variable

2013-12-16 Thread Michael Hudson-Doyle
Michael Hudson-Doyle  writes:

> Will Newton  writes:
>
>> On 16 December 2013 03:36, Michael Hudson-Doyle
>>  wrote:
>>> Michael Hudson-Doyle  writes:
>>>
>>>> Aaah, you might be onto something there.  I built myself a cross gcc-4.8
>>>> today and it appeared to compile things correctly (I didn't actually get
>>>> to run it, but the objdump poking looked right) and I got a bit worried
>>>> that this was all down to some cosmic ray / corruption when I first
>>>> compiled it.  But, the scripts I cargo culted just use compile binutils
>>>> from git tip, so if the bug is in binutils...
>>>
>>> So I still don't know what's going on, exactly, but I have a debug build
>>> of binutils now and some clues.  It still only happens on real hardware,
>>> not cross compiling on my laptop, but I think I have an idea as to why.
>>> This might be complete crack, but anyway.
>>>
>>> I think it's to do with the order of things within the GOT.
>>>
>>> When I cross compile, sort the relocations by address, then count up the
>>> number of relocations of each type, it looks like this:
>>>
>>> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>>   4
>>> 496 R_AARCH64_GLOB_DAT
>>>   1 R_AARCH64_TLS_TPREL64
>>> 103 R_AARCH64_GLOB_DAT
>>> 305 R_AARCH64_JUMP_SLOT
>>>  12 R_AARCH64_COPY
>>>   1 RELOCATION
>>>   2
>>>
>>> In this case, the code and the relocation agree on where the thread
>>> local variable is.
>>>
>>> When I compile natively, it looks like this:
>>>
>>> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>>   4
>>> 295 R_AARCH64_JUMP_SLOT
>>> 496 R_AARCH64_GLOB_DAT
>>>   1 R_AARCH64_TLS_TPREL64
>>> 104 R_AARCH64_GLOB_DAT
>>>  12 R_AARCH64_COPY
>>>   1 RELOCATION
>>>   2
>>>
>>> And the code and the relocation disagree on where the thread local
>>> variable is -- by 298 * sizeof(void*).  Which is almost (but I admit,
>>> not exactly) the number of JUMP_SLOTs that are, in this case, before the
>>> TLS variable in the GOT.  When I compiled in a different way, there were
>>> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation
>>> disagreed by 163 slots.
>>>
>>> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
>>> into it after the relocation for the TLS has been written out?  I don't
>>> really see how but maybe this rings a bell...
>>
>> Indeed it does. ;-)
>>
>> A similar issue was caused by commit
>> 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
>> the aarch64 ld backend) but was intended to be fixed by the rework of
>> the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
>> was never actually able to reproduce the failure case (I saw binaries
>> that were broken so I know it could happen) so the fix was somewhat
>> speculative. Hence I am very interested in finding a reproducible case
>> where this GOT entry misordering happens!
>
> I'm possibly doing something wrong, but I've tried to try compiling the
> suspect binary with both binutils git tip and the commit before
> 692e2b8bc but both had the problem.  So I guess it's something else, or
> I wasn't testing what I thought I was testing.

Argh, I wasn't testing what I thought I was testing... trying again.

Cheers,
mwh



Re: segfault using __thread variable

2013-12-16 Thread Michael Hudson-Doyle
Michael Hudson-Doyle  writes:

> Michael Hudson-Doyle  writes:
>
>> Will Newton  writes:
>>
>>> On 16 December 2013 03:36, Michael Hudson-Doyle
>>>  wrote:
>>>> Michael Hudson-Doyle  writes:
>>>>
>>>>> Aaah, you might be onto something there.  I built myself a cross gcc-4.8
>>>>> today and it appeared to compile things correctly (I didn't actually get
>>>>> to run it, but the objdump poking looked right) and I got a bit worried
>>>>> that this was all down to some cosmic ray / corruption when I first
>>>>> compiled it.  But, the scripts I cargo culted just use compile binutils
>>>>> from git tip, so if the bug is in binutils...
>>>>
>>>> So I still don't know what's going on, exactly, but I have a debug build
>>>> of binutils now and some clues.  It still only happens on real hardware,
>>>> not cross compiling on my laptop, but I think I have an idea as to why.
>>>> This might be complete crack, but anyway.
>>>>
>>>> I think it's to do with the order of things within the GOT.
>>>>
>>>> When I cross compile, sort the relocations by address, then count up the
>>>> number of relocations of each type, it looks like this:
>>>>
>>>> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>>>   4
>>>> 496 R_AARCH64_GLOB_DAT
>>>>   1 R_AARCH64_TLS_TPREL64
>>>> 103 R_AARCH64_GLOB_DAT
>>>> 305 R_AARCH64_JUMP_SLOT
>>>>  12 R_AARCH64_COPY
>>>>   1 RELOCATION
>>>>   2
>>>>
>>>> In this case, the code and the relocation agree on where the thread
>>>> local variable is.
>>>>
>>>> When I compile natively, it looks like this:
>>>>
>>>> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
>>>>   4
>>>> 295 R_AARCH64_JUMP_SLOT
>>>> 496 R_AARCH64_GLOB_DAT
>>>>   1 R_AARCH64_TLS_TPREL64
>>>> 104 R_AARCH64_GLOB_DAT
>>>>  12 R_AARCH64_COPY
>>>>   1 RELOCATION
>>>>   2
>>>>
>>>> And the code and the relocation disagree on where the thread local
>>>> variable is -- by 298 * sizeof(void*).  Which is almost (but I admit,
>>>> not exactly) the number of JUMP_SLOTs that are, in this case, before the
>>>> TLS variable in the GOT.  When I compiled in a different way, there were
>>>> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation
>>>> disagreed by 163 slots.
>>>>
>>>> So is it possible somehow that the GOT has these JUMP_SLOTs inserted
>>>> into it after the relocation for the TLS has been written out?  I don't
>>>> really see how but maybe this rings a bell...
>>>
>>> Indeed it does. ;-)
>>>
>>> A similar issue was caused by commit
>>> 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to
>>> the aarch64 ld backend) but was intended to be fixed by the rework of
>>> the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I
>>> was never actually able to reproduce the failure case (I saw binaries
>>> that were broken so I know it could happen) so the fix was somewhat
>>> speculative. Hence I am very interested in finding a reproducible case
>>> where this GOT entry misordering happens!
>>
>> I'm possibly doing something wrong, but I've tried to try compiling the
>> suspect binary with both binutils git tip and the commit before
>> 692e2b8bc but both had the problem.  So I guess it's something else, or
>> I wasn't testing what I thought I was testing.
>
> Argh, I wasn't testing what I thought I was testing... trying again.

Ah... found it!  This is the code that determines the offset to patch
into the code (elfnn-aarch64.c line 3845):

  value = (symbol_got_offset (input_bfd, h, r_symndx)
           + globals->root.sgot->output_section->vma
           + globals->root.sgot->output_section->output_offset);

and this is the code that determines the offset as written into the
relocation (elfnn-aarch64.c line 4248):

  off = symbol_got_offset (input_bfd, h, r_symndx);
  ...
  rela.r_offset = globals->root.sgot->output_section->vma +
    globals->root.sgot->output_offset + off;

Can you see the difference?  The former is
"root.sgot->output_section->output_offset", the latter is
"root.sgot->output_offset".

This suggests the rather obvious attached patch.  I haven't tested this
exact patch, but it's an obvious translation from a patch to
692e2b8bcdd8325ebfbe1daace87100d53d15ad6^ which does work.  I also
haven't tested the second hunk at all, but it seems plausible...
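
The effect of the two expressions can be reproduced with a small stand-alone model. The sketch below does not use the real BFD `asection` type; `struct section` and both function names are simplified stand-ins invented for illustration. It assumes that an output section's own output_offset is zero, and that the GOT input section sits 298 eight-byte slots into its output section. That layout is only one possibility consistent with the numbers reported earlier in the thread (295 JUMP_SLOTs plus 3 further slots), not something established here.

```c
#include <stdint.h>

/* Simplified stand-in for a BFD section; NOT the real `asection`. */
struct section {
    uint64_t vma;                   /* address of the section in the output */
    uint64_t output_offset;         /* offset within the containing output section */
    struct section *output_section; /* output section containing this one */
};

/* Address patched into the code, as computed around line 3845. */
uint64_t got_entry_addr_code(const struct section *sgot, uint64_t got_off)
{
    return got_off
           + sgot->output_section->vma
           + sgot->output_section->output_offset;
}

/* Address written into rela.r_offset, as computed around line 4248. */
uint64_t got_entry_addr_reloc(const struct section *sgot, uint64_t got_off)
{
    return sgot->output_section->vma + sgot->output_offset + got_off;
}
```

Under those assumptions the two results differ by exactly sgot->output_offset, that is 298 * 8 = 2384 bytes, for every GOT entry. When the GOT input section starts at offset zero in its output section, the two expressions agree, which would fit the cross-compiled binary not showing the problem.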

Cheers,
mwh

diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index 6a42bc5..f44b97b 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -3844,7 +3844,7 @@ elfNN_aarch64_final_link_relocate (reloc_howto_type *howto,
 
   value = (symbol_got_offset (input_bfd, h, r_symndx)
 	   + globals->root.sgot->output_section->vma
-	   + globals->root.sgot->output_section->output_offset);
+	   + globals->root.sgot->output_offset);
 
   value =