Re: RFC: -mimplicit-it and GCC upstream

2010-11-22 Thread Dave Martin
On Wed, Nov 17, 2010 at 11:22 AM, Dave Martin  wrote:
> On Wed, Nov 17, 2010 at 2:53 AM, Michael Hope  wrote:
>> In general the product should move forward and drop work-arounds like
>> -mimplicit-it.  We (the greater ARM community) should fix these
>> package problems as they are found.  Here's a bunch of quick-fire

[...]

Having discussed this further with Richard, it sounds like there are
enough issues blocking -mimplicit-it upstream that we should not
expect it to be supported by default upstream in the foreseeable
future:

  * -mimplicit-it disables some important sanity-checking on the
compiler output (by not checking compiler-generated ITs, or the
absence thereof).  We could in principle make the assembler only
inject ITs in between #APP and #NO_APP, but the assembler doesn't
support this yet; nor does any arm gcc I've seen systematically
generate these directives for inline asms; so implementing this would
probably result in a flag day when everyone has to move to up-to-date
gcc and gas.  Upstreams are unlikely to go for that.

  * with -mimplicit-it, the compiler must be pretty conservative about
inline asm block size (assuming 6 bytes per statement) - that's
feasible, but very suboptimal and is likely to result in the need for
yet another compiler option to turn it on; again, this is unlikely to
become the default upstream.

  * ad-hoc workarounds can be used, such as wrapping GCC to compile
in multiple passes so that the correct inline asm size for each block
can be determined.  But such approaches are likely to be too
cumbersome to get merged in any project.


So I've now come round to the view that we _should_ probably bite the
bullet and fix the inline asm directly.  So:

   * We need to verify which binutils permit (and ignore) the IT
instructions in non-unified (ARM) syntax.  I've observed that 2.19.1
definitely supports this; I don't know about earlier versions -- this
is probably something the toolchain group should investigate.
   * We should be proactive about making these changes upstream.
Writing some standard wording to explain the reason for the change and
the likely impact would probably be a good idea.
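
As a rough illustration of what fixing the inline asm directly could
look like, the sketch below puts an explicit IT instruction in front of
a conditional instruction. The function name add_if_nonzero and the
non-ARM fallback branch are hypothetical and exist only so the example
builds anywhere. The asm path assumes an ARM or Thumb-2 target and a
binutils that accepts (and ignores) IT in ARM mode, as observed above
for 2.19.1.

```c
/* Hypothetical helper: returns acc + inc if flag is non-zero, else acc. */
static inline int add_if_nonzero(int acc, int flag, int inc)
{
#if defined(__arm__)
    /* The explicit "it ne" makes the block valid Thumb-2 without
     * -mimplicit-it.  In ARM mode the IT is expected to be accepted
     * and ignored by the assembler (an assumption that depends on the
     * binutils version in use). */
    __asm__ ("cmp   %1, #0\n\t"
             "it    ne\n\t"
             "addne %0, %0, %2"
             : "+r" (acc)
             : "r" (flag), "r" (inc)
             : "cc");
#else
    /* Plain C fallback so the sketch also compiles on other targets. */
    if (flag != 0)
        acc += inc;
#endif
    return acc;
}
```

With this form the same source should assemble for both ARM and
Thumb-2, so the package no longer depends on the assembler injecting
the IT itself.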

Cheers
---Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


[ACTIVITY] 15th - 19th November

2010-11-22 Thread Andrew Stubbs

LP:663939 - Thumb2 constants
  * Continued testing, found a few bugs. Tidied a few bits up.
  * Wrote some new testcases to go with the patch.

LP:618684 - ICE
  * Began looking at this one. So far I can't reproduce it. I have a
debuggable native toolchain building, but it's been delayed by hardware
issues.


In the course of testing I discovered that the ARM FSF config wasn't 
testing the right thing, so began work on a new, more appropriate FSF
build/test config for Linaro work.


Also found that the SD card rootfs in my IGEPv2 board was corrupted. I've
restored it from backup, and now it's working once more.




Of instruction timings

2010-11-22 Thread David Gilbert
Hi Richard,
  As per the discussion at this morning's call; I've reread the TRM and I
agree with you about the LSLS being the same speed as the TST. (1 cycle)

However, as we agreed, the uxtb does look like 2 cycles vs. 1 cycle for the AND.

On the space v perf theme, one thing that would be interesting to know is
whether there are any icache/issue stage limitations;
i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed
as 1 cycle and are all in i-cache, can they be fetched
and issued fast enough, or is there a performance advantage to short
instructions?

Dave


__sync barriers

2010-11-22 Thread Richard Sandiford
For the record, the thing I half-remembered on the call was:

http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html
and:
http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html

The problem is that all __sync operations besides __sync_lock_test_and_set
and __sync_lock_release are defined to be full barriers.  Using something
like __sync_val_compare_and_swap for __arch_compare_and_exchange_val_*_acq
and __arch_compare_and_exchange_val_*_rel may on some architectures be too
heavyweight, since those macros only need acquire/after and release/before
barriers.  See in particular:

http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html

from the first thread, where the feeling was that the future wasn't
these __sync builtins, but the new C and C++ atomic memory support.
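
To make the barrier difference concrete, here is a minimal sketch
contrasting the full-barrier builtin with acquire-only and release-only
compare-and-swap. It assumes the "new C atomic memory support" in the
form it later took in C11 <stdatomic.h>; the wrapper names cas_full,
cas_acquire and cas_release are hypothetical and are not the
__arch_compare_and_exchange_val_* macros themselves.

```c
#include <stdatomic.h>

/* Legacy GCC builtin: defined to act as a full barrier.
 * Returns the value that was in *p before the operation. */
static inline int cas_full(int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Acquire-only CAS: later accesses may not move before it.
 * On failure the call stores the observed value into 'expected',
 * so 'expected' holds the old value in both cases. */
static inline int cas_acquire(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(p, &expected, desired,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    return expected;
}

/* Release-only CAS: earlier accesses may not move after it. */
static inline int cas_release(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(p, &expected, desired,
                                            memory_order_release,
                                            memory_order_relaxed);
    return expected;
}
```

On architectures where a full barrier is expensive, the last two forms
leave the compiler free to emit something lighter than cas_full does.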

Probably already known, sorry.  I just wasn't sure that trying to
convert everyone (not just ARM) to __sync_* was necessarily going
to go down well.

Richard



Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-11-22 Thread Ira Rosen
On 17 November 2010 13:21, Julian Brown  wrote:
>> > We'd need to figure out what the RTL for such loads/stores should
>> > look like, and whether it can represent alignment constraints, or
>> > strides, or loads/stores of multiple vector registers simultaneously.

Alignment info is kept in struct ptr_info_def.
Is it necessary to represent stride?
Multiple loads/stores seem the most complicated part to me. In neon.md
vld is implemented with output_asm_insn. Is it going to change? Does
this assure consecutive (or stride two) registers?

>> > Getting it right might be a bit awkward, especially if we want to
>> > consider a scope wider than just NEON, i.e. other vector
>> > architectures also.
>>
>> I think we need to somehow enhance MEM_REF, or maybe generate a
>> MEM_REF for the first vector and a builtin after it.
>
> Yeah, keeping these things looking like memory references to most of
> the compiler seems like a good plan.

Is it possible to have a list of MEM_REFs and a builtin after them:

v0 = MEM_REF (addr)
v1 = MEM_REF (addr + 8B)
v2 = MEM_REF (addr + 16B)
builtin (v0, v1, v2, stride=3, reg_stride=1,...)

to be expanded into:

 (addr)
NOTE (...)

and then combined into vld3?
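
For reference, the intended net effect of the three loads plus the
builtin can be modelled in scalar C roughly as below. This is only a
sketch: vld3_model and LANES are hypothetical names, it assumes 8-bit
lanes in 64-bit (8-byte) vectors, and it reads stride=3 as "every third
element goes to the same register", which is how vld3 de-interleaves.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };   /* assumed: 8-bit lanes in an 8-byte vector */

/* Scalar model: three contiguous vector loads followed by a
 * de-interleave with element stride 3 into three result vectors. */
void vld3_model(const uint8_t *addr, uint8_t out[3][LANES])
{
    uint8_t v[3 * LANES];

    /* v0 = MEM_REF (addr), v1 = MEM_REF (addr + 8B),
     * v2 = MEM_REF (addr + 16B) */
    for (int i = 0; i < 3; i++)
        memcpy(v + i * LANES, addr + i * LANES, LANES);

    /* The builtin step: result register r takes elements
     * r, r + 3, r + 6, ... of the loaded data. */
    for (int r = 0; r < 3; r++)
        for (int l = 0; l < LANES; l++)
            out[r][l] = v[3 * l + r];
}
```

Feeding it the bytes 0..23 puts 0, 3, 6, ... in the first result
vector, 1, 4, 7, ... in the second and 2, 5, 8, ... in the third.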

Ira



Re: RFC: -mimplicit-it and GCC upstream

2010-11-22 Thread Nicolas Pitre
On Mon, 22 Nov 2010, Dave Martin wrote:

> So I've now come round to the view that we _should_ probably bite the
> bullet and fix the inline asm directly.  So:
> 
>    * We need to verify which binutils permit (and ignore) the IT
> instructions in non-unified (ARM) syntax.  I've observed that 2.19.1
> definitely supports this; I don't know about earlier versions -- this
> is probably something the toolchain group should investigate.
>    * We should be proactive about making these changes upstream.
> Writing some standard wording to explain the reason for the change and
> the likely impact would probably be a good idea.

I hope there is at least a validation of the IT instructions by the 
assembler with regards to the condition codes on the following 
instructions (and vice versa) to make sure they are all coherent, and 
even so for ARM mode compilation.


Nicolas



[ACTIVITY] November 15-21st

2010-11-22 Thread Julian Brown
== Linaro GCC ==

 * Continued looking at big-endian/quad-vector patch: attempted to
figure out the proper semantics for vec_extract in big endian mode
(about 1 day). Put on hold temporarily to work on lp675347, QT failing
to build due to constraint failure in inline asm statements used for
atomic operations: found the patch which introduced the failure, and
suggested a workaround to the OP. Came up with a plausible-looking
patch, and started testing it, after spending some time trying to
figure out why ARM Linux mainline doesn't build at present. Patch sent
upstream.



Status Report 11-22-2010

2010-11-22 Thread Zach Welch
== Last Week ==

* Reached the point with understanding libunwind where I can begin
writing patches for parsing unwind information out of .ARM.exidx and
.ARM.extab ELF sections.

== This Week ==

* Begin writing support for ARM-specific unwind information to libunwind.

-- 
Zach Welch
CodeSourcery
zwe...@codesourcery.com
(650) 331-3385 x743



Re: RFC: -mimplicit-it and GCC upstream

2010-11-22 Thread Dave Martin
Hi,

On Mon, Nov 22, 2010 at 2:39 PM, Nicolas Pitre  wrote:

[...]

> I hope there is at least a validation of the IT instructions by the
> assembler with regards to the condition codes on the following
> instructions (and vice versa) to make sure they are all coherent, and
> even so for ARM mode compilation.

In unified syntax, yes; in traditional syntax, no.

When driving gas via GCC, this means that the checking is done only
when building for Thumb-2 (since traditional syntax is always used for
ARM code generated by GCC, and this probably isn't going to change for
now).  In traditional syntax, at least some binutils versions totally
ignore IT instructions.

However, for code generated by GCC itself (i.e., not inline asm), GCC
is supposed to generate the correct IT instructions when generating
Thumb-2 code.  -mimplicit-it may therefore cause some code generation
errors to go unnoticed, particularly where the compiler misses out an
IT instruction which it was supposed to insert, but the assembler
silently inserts the missing instruction.

Cheers
---Dave



Re: __sync barriers

2010-11-22 Thread Michael Hope
On Tue, Nov 23, 2010 at 12:34 AM, Richard Sandiford
 wrote:
> For the record, the thing I half-remembered on the call was:
>
>    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html
> and:
>    http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html
>
> The problem is that all __sync operations besides __sync_lock_test_and_set
> and __sync_lock_release are defined to be full barriers.  Using something
> like __sync_val_compare_and_swap for __arch_compare_and_exchange_val_*_acq
> and __arch_compare_and_exchange_val_*_rel may on some architectures be too
> heavyweight, since those macros only need acquire/after and release/before
> barriers.  See in particular:
>
>    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html
>
> from the first thread, where the feeling was that the future wasn't
> these __sync builtins, but the new C and C++ atomic memory support.
>
> Probably already known, sorry.  I just wasn't sure that trying to
> convert everyone (not just ARM) to __sync_* was necessarily going
> to go down well.

Good point.  Using __sync in ARM only is fine, but please do bring the
topic up with upstream.

I'd forgotten about LLVM when we were talking yesterday.  Both GCC and
LLVM supply sync primitives and I hope RVDS will soon as well.

-- Michael
