Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Ira Rosen
On 1 December 2010 17:57, Daniel Jacobowitz wrote:
> On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
>> The meaning of the builtin (or maybe a new tree code would be better?)
>> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
>> MEM_REFs, since we actually have three data accesses here, and
>> something (builtin or tree code) to indicate the deinterleaving. Since
>> the vectors are passed to the builtin, I don't think it's a problem if
>> the statements get separated. When the expander sees the builtin, it
>> has to remove the loads it created for the MEM_REFs and create a new
>> "vector load multiple and deinterleave". Is that possible?
>
> This is a problem I've struggled with before.  My only caution is that
> representing the MEM_REF's separately from the deinterleaving in the IR
> allows all sorts of ways (many we haven't thought of yet) for them to
> get separated, and there's no instruction to efficiently implement the
> deinterleaving from registers.  For instance, suppose a pseudo gets
> propagated into the builtin and we can't find the MEM_REFs any more.
> The resulting code could easily be worse than pre-vectorization.

I see. So one builtin for everything, like

vector_load_deinterleave (v0, v1, v2,..., stride,...)

is our only option?
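
For reference, the semantics under discussion can be written down as a scalar model. This is only a sketch: `load_deinterleave3` is a hypothetical name, not a GCC builtin or tree code, and it assumes three byte-sized streams with a fixed stride of 3, which the thread does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not a GCC builtin: read 3*n interleaved
   elements from src and split them into three separate streams.
   A "vector load multiple and deinterleave" instruction would do the
   memory access and the shuffle in one step. */
static void load_deinterleave3(const uint8_t *src, size_t n,
                               uint8_t *v0, uint8_t *v1, uint8_t *v2)
{
    for (size_t i = 0; i < n; i++) {
        v0[i] = src[3 * i + 0];   /* elements 0, 3, 6, ... */
        v1[i] = src[3 * i + 1];   /* elements 1, 4, 7, ... */
        v2[i] = src[3 * i + 2];   /* elements 2, 5, 8, ... */
    }
}
```

Keeping the loads and the deinterleaving in a single operation, as in this model, is what the "one builtin for everything" form would express in the IR.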

Thanks,
Ira

>
> --
> Daniel Jacobowitz
> CodeSourcery
>

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Nathan Froyd
On Wed, Dec 01, 2010 at 11:24:01AM +, Julian Brown wrote:
> > PowerPC also has load/store multiple, but I guess they are generated
> > in the same phase as for ARM. Maybe there are other architectures that
> > allocate contiguous registers, but earlier?
> 
> I don't know about other architectures which do that.

PowerPC essentially restricts load/store multiple instruction use to the
prologue/epilogue.  There are some tricks it can play with load-string
instructions, but those are restricted to hard registers (see
movmemsi_$Nreg patterns in rs6000.md).  There's also the *ldmsi$N
patterns in rs6000.md, but it's not clear to me what conditions those
get generated under (combine?).

-Nathan



Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Daniel Jacobowitz
On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
> The meaning of the builtin (or maybe a new tree code would be better?)
> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
> MEM_REFs, since we actually have three data accesses here, and
> something (builtin or tree code) to indicate the deinterleaving. Since
> the vectors are passed to the builtin, I don't think it's a problem if
> the statements get separated. When the expander sees the builtin, it
> has to remove the loads it created for the MEM_REFs and create a new
> "vector load multiple and deinterleave". Is that possible?

This is a problem I've struggled with before.  My only caution is that
representing the MEM_REF's separately from the deinterleaving in the IR
allows all sorts of ways (many we haven't thought of yet) for them to
get separated, and there's no instruction to efficiently implement the
deinterleaving from registers.  For instance, suppose a pseudo gets
propagated into the builtin and we can't find the MEM_REFs any more.
The resulting code could easily be worse than pre-vectorization.

-- 
Daniel Jacobowitz
CodeSourcery



Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Daniel Jacobowitz
On Thu, Dec 02, 2010 at 10:54:32AM +0200, Ira Rosen wrote:
> On 1 December 2010 17:57, Daniel Jacobowitz wrote:
> > On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
> >> The meaning of the builtin (or maybe a new tree code would be better?)
> >> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
> >> MEM_REFs, since we actually have three data accesses here, and
> >> something (builtin or tree code) to indicate the deinterleaving. Since
> >> the vectors are passed to the builtin, I don't think it's a problem if
> >> the statements get separated. When the expander sees the builtin, it
> >> has to remove the loads it created for the MEM_REFs and create a new
> >> "vector load multiple and deinterleave". Is that possible?
> >
> > This is a problem I've struggled with before.  My only caution is that
> > representing the MEM_REF's separately from the deinterleaving in the IR
> > allows all sorts of ways (many we haven't thought of yet) for them to
> > get separated, and there's no instruction to efficiently implement the
> > deinterleaving from registers.  For instance, suppose a pseudo gets
> > propagated into the builtin and we can't find the MEM_REFs any more.
> > The resulting code could easily be worse than pre-vectorization.
> 
> I see. So one builtin for everything, like
> 
> vector_load_deinterleave (v0, v1, v2,..., stride,...)
> 
> is our only option?

It's not the only option; the way you've described might work, too.

But yes, it's my opinion that a single builtin is less likely to
generate something the compiler can't recover from.

-- 
Daniel Jacobowitz
CodeSourcery



Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Joseph S. Myers
On Tue, 30 Nov 2010, Julian Brown wrote:

> * defaults.h (VECTOR_ELEMENTS_BIG_ENDIAN): Define.

Apart from the point that new target macros should be hooks, the *very 
first* thing to do with any new macro or hook is to write the .texi 
documentation, which appears to be missing from this patch.  This is best 
done before making any consequent code changes.  That documentation needs 
to make clear:

* What the effects are on the semantics of GNU C code (that uses generic 
vector extensions - which in 4.6 include subscripting), if any.

* What the effects are on the semantics of GENERIC and GIMPLE using vector 
types, if any.  If the semantics of memory references to such types are 
affected, or the semantics of any other existing GENERIC or GIMPLE 
operation (most likely any operations that may exist involving lane 
numbers) then you need to make clear how, and ensure that any 
documentation / comments for that operation are updated as well; the patch 
submission needs to make clear how you have audited all existing code for 
correctness given such a change - any code mentioning a GENERIC or GIMPLE 
code may do transformations assuming particular semantics.

* Likewise, for all machine-independent RTL operations.

* For UNSPECs the documentation issue doesn't arise, but the audit is 
still relevant.
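
To make the first point concrete, here is a small sketch of the GNU C behaviour that such documentation would have to pin down. It is an illustration under assumptions: the type name `v4si` and the helper `lane_after_load` are made up for this example, and it relies on the generic vector subscripting extension (GCC 4.6 or later).

```c
#include <string.h>

/* GNU C generic vector type: four 32-bit ints in 16 bytes. */
typedef int v4si __attribute__((vector_size(16)));

/* Copy four ints from memory into a vector, then read back one lane
   with the subscripting extension.  GNU C treats the vector like an
   array here, so lane i is expected to be mem[i]. */
static int lane_after_load(const int *mem, int lane)
{
    v4si v;
    memcpy(&v, mem, sizeof v);
    return v[lane];
}
```

Whether lane numbers seen by GENERIC, GIMPLE and RTL operations keep agreeing with this source-level view once a vector-element endianness switch is introduced is exactly the kind of question the .texi text and the audit would need to answer.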

-- 
Joseph S. Myers
jos...@codesourcery.com



[ACTIVITY] Nov 29 - Dec 2

2010-12-02 Thread Ira Rosen
- Continued looking into NEON special loads and stores.

- Benchmarks: concentrated on EEMBC Telecom:
   - autcor gets vectorized
   - viterbi, besides strided data accesses, needs to sink conditional
     stores to allow if-conversion and make the main loop vectorizable.
     Since the potential here is 4x, I think it's worthwhile to work on
     this.
   - conven and fbital also have control-flow issues, but much more
     complicated than viterbi's
   - fft has a problem with the loop count; I would like to investigate
     this a bit more
   - diffmeasure doesn't seem to have vectorization potential

- Fixed GCC PR 46663 on trunk, testing the fix for 4.3, 4.4, 4.5.
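
The conditional-store sinking mentioned for viterbi can be illustrated with a small before/after pair. This is not the benchmark's code, just a minimal sketch with made-up function names, assuming both branch arms store to the same location.

```c
#include <stddef.h>

/* Before: each arm of the branch contains its own store, so the loop
   body has control flow that blocks straightforward vectorization. */
static void pick_max_branchy(int *out, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i])
            out[i] = a[i];
        else
            out[i] = b[i];
    }
}

/* After: the two stores are sunk below the branch into one
   unconditional store of a selected value.  The body is now
   straight-line code that if-conversion can turn into a select. */
static void pick_max_sunk(int *out, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int t = (a[i] > b[i]) ? a[i] : b[i];
        out[i] = t;
    }
}
```

Both functions compute the same result; only the second has a single store per iteration.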



Using inline NEON code

2010-12-02 Thread Michael Hope
Hi there.  Currently you can't use NEON instructions in inline
assembly when the compiler is set to a VFP-only FPU such as Ubuntu's
-mfpu=vfpv3-d16.  Trying code like this:

int main()
{
   asm("veor d1, d2, d3");
   return 0;
}

gives an error message like:

test.s: Assembler messages:
test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'

The problem is that -mfpu=vfpv3-d16 has two jobs:  it tells the
compiler what instructions to use, and also tells the assembler what
instructions are valid.  We might want the compiler to use the VFP for
compatibility or power reasons, but still be able to use NEON
instructions in inline assembler without passing extra flags.

Inserting ".fpu neon" at the start of the inline assembly fixes the
problem.  Is this valid?  Are assembly files with multiple .fpu
directives allowed?  Passing '-Wa,-mfpu=neon' to GCC doesn't work, as
gas seems to ignore the second -mfpu.
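
The ".fpu neon" workaround described above would look roughly like this. It is a sketch under assumptions: the function name `neon_inline_demo` and the `__arm__` guard are additions for illustration, the guard only makes the file build on other targets, and running the ARM path still requires hardware that actually has NEON.

```c
/* Returns 1 if the NEON instruction was assembled and executed,
   0 when built for a target other than 32-bit ARM. */
static int neon_inline_demo(void)
{
#if defined(__arm__)
    /* Tell the assembler to accept NEON from this point on, then
       issue the instruction.  The directive is not scoped to this
       asm statement; it is expected to stay in effect for whatever
       the compiler emits afterwards. */
    __asm__ volatile(".fpu neon\n\t"
                     "veor d1, d2, d3"
                     : /* no outputs */
                     : /* no inputs */
                     : "d1");
    return 1;
#else
    return 0;
#endif
}
```

Because the directive changes assembler state for the rest of the file, this trades away the assembler's checking of the compiler's own output after that point, which is part of why the question of validity matters.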

What's the best way to handle this? Some options are:
 * Add '.fpu neon' directives to the start of any inline assembly
 * Separate out the features, so you can specify the capabilities with
   one option and restrict the compiler to a subset with another.
   Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
 * Relax the assembler so that any instructions are accepted.  We'd
   lose some checking of GCC's output though.

-- Michael



Re: Using inline NEON code

2010-12-02 Thread The Rasterman
On Fri, 3 Dec 2010 10:49:29 +1300 Michael Hope said:

> Hi there.  Currently you can't use NEON instructions in inline
> assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
> -mfpu=vfpv3-d16.  Trying code like this:
> 
> int main()
> {
>    asm("veor d1, d2, d3");
>    return 0;
> }
> 
> gives an error message like:
> 
> test.s: Assembler messages:
> test.s:29: Error: selected processor does not support Thumb mode `veor
> d1,d2,d3'
> 
> The problem is that -mfpu=vfpv3-d16 has two jobs:  it tells the
> compiler what instructions to use, and also tells the assembler what
> instructions are valid.  We might want the compiler to use the VFP for
> compatibility or power reasons, but still be able to use NEON
> instructions in inline assembler without passing extra flags.
> 
> Inserting ".fpu neon" to the start of the inline assembly fixes the
> problem.  Is this valid?  Are assembly files with multiple .fpu
> statements allowed?  Passing '-Wa,-mfpu=neon' to GCC doesn't work as
> gas seems to ignore the second -mfpu.
> 
> What's the best way to handle this? Some options are:
>  * Add '.fpu neon' directives to the start of any inline assembly
>  * Separate out the features, so you can specify the capabilities with
> one option and restrict the compiler to a subset with another.
> Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
>  * Relax the assembler so that any instructions are accepted.  We'd
> lose some checking of GCC's output though.

relax dude... relax... :) (that was a vote for relaxing the output - if i stick
some neon asm in my code.. i expect the assembler to do as commanded and punt
out that asm just as instructed and not try and tell me what versions of the
machine instruction set may or may not be valid based on the C compilers
optimisation flags. if i'm dropping to asm.. i'm doing so because "i know
better" than the C compiler)

-- 
- Codito, ergo sum - "I code, therefore I am" --
The Rasterman (Carsten Haitzler)    ras...@rasterman.com

