Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On 1 December 2010 17:57, Daniel Jacobowitz wrote:
> On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
>> The meaning of the builtin (or maybe a new tree code would be better?)
>> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
>> MEM_REFs, since we actually have three data accesses here, and
>> something (builtin or tree code) to indicate the deinterleaving. Since
>> the vectors are passed to the builtin, I don't think it's a problem if
>> the statements get separated. When the expander sees the builtin, it
>> has to remove the loads it created for the MEM_REFs and create a new
>> "vector load multiple and deinterleave". Is that possible?
>
> This is a problem I've struggled with before. My only caution is that
> representing the MEM_REF's separately from the deinterleaving in the IR
> allows all sorts of ways (many we haven't thought of yet) for them to
> get separated, and there's no instruction to efficiently implement the
> deinterleaving from registers. For instance, suppose a pseudo gets
> propagated into the builtin and we can't find the MEM_REFs any more.
> The resulting code could easily be worse than pre-vectorization.

I see. So one builtin for everything, like

vector_load_deinterleave (v0, v1, v2,..., stride,...)

is our only option?

Thanks,
Ira

>
> --
> Daniel Jacobowitz
> CodeSourcery
>

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On Wed, Dec 01, 2010 at 11:24:01AM +, Julian Brown wrote:
> > PowerPC also has load/store multiple, but I guess they are generated
> > in the same phase as for ARM. Maybe there are other architectures that
> > do that allocate contiguous register but earlier?
>
> I don't know about other architectures which do that.

PowerPC essentially restricts load/store multiple instruction use to the
prologue/epilogue. There are some tricks it can play with load-string
instructions, but those are restricted to hard registers (see
movmemsi_$Nreg patterns in rs6000.md). There's also the *ldmsi$N patterns
in rs6000.md, but it's not clear to me what conditions those get
generated under (combine?).

-Nathan
Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
> The meaning of the builtin (or maybe a new tree code would be better?)
> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
> MEM_REFs, since we actually have three data accesses here, and
> something (builtin or tree code) to indicate the deinterleaving. Since
> the vectors are passed to the builtin, I don't think it's a problem if
> the statements get separated. When the expander sees the builtin, it
> has to remove the loads it created for the MEM_REFs and create a new
> "vector load multiple and deinterleave". Is that possible?

This is a problem I've struggled with before. My only caution is that
representing the MEM_REF's separately from the deinterleaving in the IR
allows all sorts of ways (many we haven't thought of yet) for them to
get separated, and there's no instruction to efficiently implement the
deinterleaving from registers. For instance, suppose a pseudo gets
propagated into the builtin and we can't find the MEM_REFs any more.
The resulting code could easily be worse than pre-vectorization.

--
Daniel Jacobowitz
CodeSourcery
Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On Thu, Dec 02, 2010 at 10:54:32AM +0200, Ira Rosen wrote:
> On 1 December 2010 17:57, Daniel Jacobowitz wrote:
> > On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
> >> The meaning of the builtin (or maybe a new tree code would be better?)
> >> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
> >> MEM_REFs, since we actually have three data accesses here, and
> >> something (builtin or tree code) to indicate the deinterleaving. Since
> >> the vectors are passed to the builtin, I don't think it's a problem if
> >> the statements get separated. When the expander sees the builtin, it
> >> has to remove the loads it created for the MEM_REFs and create a new
> >> "vector load multiple and deinterleave". Is that possible?
> >
> > This is a problem I've struggled with before. My only caution is that
> > representing the MEM_REF's separately from the deinterleaving in the IR
> > allows all sorts of ways (many we haven't thought of yet) for them to
> > get separated, and there's no instruction to efficiently implement the
> > deinterleaving from registers. For instance, suppose a pseudo gets
> > propagated into the builtin and we can't find the MEM_REFs any more.
> > The resulting code could easily be worse than pre-vectorization.
>
> I see. So one builtin for everything, like
>
> vector_load_deinterleave (v0, v1, v2,..., stride,...)
>
> is our only option?

It's not the only option; the way you've described might work, too. But
yes, it's my opinion that a single builtin is less likely to generate
something the compiler can't recover from.

--
Daniel Jacobowitz
CodeSourcery
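For readers following the thread, the effect a combined "vector load multiple and deinterleave" is meant to have can be modelled in scalar C. This is only a sketch of the semantics, not a proposed GCC interface: the name load_deinterleave3, the LANES constant and the int element type are assumptions made for illustration, and the stride argument from the proposed builtin is fixed at 3 here.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical vector width in elements, for illustration */

/* Scalar model of a three-way "load multiple and deinterleave".
   src holds 3 * LANES elements laid out as x0 y0 z0 x1 y1 z1 ...
   On return v0 holds the x's, v1 the y's and v2 the z's. */
void load_deinterleave3(const int *src,
                        int v0[LANES], int v1[LANES], int v2[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        v0[i] = src[3 * i + 0];
        v1[i] = src[3 * i + 1];
        v2[i] = src[3 * i + 2];
    }
}
```

The memory access and the lane shuffle happen in one step here, which is the property a single builtin would preserve through the IR; three separate loads followed by a register-only deinterleave is the form that has no efficient instruction to fall back on.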
Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On Tue, 30 Nov 2010, Julian Brown wrote:

> * defaults.h (VECTOR_ELEMENTS_BIG_ENDIAN): Define.

Apart from the point that new target macros should be hooks, the *very
first* thing to do with any new macro or hook is to write the .texi
documentation, which appears to be missing from this patch. This is best
done before making any consequent code changes.

That documentation needs to make clear:

* What the effects are on the semantics of GNU C code (that uses generic
  vector extensions - which in 4.6 include subscripting), if any.

* What the effects are on the semantics of GENERIC and GIMPLE using
  vector types, if any. If the semantics of memory references to such
  types are affected, or the semantics of any other existing GENERIC or
  GIMPLE operation (most likely any operations that may exist involving
  lane numbers) then you need to make clear how, and ensure that any
  documentation / comments for that operation are updated as well; the
  patch submission needs to make clear how you have audited all existing
  code for correctness given such a change - any code mentioning a
  GENERIC or GIMPLE code may do transformations assuming particular
  semantics.

* Likewise, for all machine-independent RTL operations.

* For UNSPECs the documentation issue doesn't arise, but the audit is
  still relevant.

--
Joseph S. Myers
jos...@codesourcery.com
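To make the first point concrete, here is a small sketch of the kind of GNU C code whose meaning the documentation would have to pin down. The type name v4si and the function lane_after_load are illustrative choices, not taken from the patch. As far as I understand the generic vector extension, a vector is laid out in memory like an array, so subscript i should name the element at offset i; the documentation for a macro such as VECTOR_ELEMENTS_BIG_ENDIAN would need to say whether that still holds.

```c
#include <string.h>

/* GNU C generic vector of four ints (a GCC extension, also in Clang). */
typedef int v4si __attribute__((vector_size(16)));

/* Copy four ints from memory into a vector, then read one lane by
   subscript.  Subscripting vectors in C needs GCC 4.6 or later. */
int lane_after_load(const int *mem, int lane)
{
    v4si v;
    memcpy(&v, mem, sizeof v);   /* same bytes, same order as the array */
    return v[lane];              /* which element of mem is this? */
}
```

If lane numbering were allowed to differ from memory order, a function like this would return different elements on different targets, which is why the effect on user-visible semantics needs stating before any code changes.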
[ACTIVITY] Nov 29 - Dec 2
- Continued looking into NEON special loads and stores.
- Benchmarks: concentrated on EEMBC Telecom:
  - autcor gets vectorized
  - viterbi, besides strided data accesses, needs to sink conditional
    stores to allow if-conversion and make the main loop vectorizable.
    Since the potential here is 4x, I think it's worthwhile to work on
    this.
  - conven, fbital also have control-flow issues, but much more
    complicated than viterbi's
  - fft has a problem with loop count; I would like to investigate this
    a bit more
  - diffmeasure doesn't seem to have vectorization potential
- Fixed GCC PR 46663 on trunk, testing the fix for 4.3, 4.4, 4.5.
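As an illustration of what "sinking conditional stores" means in the viterbi item, the two functions below show the same loop before and after the transformation. This is a made-up example, not the viterbi source; the function names and the min-style computation are assumptions chosen to keep the sketch short.

```c
/* Before: each arm of the branch stores to out[i].  The stores are
   conditional, which gets in the way of if-conversion. */
void store_in_branches(int *out, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        if (a[i] < b[i])
            out[i] = a[i];
        else
            out[i] = b[i];
    }
}

/* After: the stores are sunk to the join point.  Only the value is
   selected conditionally; the single store is unconditional, so the
   branch can become a select and the loop body is straight-line code. */
void store_sunk(int *out, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        int t;
        if (a[i] < b[i])
            t = a[i];
        else
            t = b[i];
        out[i] = t;
    }
}
```

Both functions write the same values; the second form is the shape a vectorizer can handle once the condition is turned into a select.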
Using inline NEON code
Hi there. Currently you can't use NEON instructions in inline
assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
-mfpu=vfpv3-d16. Trying code like this:

int main()
{
    asm("veor d1, d2, d3");
    return 0;
}

gives an error message like:

test.s: Assembler messages:
test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'

The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the
compiler what instructions to use, and also tells the assembler what
instructions are valid. We might want the compiler to use the VFP for
compatibility or power reasons, but still be able to use NEON
instructions in inline assembler without passing extra flags.

Inserting ".fpu neon" to the start of the inline assembly fixes the
problem. Is this valid? Are assembly files with multiple .fpu
statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work as
gas seems to ignore the second -mfpu.

What's the best way to handle this? Some options are:
 * Add '.fpu neon' directives to the start of any inline assembly
 * Separate out the features, so you can specify the capabilities with
   one option and restrict the compiler to a subset with another.
   Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
 * Relax the assembler so that any instructions are accepted. We'd
   lose some checking of GCC's output though.

-- Michael
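For reference, the first option in the list above might look like the sketch below. It is untested on real hardware and rests on several assumptions: USE_INLINE_NEON is a hypothetical macro introduced here so the NEON path is only built when asked for, the function name xor64 is made up, and whether a second .fpu directive in one assembly file is officially valid is exactly the open question in the message. Note also that the directive stays in effect for the rest of the generated assembly file, not just this asm block.

```c
#include <stdint.h>

/* XOR two 64-bit values.  USE_INLINE_NEON is a hypothetical macro for
   this sketch; define it only when building for a 32-bit ARM core that
   really has NEON, otherwise the veor would fault at run time. */
uint64_t xor64(uint64_t a, uint64_t b)
{
#if defined(__arm__) && defined(USE_INLINE_NEON)
    uint64_t r;
    __asm__(".fpu neon\n\t"           /* ask gas to accept NEON from here on */
            "vmov d2, %Q1, %R1\n\t"   /* d2 = a (low word, high word) */
            "vmov d3, %Q2, %R2\n\t"   /* d3 = b */
            "veor d1, d2, d3\n\t"     /* the instruction from the report */
            "vmov %Q0, %R0, d1"       /* r = d1 */
            : "=r"(r)
            : "r"(a), "r"(b)
            : "d1", "d2", "d3");
    return r;
#else
    return a ^ b;                     /* portable fallback, no NEON needed */
#endif
}
```

The compiler is still free to generate only VFP code for everything outside the asm block, which is the separation of "what the compiler uses" from "what the assembler accepts" that the message is asking for.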
Re: Using inline NEON code
On Fri, 3 Dec 2010 10:49:29 +1300 Michael Hope said:

> Hi there. Currently you can't use NEON instructions in inline
> assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
> -mfpu=vfpv3-d16. Trying code like this:
>
> int main()
> {
>     asm("veor d1, d2, d3");
>     return 0;
> }
>
> gives an error message like:
>
> test.s: Assembler messages:
> test.s:29: Error: selected processor does not support Thumb mode `veor
> d1,d2,d3'
>
> The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the
> compiler what instructions to use, and also tells the assembler what
> instructions are valid. We might want the compiler to use the VFP for
> compatibility or power reasons, but still be able to use NEON
> instructions in inline assembler without passing extra flags.
>
> Inserting ".fpu neon" to the start of the inline assembly fixes the
> problem. Is this valid? Are assembly files with multiple .fpu
> statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work as
> gas seems to ignore the second -mfpu.
>
> What's the best way to handle this? Some options are:
> * Add '.fpu neon' directives to the start of any inline assembly
> * Separate out the features, so you can specify the capabilities with
> one option and restrict the compiler to a subset with another.
> Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
> * Relax the assembler so that any instructions are accepted. We'd
> lose some checking of GCC's output though.

relax dude... relax... :) (that was a vote for relaxing the output - if i
stick some neon asm in my code.. i expect the assembler to do as
commanded and punt out that asm just as instructed and not try and tell
me what versions of the machine instruction set may or may not be valid
based on the C compilers optimisation flags. if i'm dropping to asm..
i'm doing so because "i know better" than the C compiler)

--
- Codito, ergo sum - "I code, therefore I am" --
The Rasterman (Carsten Haitzler)    ras...@rasterman.com