On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> On 7 October 2015 at 17:09, James Greenhalgh <james.greenha...@arm.com> wrote:
> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >
> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> > directly (as in the inline asm versions you replace)?
>
> I just followed the pattern used for vtbx3.
>
> This sequence does make sense for vtbx3.
> In fact, I don't see why vtbx3 and vtbx4 should be different?
The difference between TBL and TBX is in their handling of a request to
select an out-of-range value. For TBL this returns zero; for TBX this
returns the value which was already in the destination register.

Because the byte vectors used by the TBX instruction on AArch64 are
128-bit (so two of them together allow selecting elements in the range
0-31), and vtbx3 needs to emulate the AArch32 behaviour of picking
elements from 3x64-bit vectors (allowing elements in the range 0-23), we
need to manually check for values which would have been out of range on
AArch32, but are not out of range for AArch64, and handle them
appropriately. For vtbx4, on the other hand, 2x128-bit registers give the
range 0-31 and 4x64-bit registers also give the range 0-31, so we don't
need the special masked handling.

You can find the suggested instruction sequences for the Neon intrinsics
in this document:

  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

> >> /* vtrn */
> >>
> >> __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> >> index b8a45d1..dfbd9cd 100644
> >> --- a/gcc/config/aarch64/iterators.md
> >> +++ b/gcc/config/aarch64/iterators.md
> >> @@ -100,6 +100,8 @@
> >>  ;; All modes.
> >>  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
> >>
> >> +(define_mode_iterator V8Q [V8QI])
> >> +
> >
> > This can be dropped if you use VAR1 in aarch64-builtins.c.
> >
> > Thanks for working on this, with your patch applied, the only
> > remaining intrinsics I see failing for aarch64_be are:
> >
> >   vqtbl2_*8
> >   vqtbl2q_*8
> >   vqtbl3_*8
> >   vqtbl3q_*8
> >   vqtbl4_*8
> >   vqtbl4q_*8
> >
> >   vqtbx2_*8
> >   vqtbx2q_*8
> >   vqtbx3_*8
> >   vqtbx3q_*8
> >   vqtbx4_*8
> >   vqtbx4q_*8
>
> Quite possibly. Which tests are you looking at?
> Since these are aarch64-specific, they are not part of the
> tests I added (advsimd-intrinsics). Do you mean
> gcc.target/aarch64/table-intrinsics.c?

Sorry, yes, I should have given a reference. I'm running with a variant
of a testcase from the LLVM test-suite repository:

  SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

This has an execute test for most of the intrinsics specified for
AArch64. It needs some modification to cover the intrinsics we don't
implement yet.

Thanks,
James
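P.S. For anyone following the TBL/TBX discussion above, here is a minimal
scalar model of the two selection behaviours, written as plain C rather
than as the real instructions or intrinsics. The function names
(model_tbl, model_tbx) are made up for illustration; the point is only
the out-of-range rule: TBL writes zero, TBX keeps the old destination
byte. This is also why vtbx3 needs the extra masking step: on AArch64 an
index in 24-31 is still in range for the 2x128-bit table, but must be
treated as out of range to match the AArch32 3x64-bit semantics.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of TBL: an out-of-range index selects zero. */
static void model_tbl(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (idx[i] < table_len) ? table[idx[i]] : 0;
}

/* Scalar model of TBX: an out-of-range index leaves the destination
   byte unchanged.  For vtbx3 emulated on AArch64, table_len would be
    24 even though the underlying 2x128-bit table holds 32 bytes; the
   indices 24-31 must be forced down the "unchanged" path by a
   compare-and-select, which is the masked handling discussed above. */
static void model_tbx(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i] < table_len)
            dst[i] = table[idx[i]];
}
```

Running both models over the same index vector shows the difference
directly: for an index of, say, 24 against a 24-byte table, model_tbl
produces 0 where model_tbx preserves whatever the destination held.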