On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> On 7 October 2015 at 17:09, James Greenhalgh <james.greenha...@arm.com> wrote:
> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >
> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> > directly (as in the inline asm versions you replace)?
> >
> I just followed the pattern used for vtbx3.
> 
> > This sequence does make sense for vtbx3.
> In fact, I don't see why vtbx3 and vtbx4 should be different?

The difference between TBL and TBX is in their handling of a request to
select an out-of-range value. For TBL this returns zero, for TBX this
returns the value which was already in the destination register.

Because the byte-vectors used by the TBX instruction on AArch64 are 128-bit
(so two of them together allow selecting elements in the range 0-31), and
vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
vectors (allowing only elements in the range 0-23), we need to manually check
for index values which would have been out of range on AArch32 but are in
range for AArch64, and handle them appropriately. For vtbx4, on the other
hand, 2x128-bit registers and 4x64-bit registers both give the range 0..31,
so we don't need the special masked handling and can emit a TBX directly.
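To make the difference concrete, here is a scalar sketch of the semantics
described above. The helper names (tbl, tbx, vtbx3_model, vtbx4_model) are
illustrative only, not the real intrinsics or instruction patterns; the
compare in vtbx3_model stands in for the CMHS/BSL masking step the compiler
emits:

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of AArch64 TBL: an out-of-range index selects zero.  */
static uint8_t tbl(const uint8_t *table, unsigned table_len, uint8_t idx)
{
    return idx < table_len ? table[idx] : 0;
}

/* Scalar model of AArch64 TBX: an out-of-range index keeps the byte
   already in the destination register.  */
static uint8_t tbx(uint8_t dest, const uint8_t *table, unsigned table_len,
                   uint8_t idx)
{
    return idx < table_len ? table[idx] : dest;
}

/* vtbx3 emulation: the 3x64-bit AArch32 table (24 bytes) is padded out
   to 2x128-bit AArch64 registers (32 bytes), so indices 24..31 are in
   range for the hardware instruction but must still fall back to DEST
   to match AArch32.  The final compare models the extra masking step.  */
static uint8_t vtbx3_model(uint8_t dest, const uint8_t table24[24],
                           uint8_t idx)
{
    uint8_t padded[32] = { 0 };
    memcpy(padded, table24, 24);
    uint8_t selected = tbl(padded, 32, idx); /* TBL over the padded table */
    return idx < 24 ? selected : dest;       /* mask back to AArch32 rules */
}

/* vtbx4: 4x64-bit is exactly 2x128-bit, so both give the range 0..31 and
   a plain TBX already matches the AArch32 behaviour, with no masking.  */
static uint8_t vtbx4_model(uint8_t dest, const uint8_t table32[32],
                           uint8_t idx)
{
    return tbx(dest, table32, 32, idx);
}
```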

You can find the suggested instruction sequences for the Neon intrinsics
in this document:

  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

> >>  /* vtrn */
> >>
> >>  __extension__ static __inline float32x2_t __attribute__ 
> >> ((__always_inline__))
> >> diff --git a/gcc/config/aarch64/iterators.md 
> >> b/gcc/config/aarch64/iterators.md
> >> index b8a45d1..dfbd9cd 100644
> >> --- a/gcc/config/aarch64/iterators.md
> >> +++ b/gcc/config/aarch64/iterators.md
> >> @@ -100,6 +100,8 @@
> >>  ;; All modes.
> >>  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF 
> >> V2DF])
> >>
> >> +(define_mode_iterator V8Q [V8QI])
> >> +
> >
> > This can be dropped if you use VAR1 in aarch64-builtins.c.
> >
> > Thanks for working on this, with your patch applied, the only
> > remaining intrinsics I see failing for aarch64_be are:
> >
> >   vqtbl2_*8
> >   vqtbl2q_*8
> >   vqtbl3_*8
> >   vqtbl3q_*8
> >   vqtbl4_*8
> >   vqtbl4q_*8
> >
> >   vqtbx2_*8
> >   vqtbx2q_*8
> >   vqtbx3_*8
> >   vqtbx3q_*8
> >   vqtbx4_*8
> >   vqtbx4q_*8
> >
> Quite possibly. Which tests are you looking at? Since these are
> aarch64-specific, they are not part of the
> tests I added (advsimd-intrinsics). Do you mean
> gcc.target/aarch64/table-intrinsics.c?

Sorry, yes I should have given a reference. I'm running with a variant of
a testcase from the LLVM test-suite repository:

  SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

This has an execute test for most of the intrinsics specified for AArch64.
It needs some modification to cover the intrinsics we don't implement yet.

Thanks,
James
