On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> On 7 October 2015 at 17:09, James Greenhalgh <james.greenha...@arm.com> wrote:
> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >
> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> > directly (as in the inline asm versions you replace)?
> >
> I just followed the pattern used for vtbx3.
> 
> > This sequence does make sense for vtbx3.
> In fact, I don't see why vtbx3 and vtbx4 should be different?

The difference between TBL and TBX is in their handling of a request to
select an out-of-range value. For TBL this returns zero, for TBX this
returns the value which was already in the destination register.

Because the byte-vectors used by the TBX instruction on AArch64 are 128-bit
(so two of them together allow selecting elements in the range 0-31), and
vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
vectors (allowing only elements in the range 0-23), we need to manually check
for index values which would have been out of range on AArch32 but are in
range for AArch64, and handle them appropriately. For vtbx4, on the other
hand, 2x128-bit registers and 4x64-bit registers both give the range 0..31,
so we don't need the special masked handling and can emit a TBX directly.
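To make the difference concrete, here is a scalar sketch of the semantics
described above. The helper names (tbl, tbx, vtbx3_model, vtbx4_model) are
illustrative only, not the real intrinsics or instruction patterns; the
compare in vtbx3_model stands in for the CMHS/BSL masking step the compiler
emits:

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of AArch64 TBL: an out-of-range index selects zero.  */
static uint8_t tbl(const uint8_t *table, unsigned table_len, uint8_t idx)
{
    return idx < table_len ? table[idx] : 0;
}

/* Scalar model of AArch64 TBX: an out-of-range index keeps the byte
   already in the destination register.  */
static uint8_t tbx(uint8_t dest, const uint8_t *table, unsigned table_len,
                   uint8_t idx)
{
    return idx < table_len ? table[idx] : dest;
}

/* vtbx3 emulation: the 3x64-bit AArch32 table (24 bytes) is padded out
   to 2x128-bit AArch64 registers (32 bytes), so indices 24..31 are in
   range for the hardware instruction but must still fall back to DEST
   to match AArch32.  The final compare models the extra masking step.  */
static uint8_t vtbx3_model(uint8_t dest, const uint8_t table24[24],
                           uint8_t idx)
{
    uint8_t padded[32] = { 0 };
    memcpy(padded, table24, 24);
    uint8_t selected = tbl(padded, 32, idx); /* TBL over the padded table */
    return idx < 24 ? selected : dest;       /* mask back to AArch32 rules */
}

/* vtbx4: 4x64-bit is exactly 2x128-bit, so both give the range 0..31 and
   a plain TBX already matches the AArch32 behaviour, with no masking.  */
static uint8_t vtbx4_model(uint8_t dest, const uint8_t table32[32],
                           uint8_t idx)
{
    return tbx(dest, table32, 32, idx);
}
```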

You can find the suggested instruction sequences for the Neon intrinsics
in this document:

  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

> >>  /* vtrn */
> >>
> >>  __extension__ static __inline float32x2_t __attribute__ 
> >> ((__always_inline__))
> >> diff --git a/gcc/config/aarch64/iterators.md 
> >> b/gcc/config/aarch64/iterators.md
> >> index b8a45d1..dfbd9cd 100644
> >> --- a/gcc/config/aarch64/iterators.md
> >> +++ b/gcc/config/aarch64/iterators.md
> >> @@ -100,6 +100,8 @@
> >>  ;; All modes.
> >>  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF 
> >> V2DF])
> >>
> >> +(define_mode_iterator V8Q [V8QI])
> >> +
> >
> > This can be dropped if you use VAR1 in aarch64-builtins.c.
> >
> > Thanks for working on this, with your patch applied, the only
> > remaining intrinsics I see failing for aarch64_be are:
> >
> >   vqtbl2_*8
> >   vqtbl2q_*8
> >   vqtbl3_*8
> >   vqtbl3q_*8
> >   vqtbl4_*8
> >   vqtbl4q_*8
> >
> >   vqtbx2_*8
> >   vqtbx2q_*8
> >   vqtbx3_*8
> >   vqtbx3q_*8
> >   vqtbx4_*8
> >   vqtbx4q_*8
> >
> Quite possibly. Which tests are you looking at? Since these are
> aarch64-specific, they are not part of the
> tests I added (advsimd-intrinsics). Do you mean
> gcc.target/aarch64/table-intrinsics.c?

Sorry, yes I should have given a reference. I'm running with a variant of
a testcase from the LLVM test-suite repository:

  SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

This has an execute test for most of the intrinsics specified for AArch64.
It needs some modification to cover the intrinsics we don't implement yet.

Thanks,
James
