Re: [AArch64_be] Fix vtbl[34] and vtbx4

James Greenhalgh Mon, 12 Oct 2015 06:30:59 -0700

On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
> On 8 October 2015 at 11:12, James Greenhalgh <[email protected]> wrote:
> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> >> On 7 October 2015 at 17:09, James Greenhalgh <[email protected]> wrote:
> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >> >
> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> >> > directly (as in the inline asm versions you replace)?
> >> >
> >> I just followed the pattern used for vtbx3.
> >>
> >> > This sequence does make sense for vtbx3.
> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
> >
> > The difference between TBL and TBX is in their handling of a request to
> > select an out-of-range value. For TBL this returns zero; for TBX this
> > returns the value which was already in the destination register.
> >
> > Because the byte-vectors used by the TBX instruction in AArch64 are 128-bit
> > (so two of them together allow selecting elements in the range 0-31), and
> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from
> > 3x64-bit vectors (allowing elements in the range 0-23), we need to manually
> > check for values which would have been out-of-range on AArch32, but are not
> > out of range for AArch64, and handle them appropriately. For vtbx4, on the
> > other hand, 2x128-bit registers give the range 0-31 and 4x64-bit registers
> > give the range 0-31, so we don't need the special masked handling.
> >
> > You can find the suggested instruction sequences for the Neon intrinsics
> > in this document:
> >
> >   http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
> >
> 
> Hi James,
> 
> Please find attached an updated version which hopefully addresses your 
> comments.
> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation Model.
> 
> OK?


Looks good to me,

Thanks,
James
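
For readers following the range argument quoted above, here is a minimal
scalar sketch in plain C of the TBL/TBX semantics and of why vtbx3 needs the
extra masking while vtbx4 does not. This is only a model for illustration: the
function names (model_tbl, model_tbx, model_vtbx3, model_vtbx4) are
hypothetical, it uses byte loops rather than vector instructions, and it is
not the code in the patch or in arm_neon.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of TBL: an out-of-range index selects zero. */
static void model_tbl(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (idx[i] < table_len) ? table[idx[i]] : 0;
}

/* Scalar model of TBX: an out-of-range index leaves dst[i] untouched. */
static void model_tbx(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i] < table_len)
            dst[i] = table[idx[i]];
}

/*
 * vtbx3 semantics: 3x64-bit table, valid indices 0-23.
 * The table is modelled as two 128-bit registers (32 bytes, the last 8
 * being padding), so indices 24-31 are in range for the lookup itself and
 * must be masked by hand to keep the old destination value.
 */
static void model_vtbx3(uint8_t dst[8], const uint8_t table24[24],
                        const uint8_t idx[8])
{
    uint8_t padded[32] = {0};
    uint8_t looked_up[8];

    memcpy(padded, table24, 24);
    model_tbl(looked_up, padded, sizeof padded, idx, 8);

    /* Select the lookup result only where the index is valid for AArch32. */
    for (size_t i = 0; i < 8; i++)
        if (idx[i] < 24)
            dst[i] = looked_up[i];
}

/*
 * vtbx4 semantics: 4x64-bit table, valid indices 0-31. This matches the
 * 2x128-bit range exactly, so a single TBX-style lookup is enough.
 */
static void model_vtbx4(uint8_t dst[8], const uint8_t table32[32],
                        const uint8_t idx[8])
{
    model_tbx(dst, table32, 32, idx, 8);
}
```

In this model, an index of 24 leaves the destination byte unchanged for
model_vtbx3 but selects table byte 24 for model_vtbx4; indices of 32 and above
leave the destination unchanged in both.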
