On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
> On 8 October 2015 at 11:12, James Greenhalgh <james.greenha...@arm.com> wrote:
> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> >> On 7 October 2015 at 17:09, James Greenhalgh <james.greenha...@arm.com> wrote:
> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >> >
> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> >> > directly (as in the inline asm versions you replace)?
> >> >
> >> I just followed the pattern used for vtbx3.
> >>
> >> > This sequence does make sense for vtbx3.
> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
> >
> > The difference between TBL and TBX is in their handling of a request to
> > select an out-of-range value. For TBL this returns zero; for TBX this
> > returns the value which was already in the destination register.
> >
> > Because the byte-vectors used by the TBX instruction in AArch64 are 128-bit
> > (so two of them together allow selecting elements in the range 0-31), and
> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from
> > 3x64-bit vectors (allowing elements in the range 0-23), we need to manually
> > check for values which would have been out-of-range on AArch32, but are not
> > out of range for AArch64, and handle them appropriately. For vtbx4, on the
> > other hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers
> > give the range 0..31, so we don't need the special masked handling.
> >
> > You can find the suggested instruction sequences for the Neon intrinsics
> > in this document:
> >
> >   http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
> >
>
> Hi James,
>
> Please find attached an updated version which hopefully addresses your
> comments.
> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation Model.
>
> OK?
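To make the TBL/TBX range argument above concrete, here is a rough scalar model in C of the per-lane behaviour. It is only a sketch of the semantics under stated assumptions: the helper names (tbl_lane, tbx_lane, model_vtbx3, model_vtbx4) are hypothetical, the code does not use Neon intrinsics, and it is not the arm_neon.h implementation or the exact instruction sequence from the patch. The vtbx3 model assumes the 24-byte table is zero-padded to 32 bytes (two 128-bit registers), which is why indices 24..31 need an explicit mask, while vtbx4 needs none.

```c
#include <stdint.h>

/* Hypothetical scalar model of one AArch64 TBL lane over a table of
   table_len bytes (16 or 32 for one or two 128-bit registers). */
static uint8_t tbl_lane(const uint8_t *table, int table_len, uint8_t idx)
{
    return idx < table_len ? table[idx] : 0;     /* out of range -> 0 */
}

/* Hypothetical scalar model of one AArch64 TBX lane. */
static uint8_t tbx_lane(const uint8_t *table, int table_len, uint8_t idx,
                        uint8_t dest)
{
    return idx < table_len ? table[idx] : dest;  /* out of range -> keep dest */
}

/* AArch32 vtbx3 semantics on 8 lanes: the table is 3x64-bit = 24 bytes.
   Assumption: it is emulated with a 32-byte table whose top 8 bytes are
   zero padding, so indices 24..31 are "in range" for the lookup. */
static void model_vtbx3(uint8_t r[8], const uint8_t tab[24],
                        const uint8_t idx[8])
{
    uint8_t padded[32] = {0};
    for (int i = 0; i < 24; i++)
        padded[i] = tab[i];

    for (int lane = 0; lane < 8; lane++) {
        uint8_t looked_up = tbl_lane(padded, 32, idx[lane]);
        /* Indices 24..31 would hit the zero padding, but AArch32 VTBX
           must keep the destination for them, so mask explicitly. */
        r[lane] = idx[lane] < 24 ? looked_up : r[lane];
    }
}

/* AArch32 vtbx4 semantics: 4x64-bit = 32 bytes, the same 0..31 range as
   2x128-bit registers, so a plain TBX-style lookup already matches. */
static void model_vtbx4(uint8_t r[8], const uint8_t tab[32],
                        const uint8_t idx[8])
{
    for (int lane = 0; lane < 8; lane++)
        r[lane] = tbx_lane(tab, 32, idx[lane], r[lane]);
}
```

With indices {0, 5, 23, 24, 31, 32, 255, 10}, model_vtbx3 keeps the original destination bytes for 24, 31, 32 and 255, while model_vtbx4 only keeps them for 32 and 255; that asymmetry is the whole reason vtbx3 needs the extra compare-and-select and vtbx4 does not.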
Looks good to me,

Thanks,
James