On 12 October 2015 at 15:30, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
>> On 8 October 2015 at 11:12, James Greenhalgh <james.greenha...@arm.com>
>> wrote:
>> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
>> >> On 7 October 2015 at 17:09, James Greenhalgh <james.greenha...@arm.com>
>> >> wrote:
>> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
>> >> >
>> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
>> >> > directly (as in the inline asm versions you replace)?
>> >> >
>> >> I just followed the pattern used for vtbx3.
>> >>
>> >> > This sequence does make sense for vtbx3.
>> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
>> >
>> > The difference between TBL and TBX is in their handling of a request to
>> > select an out-of-range value. For TBL this returns zero, for TBX this
>> > returns the value which was already in the destination register.
>> >
>> > Because the byte-vectors used by the TBX instruction in aarch64 are 128-bit
>> > (so two of them together allow selecting elements in the range 0-31), and
>> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from
>> > 3x64-bit vectors (allowing elements in the range 0-23), we need to
>> > manually check for values which would have been out-of-range on AArch32,
>> > but are not out of range for AArch64 and handle them appropriately. For
>> > vtbx4 on the other hand, 2x128-bit registers give the range 0..31 and
>> > 4x64-bit registers give the range 0..31, so we don't need the special
>> > masked handling.
>> >
>> > You can find the suggested instruction sequences for the Neon intrinsics
>> > in this document:
>> >
>> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>> >
>>
>> Hi James,
>>
>> Please find attached an updated version which hopefully addresses your
>> comments.
>> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation
>> Model.
>>
>> OK?
>
> Looks good to me,
>
> Thanks,
> James
>
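
To make the range argument quoted above concrete, here is a minimal scalar
sketch in C of the per-lane behaviour James describes. It models a single
byte lane only. The function names (tbl_lane, tbx_lane, vtbx3_lane,
vtbx4_lane) are hypothetical, and the assumption that the 3x64-bit table is
padded out to 32 bytes is mine; this is not the arm_neon.h implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of TBL: an out-of-range index yields zero. */
static uint8_t tbl_lane(const uint8_t *table, unsigned table_len, uint8_t idx)
{
    assert(table_len <= 32);  /* at most 2x128-bit table registers here */
    return idx < table_len ? table[idx] : 0;
}

/* One lane of TBX: an out-of-range index keeps the destination value. */
static uint8_t tbx_lane(uint8_t dest, const uint8_t *table,
                        unsigned table_len, uint8_t idx)
{
    assert(table_len <= 32);
    return idx < table_len ? table[idx] : dest;
}

/* vtbx3 on AArch64 (assumed layout: 3x64-bit table padded to 32 bytes).
   Indices 24..31 are in range for the 32-byte hardware table but were out
   of range on AArch32, so they are masked by hand after the lookup.  */
static uint8_t vtbx3_lane(uint8_t dest, const uint8_t table32[32], uint8_t idx)
{
    uint8_t looked_up = tbl_lane(table32, 32, idx);
    return idx < 24 ? looked_up : dest;
}

/* vtbx4: 4x64-bit and 2x128-bit both cover 0..31, so plain TBX suffices. */
static uint8_t vtbx4_lane(uint8_t dest, const uint8_t table32[32], uint8_t idx)
{
    return tbx_lane(dest, table32, 32, idx);
}
```

With index 24, a plain TBX over the 32-byte table would return table[24],
whereas vtbx3_lane returns the old destination value; for vtbx4 the two
ranges coincide, so no extra mask is needed.
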
I committed this as r228716, and later noticed that
gcc.target/aarch64/table-intrinsics.c fails because of this patch.

The testcase scans the assembly for 'tbl v' or 'tbx v', but since I replaced
some inline asm statements, the mnemonic is now followed by a tab rather than
a space.

I plan to commit this (probably obvious?):

2015-10-13  Christophe Lyon  <christophe.l...@linaro.org>

	* gcc/testsuite/gcc.target/aarch64/table-intrinsics.c: Fix regexp
	after r228716 (Fix vtbl[34] and vtbx4).

Index: gcc/testsuite/gcc.target/aarch64/table-intrinsics.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/table-intrinsics.c	(revision 228759)
+++ gcc/testsuite/gcc.target/aarch64/table-intrinsics.c	(working copy)
@@ -435,5 +435,5 @@
   return vqtbx4q_p8 (r, tab, idx);
 }
 
-/* { dg-final { scan-assembler-times "tbl v" 42} } */
-/* { dg-final { scan-assembler-times "tbx v" 30} } */
+/* { dg-final { scan-assembler-times "tbl\[ |\t\]*v" 42} } */
+/* { dg-final { scan-assembler-times "tbx\[ |\t\]*v" 30} } */
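
For illustration only, the effect of the regexp change can be reproduced
outside DejaGnu. The sketch below uses POSIX extended regular expressions
from <regex.h> as a stand-in for the Tcl regexps that scan-assembler-times
actually uses, so it is an approximation; the helper name matches() and the
sample assembly strings are hypothetical. It uses the bracket expression
[ \t], since inside brackets '|' is a literal character and not an
alternation.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex `pattern` matches anywhere in
   `text`.  Stand-in for the testsuite's Tcl matching, not the real thing.  */
static bool matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);          /* the patterns used here are known-good */
    (void) rc;                /* silence unused warning under NDEBUG */
    bool found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Under these assumptions, matches("tbl v", "tbl\tv0.16b") is false, which is
the failure seen after r228716, while matches("tbl[ \t]*v", ...) accepts both
the space-separated and the tab-separated form.
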