Re: [AArch64_be] Fix vtbl[34] and vtbx4

Christophe Lyon Tue, 13 Oct 2015 06:05:36 -0700

On 12 October 2015 at 15:30, James Greenhalgh <[email protected]> wrote:
> On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
>> On 8 October 2015 at 11:12, James Greenhalgh <[email protected]> 
>> wrote:
>> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
>> >> On 7 October 2015 at 17:09, James Greenhalgh <[email protected]> 
>> >> wrote:
>> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
>> >> >
>> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
>> >> > directly (as in the inline asm versions you replace)?
>> >> >
>> >> I just followed the pattern used for vtbx3.
>> >>
>> >> > This sequence does make sense for vtbx3.
>> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
>> >
>> > The difference between TBL and TBX is in their handling of a request to
>> > select an out-of-range value. For TBL this returns zero, for TBX this
>> > returns the value which was already in the destination register.
>> >
>> > Because the byte-vectors used by the TBX instruction in aarch64 are 128-bit
>> > (so two of them togather allow selecting elements in the range 0-31), and
>> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from 
>> > 3x64-bit
>> > vectors (allowing elements in the range 0-23), we need to manually check 
>> > for
>> > values which would have been out-of-range on AArch32, but are not out
>> > of range for AArch64 and handle them appropriately. For vtbx4 on the other
>> > hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers give
>> > the range 0..31, so we don't need the special masked handling.
>> >
>> > You can find the suggested instruction sequences for the Neon intrinsics
>> > in this document:
>> >
>> >   
>> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>> >
>>
>> Hi James,
>>
>> Please find attached an updated version which hopefully addresses your 
>> comments.
>> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation 
>> Model.
>>
>> OK?
>
> Looks good to me,
>
> Thanks,
> James
>


I commited this as r228716, and noticed later that
gcc.target/aarch64/table-intrinsics.c failed because of this patch.

This is because that testcase scans the assembly for 'tbl v' or 'tbx
v', but since I replaced some asm statements,
the space is now a tab.

I plan to commit this (probably obvious?):

2015-10-13  Christophe Lyon  <[email protected]>

        * gcc/testsuite/gcc.target/aarch64/table-intrinsics.c: Fix regexp
        after r228716 (Fix vtbl[34] and vtbx4).

Index: gcc/testsuite/gcc.target/aarch64/table-intrinsics.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/table-intrinsics.c	(revision 228759)
+++ gcc/testsuite/gcc.target/aarch64/table-intrinsics.c	(working copy)
@@ -435,5 +435,5 @@
   return vqtbx4q_p8 (r, tab, idx);
 }
 
-/* { dg-final { scan-assembler-times "tbl v" 42} }  */
-/* { dg-final { scan-assembler-times "tbx v" 30} }  */
+/* { dg-final { scan-assembler-times "tbl\[ |\t\]*v" 42} }  */
+/* { dg-final { scan-assembler-times "tbx\[ |\t\]*v" 30} }  */

Re: [AArch64_be] Fix vtbl[34] and vtbx4

Reply via email to