Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
Hi

Here is an updated version.

Changelog:

        * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
        * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails
        with hardfloat, and test is not thumb-specific
        * gcc.target/arm/thumb-ltu.c: Avoid test failure with
        hardfloat ABI by requiring arm_thumb1_ok
        * lib/target-supports.exp
        (check_effective_target_arm_fp16_ok_nocache): don't force
        -mfloat-abi=soft when building for hardfloat target

On 19 August 2013 16:34, Richard Earnshaw wrote:
> On 15/08/13 15:10, Charles Baylis wrote:
>> Hi
>>
>> The attached patch fixes some tests which fail when testing gcc for a
>> arm-none-linux-gnueabihf target because they do not expect to be built
>> with a hard float ABI.
>>
>> The change in target-supports.exp fixes arm-fp16-ops-5.c and
>> arm-fp16-ops-6.c.
>>
>> Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause
>> any other tests to break.
>>
>> Comments? This is my first patch, so please point out anything wrong.
>>
>> 2013-08-15  Charles Baylis
>>
>>         * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
>>         * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does
>>         not support hardfloat ABI
>>         * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not
>>         support hardfloat ABI
>>         * target-supports.exp: don't force -mfloat-abi=soft when
>>         building for hardfloat target
>>
>> hf-fixes.txt
>>
>> Index: gcc/testsuite/gcc.dg/builtin-apply2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/builtin-apply2.c	(revision 201726)
>> +++ gcc/testsuite/gcc.dg/builtin-apply2.c	(working copy)
>> @@ -1,6 +1,7 @@
>>  /* { dg-do run } */
>>  /* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have args in registers." { "aarch64*-*-* avr-*-* " } { "*" } { "" } } */
>>  /* { dg-skip-if "Variadic funcs use Base AAPCS. Normal funcs use VFP variant." { "arm*-*-*" } { "-mfloat-abi=hard" } { "" } } */
>> +/* { dg-skip-if "Variadic funcs use Base AAPCS. Normal funcs use VFP variant." { "arm*-*-gnueabihf" } { "*" } { "-mfloat-abi=soft*" } } */
>
> As you've noticed, basing the test's behaviour on the config variant
> doesn't work reliably.  The builtin-apply2 test really should be skipped
> if the current test variant is not soft-float.  We already have
> check_effective_target_arm_hf_eabi in target-supports.exp that checks
> whether __ARM_PCS_VFP is defined during a compilation.  So can replace
> both arm related lines in builtin-apply2 with
>
> /* { dg-skip-if "Variadic funcs use Base AAPCS. Normal funcs use VFP variant." { "arm*-*-*" && arm_hf_eabi } { "*" } { "" } } */
>
>>  /* PR target/12503 */
>>  /* Origin: */
>> Index: gcc/testsuite/gcc.dg/tls/pr42894.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tls/pr42894.c	(revision 201726)
>> +++ gcc/testsuite/gcc.dg/tls/pr42894.c	(working copy)
>> @@ -1,6 +1,7 @@
>>  /* PR target/42894 */
>>  /* { dg-do compile } */
>>  /* { dg-options "-march=armv5te -mthumb" { target arm*-*-* } } */
>> +/* { dg-options "-march=armv5te -mthumb -mfloat-abi=soft" { target arm*-*-*hf } } */
>>  /* { dg-require-effective-target tls } */
>
> Although the original PR was for Thumb1, this is a generic test.  I'm
> not convinced that on ARM it should try to force thumb1.  Removing the
> original dg-options line should solve the problem and we then get better
> multi-lib testing as well.
>
>>  extern __thread int t;
>> Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/arm/thumb-ltu.c	(revision 201726)
>> +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c	(working copy)
>> @@ -1,6 +1,6 @@
>>  /* { dg-do compile } */
>>  /* { dg-skip-if "incompatible options" { arm*-*-* } { "-march=*" } { "-march=armv6" "-march=armv6j" "-march=armv6z" } } */
>> -/* { dg-options "-mcpu=arm1136jf-s -mthumb -O2" } */
>> +/* { dg-options "-mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft" } */
>
> This won't work if t
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
Ping?

On 19 September 2013 18:21, Charles Baylis wrote:
> Hi
>
> Here is an updated version.
>
> Changelog:
>
>         * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
>         * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails
>         with hardfloat, and test is not thumb-specific
>         * gcc.target/arm/thumb-ltu.c: Avoid test failure with
>         hardfloat ABI by requiring arm_thumb1_ok
>         * lib/target-supports.exp
>         (check_effective_target_arm_fp16_ok_nocache): don't force
>         -mfloat-abi=soft when building for hardfloat target
>
[snip]
Re: [PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian
ping^2

On 13 January 2016 at 13:37, Charles Baylis wrote:
> ping
>
> On 16 December 2015 at 17:44, Charles Baylis wrote:
>> Hi
>>
>> This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
>> and VZIP on armeb-* targets. It also fixes the definition of the
>> vuzpq_* and vzipq_* NEON intrinsics which use incorrect lane
>> specifiers in the use of __builtin_shuffle().
>>
>> The problem with arm_neon.h can be seen by temporarily altering
>> arm_expand_vec_perm_const_1() to unconditionally return false. If this
>> is done, the vuzp/vzip tests in the advsimd execution tests will fail.
>> With these patches, this is no longer the case.
>>
>> The problem is caused by the weird mapping of architectural lane order
>> to gcc lane order in big endian. For 64 bit vectors, the order is
>> simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
>> where the lane ordering is reversed inside those. This is due to the
>> memory ordering defined by the EABI. There is a large comment in
>> gcc/config/arm/arm.c above output_move_neon() which describes this in
>> more detail.
>>
>> The arm_evpc_neon_vuzp() and arm_evpc_neon_vzip() functions do not
>> allow for this lane order, instead treating the lane order as simply
>> reversed in 128 bit vectors. These patches fix this. I have included a
>> test case for vuzp, but I don't have one for vzip.
>>
>> Tested with make check on arm-unknown-linux-gnueabihf with no regressions.
>> Tested with make check on armeb-unknown-linux-gnueabihf. Some
>> gcc.dg/vect tests fail due to no longer being vectorized. I haven't
>> analysed these, but it is expected since vuzp is not usable for the
>> shuffle patterns for which it was previously used. There are also a
>> few new PASSes.
>>
>> Patch 1 (vuzp):
>>
>> gcc/ChangeLog:
>>
>> 2015-12-15  Charles Baylis
>>
>>         * config/arm/arm.c (arm_neon_endian_lane_map): New function.
>>         (arm_neon_vector_pair_endian_lane_map): New function.
>>         (arm_evpc_neon_vuzp): Allow for big endian lane order.
>>         * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
>>         endian.
>>         (vuzpq_s16): Likewise.
>>         (vuzpq_s32): Likewise.
>>         (vuzpq_f32): Likewise.
>>         (vuzpq_u8): Likewise.
>>         (vuzpq_u16): Likewise.
>>         (vuzpq_u32): Likewise.
>>         (vuzpq_p8): Likewise.
>>         (vuzpq_p16): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-12-15  Charles Baylis
>>
>>         * gcc.c-torture/execute/pr68532.c: New test.
>>
>> Patch 2 (vzip):
>>
>> gcc/ChangeLog:
>>
>> 2015-12-15  Charles Baylis
>>
>>         * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
>>         order.
>>         * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
>>         endian.
>>         (vzipq_s16): Likewise.
>>         (vzipq_s32): Likewise.
>>         (vzipq_f32): Likewise.
>>         (vzipq_u8): Likewise.
>>         (vzipq_u16): Likewise.
>>         (vzipq_u32): Likewise.
>>         (vzipq_p8): Likewise.
>>         (vzipq_p16): Likewise.
Re: [PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian
On 1 February 2016 at 17:14, Kyrill Tkachov wrote:
> Indeed I see the new passes on armeb-none-eabi.
> However, the new FAILs that I see are ICEs, not just vectorisation
> failures, so they need to be looked at.
>
> The ICEs that I see are:
> FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
> FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)

Thanks. I hadn't seen these because I wasn't running the "expensive" tests.

> Seems that the code in expr.c asserts that expand_vec_perm returned a
> non-NULL result.

It seems that my implementation of arm_evpc_neon_vuzp doesn't handle
the one vector case correctly. I'm testing a fix.

> I'll look at the patches in more detail, but in the meantime I notice
> that there are some GNU style issues that should be resolved, like
> starting comments with a capital letter, two spaces after full stop,
> two spaces between full stop and close comment, as well as some lines
> over 80 characters. The check_GNU_style.sh script in the contrib/
> directory can help catch some (if not all) of these.

OK, I'll fix those.

> Also, can you please send any follow-up versions of the two patches as
> separate emails, so that we can more easily keep track of which comment
> goes to which patch.

Will do.
[ARM, PATCH v2 0/2] PR68532: Fix VZIP/VUZP recognition for big endian
From: Charles Baylis

This is an updated patch, which fixes the following issues:

. big endian ICE with vshuf-* tests
. style issues reported by check_GNU_style.sh

This has no regressions with -mfpu=neon, for arm-unknown-linux-gnueabihf
and armeb-unknown-linux-gnueabihf. The new test passes for both, and big
endian has new PASSes for the vshuf-* execution tests, which currently
fail on trunk. The comment about the failures due to failure to
vectorize seems to have been incorrect.

Link to previous thread:
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00060.html

Charles Baylis (2):
  [ARM] PR68532: Fix up vuzp for big endian
  [ARM] PR68532 Fix up vzip recognition for big endian

 gcc/config/arm/arm.c                          | 77 +--
 gcc/config/arm/arm_neon.h                     | 72 -
 gcc/testsuite/gcc.c-torture/execute/pr68532.c | 24 +
 3 files changed, 122 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr68532.c

-- 
1.9.1
[PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian
From: Charles Baylis gcc/ChangeLog: 2016-02-03 Charles Baylis PR target/68532 * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise. Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e9aa982..24239db 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) unsigned int i, high, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + bool is_swapped; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; + is_swapped = BYTES_BIG_ENDIAN ? true : false; + /* Note that these are little-endian tests. Adjust for big-endian later. */ + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; + high = nelt / 2; - if (d->perm[0] == high) + if (first_elem == neon_endian_lane_map (d->vmode, high)) ; - else if (d->perm[0] == 0) + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) high = 0; else return false; @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) for (i = 0; i < nelt / 2; i++) { - unsigned elt = (i + high) & mask; - if (d->perm[i * 2] != elt) + unsigned elt = + neon_pair_endian_lane_map (d->vmode, i + high) & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)] + != elt) return false; - elt = (elt + nelt) & mask; - if (d->perm[i * 2 + 1] != elt) + elt = + neon_pair_endian_lane_map (d->vmode, i + nelt + high) + & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)] + != elt) return false; } @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) 
+ if (is_swapped) { std::swap (in0, in1); - high = !high; } out0 = d->target; diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 2e014b6..aa17f49 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8453,9 +8453,9 @@ vzipq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 }); + { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 }); + { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }); @@ -8471,9 +8471,9 @@ vzipq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 12, 4, 13, 5, 14, 6, 15, 7 }); + { 10, 2, 11, 3, 8, 0, 9, 1 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 0, 9, 1, 10, 2, 11, 3 }); + { 14, 6, 15, 7, 12, 4, 13, 5 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 8, 1, 9, 2, 10, 3, 11 }); @@ -8488,8 +8488,8 @@ vzipq_s32 (int32x4_t __a, int32x4_t __b) { int32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8502,8 +8502,8 @@ vzipq_f32 (float32x4_t __a, float32x4_t __b) { float32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, 
__b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8517,9 +8517,9 @@ vzipq_u8 (uint8x16_t __a, uint8x16_t __b) uint8x16x
[PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian
From: Charles Baylis gcc/ChangeLog: 2016-02-03 Charles Baylis PR target/68532 * config/arm/arm.c (neon_endian_lane_map): New function. (neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2015-12-15 Charles Baylis PR target/68532 * gcc.c-torture/execute/pr68532.c: New test. Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8a2745..e9aa982 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) arm_expand_vec_perm_1 (target, op0, op1, sel); } +/* map lane ordering between architectural lane order, and GCC lane order, + taking into account ABI. See comment above output_move_neon for details. */ +static int +neon_endian_lane_map (machine_mode mode, int lane) +{ + if (BYTES_BIG_ENDIAN) + { +int nelems = GET_MODE_NUNITS (mode); +/* Reverse lane order. */ +lane = (nelems - 1 - lane); +/* Reverse D register order, to match ABI. */ +if (GET_MODE_SIZE (mode) == 16) + lane = lane ^ (nelems / 2); + } + return lane; +} + +/* some permutations index into pairs of vectors, this is a helper function + to map indexes into those pairs of vectors. */ +static int +neon_pair_endian_lane_map (machine_mode mode, int lane) +{ + int nelem = GET_MODE_NUNITS (mode); + if (BYTES_BIG_ENDIAN) +lane = + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); + return lane; +} + /* Generate or test for an insn that supports a constant permutation. */ /* Recognize patterns for the VUZP insns. 
*/ @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) unsigned int i, odd, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + int swap; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. */ - if (d->perm[0] == 0) + /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the + big endian pattern on 64 bit vectors, so we correct for that. */ + swap = BYTES_BIG_ENDIAN && !d->one_vector_p +&& GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap; + + if (first_elem == neon_endian_lane_map (d->vmode, 0)) odd = 0; - else if (d->perm[0] == 1) + else if (first_elem == neon_endian_lane_map (d->vmode, 1)) odd = 1; else return false; @@ -28233,8 +28270,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) for (i = 0; i < nelt; i++) { - unsigned elt = (i * 2 + odd) & mask; - if (d->perm[i] != elt) + unsigned elt = + (neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask; + if ((d->perm[i] ^ swap) != neon_pair_endian_lane_map (d->vmode, elt)) return false; } @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + if (swap) { std::swap (in0, in1); - odd = !odd; } out0 = d->target; diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 47816d5..2e014b6 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 
30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10,
Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian
On 8 February 2016 at 11:42, Kyrill Tkachov wrote: > Hi Charles, > > > On 03/02/16 18:59, charles.bay...@linaro.org wrote: >> >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx >> op1, rtx sel) >> arm_expand_vec_perm_1 (target, op0, op1, sel); >> } >> +/* map lane ordering between architectural lane order, and GCC lane >> order, >> + taking into account ABI. See comment above output_move_neon for >> details. */ >> +static int >> +neon_endian_lane_map (machine_mode mode, int lane) > > > s/map/Map/ > New line between comment and function signature. Done. >> +{ >> + if (BYTES_BIG_ENDIAN) >> + { >> +int nelems = GET_MODE_NUNITS (mode); >> +/* Reverse lane order. */ >> +lane = (nelems - 1 - lane); >> +/* Reverse D register order, to match ABI. */ >> +if (GET_MODE_SIZE (mode) == 16) >> + lane = lane ^ (nelems / 2); >> + } >> + return lane; >> +} >> + >> +/* some permutations index into pairs of vectors, this is a helper >> function >> + to map indexes into those pairs of vectors. */ >> +static int >> +neon_pair_endian_lane_map (machine_mode mode, int lane) > > > Similarly, s/some/Some/ and new line after comment. Done. >> +{ >> + int nelem = GET_MODE_NUNITS (mode); >> + if (BYTES_BIG_ENDIAN) >> +lane = >> + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); >> + return lane; >> +} >> + >> /* Generate or test for an insn that supports a constant permutation. >> */ >> /* Recognize patterns for the VUZP insns. */ >> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) >> unsigned int i, odd, mask, nelt = d->nelt; >> rtx out0, out1, in0, in1; >> rtx (*gen)(rtx, rtx, rtx, rtx); >> + int first_elem; >> + int swap; >> > > Just make this a bool. As discussed on IRC, this variable does contain an integer. I have renamed it as swap_nelt, and changed the test on it below. 
[snip] >> @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d >> *d) >> in0 = d->op0; >> in1 = d->op1; >> - if (BYTES_BIG_ENDIAN) >> + if (swap) >> { >> std::swap (in0, in1); >> - odd = !odd; >> } > > remove the braces around the std::swap Done. Also changed if (swap) to if (swap_nelt != 0) [snip] >> @@ -0,0 +1,24 @@ >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ >> + >> +#define SIZE 128 >> +unsigned short _Alignas (16) in[SIZE]; >> + >> +extern void abort (void); >> + >> +__attribute__ ((noinline)) int >> +test (unsigned short sum, unsigned short *in, int x) >> +{ >> + for (int j = 0; j < SIZE; j += 8) >> +sum += in[j] * x; >> + return sum; >> +} >> + >> +int >> +main () >> +{ >> + for (int i = 0; i < SIZE; i++) >> +in[i] = i; >> + if (test (0, in, 1) != 960) >> +abort (); > > > AFAIK tests here usually prefer __builtin_abort (); > That way you don't have to declare the abort prototype in the beginning. Done. Updated patch attached From 99a536e2e10e3759a5de88422fadcabb22084b2f Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 9 Feb 2016 15:18:43 + Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian gcc/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * config/arm/arm.c (neon_endian_lane_map): New function. (neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * gcc.c-torture/execute/pr68532.c: New test. 
Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8a2745..95ee9a5 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28208,6 +28208,37 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) arm_expand_vec_perm_1 (target, op0, op1, sel); } +/* Map lane ordering between architectural lane order, and GCC lane order, + taking into account ABI. See comment above output_move_neon for details. */ + +static int +neon_endian_lane_map (machine_
Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian
On 8 February 2016 at 11:42, Kyrill Tkachov wrote: > On 03/02/16 18:59, charles.bay...@linaro.org wrote: >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) >> unsigned int i, high, mask, nelt = d->nelt; >> rtx out0, out1, in0, in1; >> rtx (*gen)(rtx, rtx, rtx, rtx); >> + int first_elem; >> + bool is_swapped; >> if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) >> return false; >> + is_swapped = BYTES_BIG_ENDIAN ? true : false; > > > This is just "is_swapped = BYTES_BIG_ENDIAN;" Done. >> + >> /* Note that these are little-endian tests. Adjust for big-endian >> later. */ > > > I think you can remove this comment now, like in patch 1/2 Done. >> + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; >> + >> high = nelt / 2; >> - if (d->perm[0] == high) >> + if (first_elem == neon_endian_lane_map (d->vmode, high)) >> ; >> - else if (d->perm[0] == 0) >> + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) >> high = 0; >> else >> return false; >> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) >> for (i = 0; i < nelt / 2; i++) >> { >> - unsigned elt = (i + high) & mask; >> - if (d->perm[i * 2] != elt) >> + unsigned elt = >> + neon_pair_endian_lane_map (d->vmode, i + high) & mask; >> + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + >> is_swapped)] >> + != elt) >> return false; >> - elt = (elt + nelt) & mask; >> - if (d->perm[i * 2 + 1] != elt) >> + elt = >> + neon_pair_endian_lane_map (d->vmode, i + nelt + high) >> + & mask; > > > The "& mask" can go on the previous line. 
Done >> + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + >> !is_swapped)] >> + != elt) >> return false; >> } >> @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d >> *d) >> in0 = d->op0; >> in1 = d->op1; >> - if (BYTES_BIG_ENDIAN) >> + if (is_swapped) >> { >> std::swap (in0, in1); >> - high = !high; >> } > > > remove the braces around the std::swap. Done. > Ok with these changes. > I've tried out both patch and they do fix execution failures on big-endian > and don't break any NEON intrinsics tests that I threw at them. Attached for completeness, will commit once the VUZP patch is OKd. From 469f82610a4e70284bf23c373b8a73685cad0ec1 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 9 Feb 2016 15:18:44 + Subject: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian gcc/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise. Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 95ee9a5..5562baa 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28318,15 +28318,20 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) unsigned int i, high, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + bool is_swapped; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. 
*/ + is_swapped = BYTES_BIG_ENDIAN; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; + high = nelt / 2; - if (d->perm[0] == high) + if (first_elem == neon_endian_lane_map (d->vmode, high)) ; - else if (d->perm[0] == 0) + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) high = 0; else return false; @@ -28334,11 +28339,15 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) for (i = 0; i < nelt / 2; i++) { - unsigned elt = (i + high) & mask; - if (d->perm[i * 2] != elt) + unsigned elt = + neon_pair_endian_lane_map (d->vmode, i + high) & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)] + != elt) return false; - elt = (elt + nelt) & mask; - if (d->perm[i
Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian
Committed to trunk as r233252

On 9 February 2016 at 17:07, Charles Baylis wrote:
> On 8 February 2016 at 11:42, Kyrill Tkachov wrote:
[snip]
Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian
On 9 February 2016 at 17:08, Kyrill Tkachov wrote: > > On 09/02/16 17:00, Charles Baylis wrote: >> >> On 8 February 2016 at 11:42, Kyrill Tkachov >> wrote: >>> >>> Hi Charles, >>> >>> >>> On 03/02/16 18:59, charles.bay...@linaro.org wrote: >>>> >>>> --- a/gcc/config/arm/arm.c >>>> +++ b/gcc/config/arm/arm.c >>>> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx >>>> op1, rtx sel) >>>> arm_expand_vec_perm_1 (target, op0, op1, sel); >>>>} >>>>+/* map lane ordering between architectural lane order, and GCC lane >>>> order, >>>> + taking into account ABI. See comment above output_move_neon for >>>> details. */ >>>> +static int >>>> +neon_endian_lane_map (machine_mode mode, int lane) >>> >>> >>> s/map/Map/ >>> New line between comment and function signature. >> >> Done. >> >>>> +{ >>>> + if (BYTES_BIG_ENDIAN) >>>> + { >>>> +int nelems = GET_MODE_NUNITS (mode); >>>> +/* Reverse lane order. */ >>>> +lane = (nelems - 1 - lane); >>>> +/* Reverse D register order, to match ABI. */ >>>> +if (GET_MODE_SIZE (mode) == 16) >>>> + lane = lane ^ (nelems / 2); >>>> + } >>>> + return lane; >>>> +} >>>> + >>>> +/* some permutations index into pairs of vectors, this is a helper >>>> function >>>> + to map indexes into those pairs of vectors. */ >>>> +static int >>>> +neon_pair_endian_lane_map (machine_mode mode, int lane) >>> >>> >>> Similarly, s/some/Some/ and new line after comment. >> >> Done. >> >>>> +{ >>>> + int nelem = GET_MODE_NUNITS (mode); >>>> + if (BYTES_BIG_ENDIAN) >>>> +lane = >>>> + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); >>>> + return lane; >>>> +} >>>> + >>>>/* Generate or test for an insn that supports a constant permutation. >>>> */ >>>> /* Recognize patterns for the VUZP insns. 
*/ >>>> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d >>>> *d) >>>> unsigned int i, odd, mask, nelt = d->nelt; >>>> rtx out0, out1, in0, in1; >>>> rtx (*gen)(rtx, rtx, rtx, rtx); >>>> + int first_elem; >>>> + int swap; >>>> >>> Just make this a bool. >> >> As discussed on IRC, this variable does contain an integer. I have >> renamed it as swap_nelt, and changed the test on it below. > > > This is ok. Thanks. Committed to trunk as r233251
[PATCH ARM] RFC: PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg
When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc function are not generated as long calls. This is encountered when building an allyesconfig Linux kernel, because the Linux build system generates very large sections by partially linking a large number of object files. This causes link failures, which don't go away with -mlong-calls due to this bug. (However, with this patch, linking still fails due to calls in inline asm.)

For example:

    extern void g(void);
    int f() { g(); return 0; }

compiles to:

    push    {r4, lr}
    push    {lr}
    bl      __gnu_mcount_nc    ;// not a long call
    ldr     r3, .L2
    blx     r3                 ;// a long call to g()
    mov     r0, #0
    pop     {r4, pc}

The call to __gnu_mcount_nc is generated from ARM_FUNCTION_PROFILER in config/arm/bpabi.h. For targets without MOVW/MOVT, the long call sequence requires a load from the literal pool, and it is too late to set up a literal pool entry from within ARM_FUNCTION_PROFILER. My approach to fixing this is to modify the prologue generation to load the address of __gnu_mcount_nc into ip, so that it is ready when the call is generated.

This patch only implements the fix for ARM and Thumb-2. A similar fix is possible for Thumb-1, but requires slightly more complex changes to the prologue generation to make sure there is a low register available.

This feels like a bit of a hack to me, so ideas for a cleaner solution are welcome. If there are none, is this acceptable for trunk now, or should it wait until GCC 7?

From 34993396a43fcfc263db5b02b2d1837c490f52ad Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Thu, 11 Feb 2016 18:07:00 +
Subject: [PATCH] [ARM] PR69770 fix -mlong-calls with -pg

gcc/ChangeLog:

2016-02-12  Charles Baylis

	* config/arm/arm.c (arm_expand_prologue): Load address of
	__gnu_mcount_nc in r12 if profiling and long calls are enabled.
	* config/arm/bpabi.h (ARM_FUNCTION_PROFILER): Emit long call to
	__gnu_mcount_nc if long calls are enabled.
	(ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New define.
gcc/testsuite/ChangeLog:

2016-02-12  Charles Baylis

	* gcc.target/arm/pr69770.c: New test.

Change-Id: I4c639a5edf32fa8c67324d37faee1cb4ddd57a5c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 27aecf7..9ce9a58 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -21739,6 +21739,15 @@ arm_expand_prologue (void)
       arm_load_pic_register (mask);
     }

+  if (crtl->profile && TARGET_LONG_CALLS
+      && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS)
+    {
+      rtx tmp = gen_rtx_SET (gen_rtx_REG (SImode, IP_REGNUM),
+                             gen_rtx_SYMBOL_REF (Pmode, "__gnu_mcount_nc"));
+      emit_insn (tmp);
+      emit_insn (gen_rtx_USE (VOIDmode, gen_rtx_REG (SImode, IP_REGNUM)));
+    }
+
   /* If we are profiling, make sure no instructions are scheduled before
      the call to mcount.  Similarly if the user has requested no
      scheduling in the prolog.  Similarly if we want non-call exceptions

diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 82128ef..b734a24 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -173,11 +173,21 @@
 #undef NO_PROFILE_COUNTERS
 #define NO_PROFILE_COUNTERS 1

+#undef ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS
+#define ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS 1
+
 #undef ARM_FUNCTION_PROFILER
 #define ARM_FUNCTION_PROFILER(STREAM, LABELNO)  \
 {                                               \
-  fprintf (STREAM, "\tpush\t{lr}\n");           \
-  fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n");  \
+  if (TARGET_LONG_CALLS && TARGET_32BIT)        \
+    {                                           \
+      fprintf (STREAM, "\tpush\t{lr}\n");       \
+      /* arm_expand_prolog() has already set up ip to contain the */ \
+      /* address of __gnu_mcount_nc. */         \
+      fprintf (STREAM, "\tblx\tip\n");          \
+    } else {                                    \
+      fprintf (STREAM, "\tpush\t{lr}\n");       \
+      fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n"); \
+    }                                           \
 }

 #undef SUBTARGET_FRAME_POINTER_REQUIRED

diff --git a/gcc/testsuite/gcc.target/arm/pr69770.c b/gcc/testsuite/gcc.target/arm/pr69770.c
new file mode 100644
index 000..61e5c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr69770.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-pg -mlong-calls" } */
+
+extern void g(void);
+
+int f() { g(); return 0; }
+
+/* { dg-final { scan-assembler-not "bl\[ \t\]+__gnu_mcount_nc" } } */
+/* { dg-final { scan-assembler "__gnu_mcount_nc" } } */
-- 
1.9.1
Re: [PATCH, ARM, v2] Fix PR target/59142: internal compiler error while compiling OpenCV 2.4.7
On 19 December 2013 16:13, Richard Earnshaw wrote: > > OK with that change. Thanks. The bugzilla entry is targeted at 4.8, but it is a latent problem which affects 4.7 too. Is it ok for 4.8, and should it be considered for 4.7?
Re: [PATCH v3] [AArch64] PR63870 Improve error messages for NEON single lane memory access intrinsics
Ping? On 26 June 2015 at 20:14, Charles Baylis wrote: > Since the last ping, I've tweaked the test cases a bit... > > Since I've been working on doing the same changes for the ARM backend, > I've moved the tests into the advsimd-intrinsics directory, marked as > XFAIL for ARM targets for now. The gcc/ part of the patch is > unchanged. > > gcc/ChangeLog: > > Charles Baylis > > PR target/63870 > * config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): > Add qualifier_struct_load_store_lane_index. > (aarch64_types_loadstruct_lane_qualifiers): Use > qualifier_struct_load_store_lane_index for lane index argument for > last argument. > (aarch64_types_storestruct_lane_qualifiers): Ditto. > (builtin_simd_arg): Add SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. > (aarch64_simd_expand_args): Add new argument describing mode of > builtin. Check lane bounds for arguments with > SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. > (aarch64_simd_expand_builtin): Emit error for incorrect lane indices > if marked with SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. > (aarch64_simd_expand_builtin): Handle arguments with > qualifier_struct_load_store_lane_index. Pass machine mode of builtin to > aarch64_simd_expand_args. > * config/aarch64/aarch64-simd-builtins.def: Declare ld[234]_lane and > vst[234]_lane with BUILTIN_VALLDIF. > * config/aarch64/aarch64-simd.md: > (aarch64_vec_load_lanesoi_lane): Use VALLDIF iterator. Perform > endianness reversal on lane index. > (aarch64_vec_load_lanesci_lane): Ditto. > (aarch64_vec_load_lanesxi_lane): Ditto. > (vec_store_lanesoi_lane): Use VALLDIF iterator. Fix typo > in attribute. > (vec_store_lanesci_lane): Use VALLDIF iterator. > (vec_store_lanesxi_lane): Ditto. > (aarch64_ld2_lane): Use VALLDIF iterator. Remove endianness > reversal of lane index. > (aarch64_ld3_lane): Ditto. > (aarch64_ld4_lane): Ditto. > (aarch64_st2_lane): Ditto. > (aarch64_st3_lane): Ditto. > (aarch64_st4_lane): Ditto. 
> * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Rename mode parameter > to qmode. Add new mode parameter. Update uses. >(__LD3_LANE_FUNC): Ditto. > (__LD4_LANE_FUNC): Ditto. > (__ST2_LANE_FUNC): Ditto. > (__ST3_LANE_FUNC): Ditto. > (__ST4_LANE_FUNC): Ditto. > > gcc/testsuite/ChangeLog: > > Charles Baylis > > gcc/testsuite/ChangeLog: > > Charles Baylis > > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s16_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u16_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_p8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s16_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u16_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u32_indices_1.c: > New test. 
> * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u8_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f32_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f64_indices_1.c: > New test. > * gcc.target/aarch64/advsimd-i
[PATCH] [AArch64] fix typo in vec_store_lanesoi_lane
Committed as obvious r226061.

gcc/ChangeLog:

2015-07-22  Charles Baylis

	* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane): Fix
	typo in attribute.

From 7d98f7fc82cfc3012b460e4f4f91200fedcb04db Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Tue, 21 Jul 2015 16:54:32 +0100
Subject: [PATCH 2/2] [AArch64] fix typo in vec_store_lanesoi_lane

gcc/ChangeLog:

Charles Baylis

	* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane): Fix
	typo in attribute.

Change-Id: I299ea5c01d64cfc72a29c386128ce9e0fef2624b
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d5da35a..40afced 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3970,7 +3970,7 @@
     operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
     return "st2\\t{%S1. - %T1.}[%2], %0";
   }
-  [(set_attr "type" "neon_store3_one_lane")]
+  [(set_attr "type" "neon_store2_one_lane")]
 )

 (define_expand "vec_store_lanesoi"
-- 
1.9.1
Re: [PATCH v3] [AArch64] PR63870 Improve error messages for NEON single lane memory access intrinsics
On 17 July 2015 at 09:32, James Greenhalgh wrote: > This seems an odd limitation, presumably this is a side effect of waiting > until expand time to throw an error... It does suggest that we're tackling > the problem in the wrong way by pushing this to so late in the compilation > pipeline. The property here is on a type itself, which must take a constant > value within a given range. That feels much more like the sort of thing > we should be detecting and bailing out on closer to the front-end - perhaps > with a more generic extension allowing you to annotate any type with an > expected/required range (both as a helping hand for VRP and as a way to > express programmer defined preconditions). > > But, given that adding such an extension is likely more effort than needed Agreed on all counts :) > I think this is OK for now! Thanks. Committed in r226059 with suggested fixes. The attribute typo fix was applied separately (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01816.html). Thanks Charles
Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
On 31 July 2015 at 10:34, Ramana Radhakrishnan wrote: > I've tried this in the past and never been convinced that 2 iterations are > enough to get to stability with this given that the results are only precise > for 8 bits / iteration. Thus I've always believed you need 3 iterations > rather than 2 at which point I've never been sure that it's worth it. So the > testing that you've done with this currently is not enough for this to go > into the tree. My understanding is that 2 iterations is sufficient for single precision floating point (although not for double precision), because each iteration of Newton-Raphson doubles the number of bits of accuracy. I haven't worked through the maths myself, but https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division says "This squaring of the error at each iteration step — the so-called quadratic convergence of Newton–Raphson's method — has the effect that the number of correct digits in the result roughly doubles for every iteration, a property that becomes extremely valuable when the numbers involved have many digits" Therefore: vrecpe -> 8 bits of accuracy +1 iteration -> 16 bits of accuracy +2 iterations -> 32 bits of accuracy (but in reality limited to precision of 32bit float) Since 32 bits is much more accuracy than the 24 bits of precision in a single precision FP value, 2 iterations should be sufficient. > I'd like this to be tested on a couple of different AArch32 implementations > with a wider range of inputs to verify that the results are acceptable as > well as running something like SPEC2k(6) with atleast one iteration to ensure > correctness. I can't argue with confirming theory matches practice :) Some corner cases (eg numbers around FLT_MAX, FLT_MIN etc) may result in denormals or out of range values during the reciprocal calculation which could result in answers which are less accurate than the typical case but I think that is acceptable with -ffast-math. Charles
Re: [PATCH] [ARM, Callgraph] Fix PR67280: function incorrectly marked as nothrow
On 7 September 2015 at 09:35, Charles Baylis wrote: >>> >gcc/ChangeLog: >>> > >>> >2015-08-28 Charles Baylis >>> > >>> > * cgraphunit.c (cgraph_node::create_wrapper): Set >>> > can_throw_external >>> > in new callgraph edge. > > Committed to trunk as r227407. > > Are you happy for me to backport to gcc-5-branch? Hi Jan, I'd still like to backport this patch to gcc 5. Is that OK? Thanks, Charles
[PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins
From: Charles Baylis

gcc/ChangeLog:

Charles Baylis

	PR target/63870
	* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator
	qualifier_struct_load_store_lane_index.
	(builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
	(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
	argument qualifiers.
	(arm_expand_neon_builtin): Handle new NEON argument qualifier.
	* config/arm/arm.h (ENDIAN_LANE_N): New macro.

Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647
---
 gcc/config/arm/arm-builtins.c | 46 ---
 gcc/config/arm/arm.c          |  1 +
 gcc/config/arm/arm.h          |  3 +++
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 0f5a1f1..a29f8d6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -77,7 +77,9 @@ enum arm_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be within range of previous argument = a vector.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores.  */
+  qualifier_struct_load_store_lane_index = 0x400
 };

 /* The qualifier_internal allows generation of a unary builtin from
@@ -1973,6 +1975,7 @@ typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
   NEON_ARG_LANE_INDEX,
+  NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -2030,9 +2033,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 /* Expand a Neon builtin.  */
 static rtx
 arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
-                      int icode, int have_retval, tree exp, ...)
+                      int icode, int have_retval, tree exp,
+                      builtin_arg *args)
 {
-  va_list ap;
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
   rtx op[SIMD_MAX_BUILTIN_ARGS];
@@ -2047,13 +2050,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
          || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
     target = gen_reg_rtx (tmode);

-  va_start (ap, exp);
-
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));

   for (;;)
     {
-      builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+      builtin_arg thisarg = args[argc];

       if (thisarg == NEON_ARG_STOP)
        break;
@@ -2089,6 +2090,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
            op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
          break;

+       case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+         gcc_assert (argc > 1);
+         if (CONST_INT_P (op[argc]))
+           {
+             neon_lane_bounds (op[argc], 0,
+                               GET_MODE_NUNITS (map_mode), exp);
+             /* Keep to GCC-vector-extension lane indices in the RTL.  */
+             op[argc] =
+               GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc])));
+           }
+         goto constant_arg;
+
        case NEON_ARG_LANE_INDEX:
          /* Previous argument must be a vector, which this indexes.  */
          gcc_assert (argc > 0);
@@ -2099,17 +2112,22 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
            }
          /* Fall through - if the lane index isn't a constant then
             the next case will error.  */
+
        case NEON_ARG_CONSTANT:
+constant_arg:
          if (!(*insn_data[icode].operand[opno].predicate)
              (op[argc], mode[argc]))
-           error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
-                     "expected %", argc + 1);
+           {
+             error ("%Kargument %d must be a constant immediate",
+                    exp, argc + 1);
+             return const0_rtx;
+           }
          break;
+
        case NEON_ARG_MEMORY:
          /* Check if expand failed.  */
          if (op[argc] == const0_rtx)
            {
-             va_end (ap);
              return 0;
            }
          gcc_assert (MEM_P (op[argc]));
@@ -2132,8 +2150,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
        }
     }

-  va_end (ap);
-
  if (have_retval)
    switch (argc)
      {
@@ -2245,6 +2261,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)

      if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
        args[k] = NEON_ARG_LANE_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
+        args[k] = NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX;
      else if (d->qualifiers[qualifiers_k] & qualifier_imm
[PATCH v2 0/3] [ARM] PR63870 vldN_lane/vstN_lane error messages
From: Charles Baylis This patch series fixes up the error messages for single lane vector load/stores, similarly to AArch64. make check on arm-linux-gnueabihf/qemu completes with no new regressions. Changes since the last version: . removed the duplicate arm_neon_lane_bounds function . resolved conflicts with other NEON work . whitespace clean up Charles Baylis (3): [ARM] PR63870 Add qualifiers for NEON builtins [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier [ARM] PR63870 Enable test cases for ARM gcc/config/arm/arm-builtins.c | 50 ++ gcc/config/arm/arm.c | 1 + gcc/config/arm/arm.h | 3 ++ gcc/config/arm/neon.md | 49 +++-- .../advsimd-intrinsics/vld2_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_p8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld2_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_s8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld2_lane_u16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_u32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_u64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2_lane_u8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld2q_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_p8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_s8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_u16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_u32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld2q_lane_u64_indices_1.c | 5 +-- 
.../advsimd-intrinsics/vld2q_lane_u8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_p8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld3_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_s8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld3_lane_u16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_u32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_u64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3_lane_u8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld3q_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_p8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_s8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_u16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_u32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_u64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld3q_lane_u8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_p8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld4_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_s8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld4_lane_u16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_u32_indices_1.c | 5 +-- 
.../advsimd-intrinsics/vld4_lane_u64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4_lane_u8_indices_1.c| 5 +-- .../advsimd-intrinsics/vld4q_lane_f16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_f32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_f64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_p8_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_s16_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_s32_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_s64_indices_1.c | 5 +-- .../advsimd-intrinsics/vld4q_lane_s8_indices_1.c | 5 +-- .../advsimd-intrinsics
[PATCH 2/3] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
From: Charles Baylis

gcc/ChangeLog:

Charles Baylis

	PR target/63870
	* config/arm/arm-builtins.c: (arm_load1_qualifiers) Use
	qualifier_struct_load_store_lane_index.
	(arm_storestruct_lane_qualifiers) Likewise.
	* config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for
	big-endian.
	(neon_vst1_lane) Likewise.
	(neon_vld2_lane) Likewise.
	(neon_vst2_lane) Likewise.
	(neon_vld3_lane) Likewise.
	(neon_vst3_lane) Likewise.
	(neon_vld4_lane) Likewise.
	(neon_vst4_lane) Likewise.

Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2
---
 gcc/config/arm/arm-builtins.c |  4 ++--
 gcc/config/arm/neon.md        | 49 +++
 2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index a29f8d6..cbe96e4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -162,7 +162,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode,
-      qualifier_none, qualifier_immediate };
+      qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)

 /* The first argument (return type) of a store should be void type,
@@ -181,7 +181,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode,
-      qualifier_none, qualifier_immediate };
+      qualifier_none, qualifier_struct_load_store_lane_index };
 #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers)

 #define v8qi_UP  V8QImode

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..251afdc 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VLD1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[3] = GEN_INT (lane);
   if (lane < 0 || lane >= max)
     error ("lane out of range");
   if (max == 1)
@@ -4281,8 +4282,9 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VLD1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[3] = GEN_INT (lane);
   int regno = REGNO (operands[0]);
   if (lane < 0 || lane >= max)
     error ("lane out of range");
@@ -4367,8 +4369,9 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VST1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[2]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[2] = GEN_INT (lane);
   if (lane < 0 || lane >= max)
     error ("lane out of range");
   if (max == 1)
@@ -4387,7 +4390,7 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VST1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[2]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   if (lane < 0 || lane >= max)
@@ -4396,8 +4399,8 @@ if (BYTES_BIG_ENDIAN)
     {
       lane -= max / 2;
       regno += 2;
-      operands[2] = GEN_INT (lane);
     }
+  operands[2] = GEN_INT (lane);
   operands[1] = gen_rtx_REG (mode, regno);
   if (max == 2)
     return "vst1.\t{%P1}, %A0";
@@ -4457,7 +4460,7 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
@@ -4466,7 +4469,7 @@ if (BYTES_BIG_ENDIAN)
   ops[0] = gen_rtx_REG (DImode, regno);
   ops[1] = gen_rtx_REG (DImode, regno + 2);
   ops[2] = operands[1];
-  ops[3] = operands[3];
+  ops[3] = GEN_INT (lane);
   output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops);
   return "";
 }
@@ -4482,7 +4485,7 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
@@ -4572,7 +4575,7 @@ if (BYTES_BIG_ENDIAN)
                     UNSPEC_VST2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (o
Re: [PATCH v2 0/3] [ARM] PR63870 vldN_lane/vstN_lane error messages
On 7 October 2015 at 00:59, wrote: > From: Charles Baylis > > This patch series fixes up the error messages for single lane vector > load/stores, similarly to AArch64. > > make check on arm-linux-gnueabihf/qemu completes with no new regressions. > > Changes since the last version: > . removed the duplicate arm_neon_lane_bounds function > . resolved conflicts with other NEON work > . whitespace clean up > A bit more info is required here. Original patch submission was at: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00142.html > Charles Baylis (3): > [ARM] PR63870 Add qualifiers for NEON builtins This adds the qualifier_struct_load_store_lane_index qualifier, ported from my AArch64 changes. This causes the lane bounds check to be performed and, on big-endian targets, reverses the lane number used in the RTL (in common with other NEON patterns). This isn't strictly required for correctness, since the vectorizer never generates these patterns, but the consistency seems like a good idea. > [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate > qualifier This marks the builtins with qualifier_struct_load_store_lane_index. The patterns are also updated to un-reverse the lane order at assembly time. > [ARM] PR63870 Enable test cases for ARM Removes the xfails from the test cases.
Re: [PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins
On 12 October 2015 at 11:58, Alan Lawrence wrote: > On 07/10/15 00:59, charles.bay...@linaro.org wrote: >> >> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > > ... >> >> case NEON_ARG_MEMORY: >> /* Check if expand failed. */ >> if (op[argc] == const0_rtx) >> { >> - va_end (ap); >> return 0; >> } > > > ...and drop the braces? Will do. >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 02f5dc3..448cde3 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -30117,4 +30117,5 @@ arm_sched_fusion_priority (rtx_insn *insn, int >> max_pri, >> *pri = tmp; >> return; >> } >> + >> #include "gt-arm.h" > > > This looks unrelated (and is the only change to arm.c) - perhaps commit > separately? (Note I am not a maintainer! But this looks "obvious"...) It doesn't seem very useful. I'll drop it. >> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h >> index 87c9f90..27ac4dc 100644 >> --- a/gcc/config/arm/arm.h >> +++ b/gcc/config/arm/arm.h >> @@ -288,6 +288,9 @@ extern void >> (*arm_lang_output_object_attributes_hook)(void); >> #define TARGET_BPABI false >> #endif >> >> +#define ENDIAN_LANE_N(mode, n) \ >> + (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n) >> + > > > Given we are making changes here to how this all works on bigendian, have > you tested armeb at all? I tested on big endian, and it passes, except for a testsuite issue with the *_f16 tests, which fail because they are built without the fp16 options on big endian. This is because check_effective_target_arm_neon_fp16_ok_nocache gets an ICE when it attempts to compile the test program. I think those fp16 intrinsics are in your area, do you want to take a look? :) Thanks for the review Charles
Re: [ARM] Use vector wide add for mixed-mode adds
On 20 October 2015 at 08:54, Michael Collison wrote:
> I want to ask a question about existing patterns in neon.md that utilize
> the vec_select and all the lanes as my example does: Why are the
> following patterns not matched if the target is big endian?
>
> (define_insn "neon_vec_unpack_lo_"
>   [(set (match_operand: 0 "register_operand" "=w")
>         (SE: (vec_select:
>                (match_operand:VU 1 "register_operand" "w")
>                (match_operand:VU 2 "vect_par_constant_low" ""]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl. %q0, %e1"
>   [(set_attr "type" "neon_shift_imm_long")]
> )
>
> (define_insn "neon_vec_unpack_hi_"
>   [(set (match_operand: 0 "register_operand" "=w")
>         (SE: (vec_select:
>                (match_operand:VU 1 "register_operand" "w")
>                (match_operand:VU 2 "vect_par_constant_high" ""]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl. %q0, %f1"
>   [(set_attr "type" "neon_shift_imm_long")]
>
> These patterns are similar to the new patterns I am adding and I am
> wondering if my patterns should exclude BYTES_BIG_ENDIAN?

These patterns use %e and %f to access the low and high parts of the
input operand: %e matches the use of _lo in the pattern name and
vect_par_constant_low, and %f matches _hi and vect_par_constant_high.
For big-endian, the uses of %e and %f would need to be swapped.

Looking at the patch you posted last month (possibly not the latest
version?):

This is a pattern which is supposed to act on the low part of the input
vector, hence _lo in the name:

+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                 (match_operand:VQI 2 "vect_par_constant_low" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"

Here, using %e1 carries an implicit assumption that the low part of the
input vector is in the lowest numbered of the pair of D registers, which
is only true on little-endian.

This is a bit ugly (and untested) but perhaps something like this would
fix the problem:

  { return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %f1"
                            : "vaddw.\t%q0, %q3, %e1"; }
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)

Similarly, here. Pattern is _hi, register is %f1:

+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                 (match_operand:VQI 2 "vect_par_constant_high" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)

However, as far as I can see, there isn't an endianness dependency in
widen_ssum3/widen_usum3, because both halves of the vector are used and
added together.

Hope this helps
Charles
[PATCH] [ARM] PR61551 RFC: Improve costs for NEON addressing modes
Hi Ramana,

[revisiting https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01593.html]

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551

This patch is an initial attempt to rework the ARM rtx costs to better
handle the costs of various addressing modes, in particular to remove
the incorrect large costs associated with post-indexed addressing in
NEON memory operations.

This patch introduces per-core tables for the costs of using different
addressing modes for different access modes. I have retained the
original code so that the calculated costs can be compared. Currently,
the tables replicate the costs calculated by the original code, and a
debug assert is left in place.

Obviously, a fair amount of clean-up is needed before this can be
applied, but I would like a quick comment on the general approach to
check that I haven't completely missed the point before continuing.
After that, I will clean up the coding style, check for impact on the
AArch64 backend, remove the debug code and, in a separate patch,
improve the tuning for the vector modes.
Thanks Charles From b10c6dd7af1f5b9821946783ba9d96b08c751f2b Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Wed, 28 Oct 2015 18:48:16 + Subject: [PATCH] WIP Change-Id: If349ffd7dbbe13a814be4a0d022382ddc8270973 --- gcc/config/arm/aarch-common-protos.h | 28 ++ gcc/config/arm/aarch-cost-tables.h | 328 + gcc/config/arm/arm.c | 677 ++- 3 files changed, 1023 insertions(+), 10 deletions(-) diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 348ae74..dae42d7 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -130,6 +130,33 @@ struct vector_cost_table const int alu; }; +struct cbmem_cost_table +{ + enum access_type + { +REG, +POST_INCDEC, +PRE_INCDEC, +/*PRE_MODIFY,*/ +POST_MODIFY, +PLUS, +ACCESS_TYPE_LAST = PLUS + }; + const int si[ACCESS_TYPE_LAST + 1]; + const int di[ACCESS_TYPE_LAST + 1]; + const int cdi[ACCESS_TYPE_LAST + 1]; + const int sf[ACCESS_TYPE_LAST + 1]; + const int df[ACCESS_TYPE_LAST + 1]; + const int cdf[ACCESS_TYPE_LAST + 1]; + const int blk[ACCESS_TYPE_LAST + 1]; + const int vec64[ACCESS_TYPE_LAST + 1]; + const int vec128[ACCESS_TYPE_LAST + 1]; + const int vec192[ACCESS_TYPE_LAST + 1]; + const int vec256[ACCESS_TYPE_LAST + 1]; + const int vec384[ACCESS_TYPE_LAST + 1]; + const int vec512[ACCESS_TYPE_LAST + 1]; +}; + struct cpu_cost_table { const struct alu_cost_table alu; @@ -137,6 +164,7 @@ struct cpu_cost_table const struct mem_cost_table ldst; const struct fp_cost_table fp[2]; /* SFmode and DFmode. */ const struct vector_cost_table vect; + const struct cbmem_cost_table addr; }; diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h index 66e09a8..c5ecdcf 100644 --- a/gcc/config/arm/aarch-cost-tables.h +++ b/gcc/config/arm/aarch-cost-tables.h @@ -122,6 +122,88 @@ const struct cpu_cost_table generic_extra_costs = /* Vector */ { COSTS_N_INSNS (1) /* alu. 
*/ + }, + /* Memory */ + { +{ 0, 0, 0, 0, 0 }, /* si */ +{ + 0, + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1) +},/* di */ +{ + 0, + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3) +},/* cdi */ +{ 0, 0, 0, 0, 0 }, /* sf */ +{ + 0, + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1) +},/* df */ +{ + 0, + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3) +},/* cdf */ +{ + 0, + - COSTS_N_INSNS (1), + - COSTS_N_INSNS (1), + - COSTS_N_INSNS (1), + - COSTS_N_INSNS (1), +},/* blk */ +{ + 0, + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1), + COSTS_N_INSNS (1) +},/* vec64 */ +{ + 0, + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3), + COSTS_N_INSNS (3) +},/* vec128 */ +{ + 0, + COSTS_N_INSNS (5), + COSTS_N_INSNS (5), + COSTS_N_INSNS (5), + COSTS_N_INSNS (5) +},/* vec192 */ +{ + 0, + COSTS_N_INSNS (7), + COSTS_N_INSNS (7), + COSTS_N_INSNS (7), + COSTS_N_INSNS (7) +},/* vec256 */ +{ + 0, + COSTS_N_INSNS (11), + COSTS_N_INSNS (11), + COSTS_N_INSNS (11), + COSTS_N_INSNS (11) +},/* vec384 */ +{ + 0, + COSTS_N_INSNS (15), + COSTS_N_INSNS (15), + COSTS_N_INSNS (15), + COSTS_N_INSNS (15) +}/* vec512 */ } }; @@ -225,6 +307,88 @@ const struct cpu_cost_table cortexa53_extra_costs = /* Vector */ { COSTS_N_INSNS (1) /* alu. */ + }, + /* Memory */ + { +{ 0,
Re: [PATCH] [ARM] PR61551 RFC: Improve costs for NEON addressing modes
On 4 November 2015 at 08:05, Ramana Radhakrishnan wrote:
> Hi Charles,
>
> Sorry I missed this completely in my inbox.
>
> On 31/10/15 03:34, Charles Baylis wrote:
>> Hi Ramana,
>>
>> [revisiting https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01593.html]
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551
>>
>> This patch is an initial attempt to rework the ARM rtx costs to better
>> handle the costs of various addressing modes, in particular to remove
>> the incorrect large costs associated with post-indexed addressing in
>> NEON memory operations.
>>
>> This patch introduces per-core tables for the costs of using different
>> addressing modes for different access modes. I have retained the
>> original code so that the calculated costs can be compared. Currently,
>> the tables replicate the costs calculated by the original code, and a
>> debug assert is left in place.
>>
>> Obviously, a fair amount of clean up is needed before this can be
>> applied, but I would like a quick comment on the general approach to
>> check that I haven't completely missed the point before continuing.
>
> No you haven't missed the point - this is the direction I wanted this
> taken in though not expecting this degree of detail.
OK, Thanks :)

>> +struct cbmem_cost_table
>> +{
>> +  enum access_type
>> +  {
>> +    REG,
>> +    POST_INCDEC,
>> +    PRE_INCDEC,
>> +    /*PRE_MODIFY,*/
>> +    POST_MODIFY,
>> +    PLUS,
>> +    ACCESS_TYPE_LAST = PLUS
>> +  };
>> +  const int si[ACCESS_TYPE_LAST + 1];
>> +  const int di[ACCESS_TYPE_LAST + 1];
>> +  const int cdi[ACCESS_TYPE_LAST + 1];
>> +  const int sf[ACCESS_TYPE_LAST + 1];
>> +  const int df[ACCESS_TYPE_LAST + 1];
>> +  const int cdf[ACCESS_TYPE_LAST + 1];
>> +  const int blk[ACCESS_TYPE_LAST + 1];
>> +  const int vec64[ACCESS_TYPE_LAST + 1];
>> +  const int vec128[ACCESS_TYPE_LAST + 1];
>> +  const int vec192[ACCESS_TYPE_LAST + 1];
>> +  const int vec256[ACCESS_TYPE_LAST + 1];
>> +  const int vec384[ACCESS_TYPE_LAST + 1];
>> +  const int vec512[ACCESS_TYPE_LAST + 1];
>> +};
>> +
>>
>> After that, I will clean up the coding style, check for impact on the
>> AArch64 backend, remove the debug code and in a separate patch improve
>> the tuning for the vector modes.
>
> I think adding additional costs for zero / sign extension of registers
> would be appropriate for the AArch64 backend. Further more I think Alan
> recently had patches to change the use of vector modes to BLKmode in the
> AArch64 backend, so some of the vector costing might become interesting.

The aarch64 backend already has a mechanism for costing those
operations, in aarch64_address_cost(). Using BLKmode will certainly
make this difficult.

> If you can start turning this around quickly I'd like to keep the review
> momentum going but it will need time and effort from a number of parties
> to get this working. This is however likely to be a high impact change
> on the backends as this is an invasive change and I'm not sure if it
> will meet the Stage3 cutoff point.

I'll see what I can do. In the short term, the only part of the cost
model I want changed is the excessive costs for the pre/post-indexed
addressing on vector modes.
>> From b10c6dd7af1f5b9821946783ba9d96b08c751f2b Mon Sep 17 00:00:00 2001 >> From: Charles Baylis >> Date: Wed, 28 Oct 2015 18:48:16 + >> Subject: [PATCH] WIP >> >> Change-Id: If349ffd7dbbe13a814be4a0d022382ddc8270973 >> --- >> gcc/config/arm/aarch-common-protos.h | 28 ++ >> gcc/config/arm/aarch-cost-tables.h | 328 + >> gcc/config/arm/arm.c | 677 >> ++- >> 3 files changed, 1023 insertions(+), 10 deletions(-) >> >> diff --git a/gcc/config/arm/aarch-common-protos.h >> b/gcc/config/arm/aarch-common-protos.h >> index 348ae74..dae42d7 100644 >> --- a/gcc/config/arm/aarch-common-protos.h >> +++ b/gcc/config/arm/aarch-common-protos.h >> @@ -130,6 +130,33 @@ struct vector_cost_table >>const int alu; >> }; >> >> +struct cbmem_cost_table >> +{ >> + enum access_type >> + { >> +REG, >> +POST_INCDEC, >> +PRE_INCDEC, >> +/*PRE_MODIFY,*/ >> +POST_MODIFY, >> +PLUS, >> +ACCESS_TYPE_LAST = PLUS >> + }; >> + const int si[ACCESS_TYPE_LAST + 1]; >> + const int di[ACCESS_TYPE_LAST + 1]; >> + const int cdi[ACCESS_TYPE_LAST + 1]; >> + const int sf[ACCESS_TYPE_LAST + 1]; >>
[PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins
From: Charles Baylis gcc/ChangeLog: Charles Baylis PR target/63870 * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator qualifier_struct_load_store_lane_index. (builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX. (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON argument qualifiers. (arm_expand_neon_builtin): Handle new NEON argument qualifier. * config/arm/arm.h (ENDIAN_LANE_N): New macro. Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647 --- gcc/config/arm/arm-builtins.c | 48 +++ gcc/config/arm/arm.c | 1 + gcc/config/arm/arm.h | 3 +++ 3 files changed, 34 insertions(+), 18 deletions(-) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index bad3dc3..6e3aad4 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -67,7 +67,9 @@ enum arm_type_qualifiers /* Polynomial types. */ qualifier_poly = 0x100, /* Lane indices - must be within range of previous argument = a vector. */ - qualifier_lane_index = 0x200 + qualifier_lane_index = 0x200, + /* Lane indices for single lane structure loads and stores. */ + qualifier_struct_load_store_lane_index = 0x400 }; /* The qualifier_internal allows generation of a unary builtin from @@ -1963,6 +1965,7 @@ typedef enum { NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_LANE_INDEX, + NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX, NEON_ARG_MEMORY, NEON_ARG_STOP } builtin_arg; @@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode, /* Expand a Neon builtin. */ static rtx arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, - int icode, int have_retval, tree exp, ...) 
+ int icode, int have_retval, tree exp, + builtin_arg *args) { - va_list ap; rtx pat; tree arg[SIMD_MAX_BUILTIN_ARGS]; rtx op[SIMD_MAX_BUILTIN_ARGS]; @@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, || !(*insn_data[icode].operand[0].predicate) (target, tmode))) target = gen_reg_rtx (tmode); - va_start (ap, exp); - formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode])); for (;;) { - builtin_arg thisarg = (builtin_arg) va_arg (ap, int); + builtin_arg thisarg = args[argc]; if (thisarg == NEON_ARG_STOP) break; @@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, op[argc] = copy_to_mode_reg (mode[argc], op[argc]); break; + case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX: + gcc_assert (argc > 1); + if (CONST_INT_P (op[argc])) + { + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (map_mode), exp); + /* Keep to GCC-vector-extension lane indices in the RTL. */ + op[argc] = + GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc]))); + } + goto constant_arg; + case NEON_ARG_LANE_INDEX: /* Previous argument must be a vector, which this indexes. */ gcc_assert (argc > 0); @@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, } /* Fall through - if the lane index isn't a constant then the next case will error. */ + case NEON_ARG_CONSTANT: +constant_arg: if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) - error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, " - "expected %", argc + 1); + { + error ("%Kargument %d must be a constant immediate", +exp, argc + 1); + return const0_rtx; + } break; + case NEON_ARG_MEMORY: /* Check if expand failed. */ if (op[argc] == const0_rtx) - { - va_end (ap); return 0; - } gcc_assert (MEM_P (op[argc])); PUT_MODE (op[argc], mode[argc]); /* ??? 
arm_neon.h uses the same built-in functions for signed @@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, } } - va_end (ap); - if (have_retval) switch (argc) { @@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target) if (d->qualifiers[qualifiers_k] & qualifier_lane_index) args[k] = NEON_ARG_LANE_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
[PATCH 2/4] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
From: Charles Baylis gcc/ChangeLog: Charles Baylis PR target/63870 * config/arm/arm-builtins.c: (arm_load1_qualifiers) Use qualifier_struct_load_store_lane_index. (arm_storestruct_lane_qualifiers) Likewise. * config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for big-endian. (neon_vst1_lane) Likewise. (neon_vld2_lane) Likewise. (neon_vst2_lane) Likewise. (neon_vld3_lane) Likewise. (neon_vst3_lane) Likewise. (neon_vld4_lane) Likewise. (neon_vst4_lane) Likewise. Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2 --- gcc/config/arm/arm-builtins.c | 4 ++-- gcc/config/arm/neon.md| 49 +++ 2 files changed, 28 insertions(+), 25 deletions(-) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 6e3aad4..113e3da 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -152,7 +152,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode, - qualifier_none, qualifier_immediate }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers) /* The first argument (return type) of a store should be void type, @@ -171,7 +171,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_pointer_map_mode, - qualifier_none, qualifier_immediate }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers) #define v8qi_UP V8QImode diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index e5a2b0f..e8db020 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VLD1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max 
= GET_MODE_NUNITS (mode); + operands[3] = GEN_INT (lane); if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) @@ -4281,8 +4282,9 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VLD1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); + operands[3] = GEN_INT (lane); int regno = REGNO (operands[0]); if (lane < 0 || lane >= max) error ("lane out of range"); @@ -4367,8 +4369,9 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VST1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); + operands[2] = GEN_INT (lane); if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) @@ -4387,7 +4390,7 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VST1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[1]); if (lane < 0 || lane >= max) @@ -4396,8 +4399,8 @@ if (BYTES_BIG_ENDIAN) { lane -= max / 2; regno += 2; - operands[2] = GEN_INT (lane); } + operands[2] = GEN_INT (lane); operands[1] = gen_rtx_REG (mode, regno); if (max == 2) return "vst1.\t{%P1}, %A0"; @@ -4457,7 +4460,7 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VLD2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[0]); rtx ops[4]; @@ -4466,7 +4469,7 @@ if (BYTES_BIG_ENDIAN) ops[0] = gen_rtx_REG (DImode, regno); ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = operands[1]; - ops[3] = operands[3]; + ops[3] = GEN_INT (lane); output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); return ""; } @@ -4482,7 +4485,7 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VLD2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = 
INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[0]); rtx ops[4]; @@ -4572,7 +4575,7 @@ if (BYTES_BIG_ENDIAN) UNSPEC_VST2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (o
[PATCH v3 0/4] [ARM] PR63870 vldN_lane/vstN_lane error messages
From: Charles Baylis

Previous discussion:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00657.html

This is a minor update to the previous patch set, fixing one coding
style issue in the first patch, and adding a fourth patch for which
there are two options, described below.

  [ARM] PR63870 Add qualifiers for NEON builtins
  [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
  [ARM] PR63870 Add test cases

These two patches are alternative options. Alan suggested removing the
error checks at assembly time, since the user-supplied lane number is
always checked earlier. I thought it might be better to catch this case
as an internal error, to guard against future bugs. If we don't use the
internal error, then the assembler will catch use of invalid lane
numbers. I'm not sure which is preferred, so both options are
presented. Either one can be applied:

  [ARM] PR63870 Use internal_error() for invalid lane numbers
  [ARM] PR63870 Remove error for invalid lane numbers

Passes make check for arm-unknown-linux-gnueabihf and
armeb-unknown-linux-gnueabihf with no regressions. As mentioned in the
last thread, the new *_f16 tests fail on armeb-* due to unrelated
problems with half float moves.

OK for trunk? I prefer patch 4a, but will commit 4b if that is
preferred.
gcc/config/arm/arm-builtins.c | 52 +++- gcc/config/arm/arm.c | 1 + gcc/config/arm/arm.h | 3 + gcc/config/arm/neon.md | 97 -- .../advsimd-intrinsics/vld2_lane_f16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_f32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_f64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_p8_indices_1.c| 5 +- .../advsimd-intrinsics/vld2_lane_s16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_s32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_s64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_s8_indices_1.c| 5 +- .../advsimd-intrinsics/vld2_lane_u16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_u32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_u64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2_lane_u8_indices_1.c| 5 +- .../advsimd-intrinsics/vld2q_lane_f16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_f32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_f64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_p8_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_s16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_s32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_s64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_s8_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_u16_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_u32_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_u64_indices_1.c | 5 +- .../advsimd-intrinsics/vld2q_lane_u8_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_f16_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_f32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_f64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_p8_indices_1.c| 5 +- .../advsimd-intrinsics/vld3_lane_s16_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_s32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_s64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_s8_indices_1.c| 5 +- .../advsimd-intrinsics/vld3_lane_u16_indices_1.c | 5 +- 
.../advsimd-intrinsics/vld3_lane_u32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_u64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3_lane_u8_indices_1.c| 5 +- .../advsimd-intrinsics/vld3q_lane_f16_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_f32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_f64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_p8_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_s16_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_s32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_s64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_s8_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_u16_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_u32_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_u64_indices_1.c | 5 +- .../advsimd-intrinsics/vld3q_lane_u8_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_f16_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_f32_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_f64_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_p8_indices_1.c| 5 +- .../advsimd-intrinsics/vld4_lane_s16_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_s32_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_s64_indices_1.c | 5 +- .../advsimd-intrinsics/vld4_lane_s8_indices_1.c| 5 +- .../advsimd-intrinsics/vld4_lane_u16_indices_1.c
[PATCH 4a/4] [ARM] PR63870 Use internal_error() for invalid lane numbers
From: Charles Baylis Charles Baylis * config/arm/neon.md (neon_vld1_lane): Use internal_error for invalid lane number. (neon_vst1_lane): Likewise. (neon_vld2_lane): Likewise. (neon_vst2_lane): Likewise. (neon_vld3_lane): Likewise. (neon_vst3_lane): Likewise. (neon_vld4_lane): Likewise. (neon_vst4_lane): Likewise. Change-Id: I72686845119df2f857fed98e7e0a588c532159a7 --- gcc/config/arm/neon.md | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index e8db020..99caf96 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -4265,7 +4265,7 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT max = GET_MODE_NUNITS (mode); operands[3] = GEN_INT (lane); if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); if (max == 1) return "vld1.\t%P0, %A1"; else @@ -4287,7 +4287,7 @@ if (BYTES_BIG_ENDIAN) operands[3] = GEN_INT (lane); int regno = REGNO (operands[0]); if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -4373,7 +4373,7 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT max = GET_MODE_NUNITS (mode); operands[2] = GEN_INT (lane); if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); if (max == 1) return "vst1.\t{%P1}, %A0"; else @@ -4394,7 +4394,7 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[1]); if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -4465,7 +4465,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[0]); rtx ops[4]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); ops[0] = gen_rtx_REG (DImode, regno); ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = operands[1]; @@ -4490,7 +4490,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO 
(operands[0]); rtx ops[4]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -4580,7 +4580,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[1]); rtx ops[4]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 2); @@ -4605,7 +4605,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[1]); rtx ops[4]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -4724,7 +4724,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[0]); rtx ops[5]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); ops[0] = gen_rtx_REG (DImode, regno); ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = gen_rtx_REG (DImode, regno + 4); @@ -4751,7 +4751,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[0]); rtx ops[5]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -4896,7 +4896,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[1]); rtx ops[5]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 2); @@ -4923,7 +4923,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[1]); rtx ops[5]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); else if (lane >= max / 2) { lane -= max / 2; @@ -5046,7 +5046,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[0]); rtx ops[6]; if (lane < 0 || lane >= max) -error ("lane out of range"); +internal_error ("lane out of range"); ops[0] = gen_rtx_REG (DImode, regno); ops[1] = gen_rtx_REG 
(DImode, regno + 2); ops[2] = gen_rtx_REG (DImode, regno + 4); @@ -5074,7 +5074,7 @@ if (BYTES_BIG_ENDIAN) int regno = REGNO (operands[
[PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
From: Charles Baylis Charles Baylis * config/arm/neon.md (neon_vld1_lane): Remove error for invalid lane number. (neon_vst1_lane): Likewise. (neon_vld2_lane): Likewise. (neon_vst2_lane): Likewise. (neon_vld3_lane): Likewise. (neon_vst3_lane): Likewise. (neon_vld4_lane): Likewise. (neon_vst4_lane): Likewise. Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774 --- gcc/config/arm/neon.md | 48 1 file changed, 8 insertions(+), 40 deletions(-) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index e8db020..6574e6e 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); operands[3] = GEN_INT (lane); - if (lane < 0 || lane >= max) -error ("lane out of range"); if (max == 1) return "vld1.\t%P0, %A1"; else @@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT max = GET_MODE_NUNITS (mode); operands[3] = GEN_INT (lane); int regno = REGNO (operands[0]); - if (lane < 0 || lane >= max) -error ("lane out of range"); - else if (lane >= max / 2) + if (lane >= max / 2) { lane -= max / 2; regno += 2; @@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); operands[2] = GEN_INT (lane); - if (lane < 0 || lane >= max) -error ("lane out of range"); if (max == 1) return "vst1.\t{%P1}, %A0"; else @@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[1]); - if (lane < 0 || lane >= max) -error ("lane out of range"); - else if (lane >= max / 2) + if (lane >= max / 2) { lane -= max / 2; regno += 2; @@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN) HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[0]); rtx ops[4]; - if (lane < 0 || lane >= max) -error ("lane out of range"); ops[0] = 
gen_rtx_REG (DImode, regno);
   ops[1] = gen_rtx_REG (DImode, regno + 2);
   ops[2] = operands[1];
@@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
     {
       lane -= max / 2;
       regno += 2;
@@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[4];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
   ops[0] = operands[0];
   ops[1] = gen_rtx_REG (DImode, regno);
   ops[2] = gen_rtx_REG (DImode, regno + 2);
@@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[4];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
     {
       lane -= max / 2;
       regno += 2;
@@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[5];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
   ops[0] = gen_rtx_REG (DImode, regno);
   ops[1] = gen_rtx_REG (DImode, regno + 2);
   ops[2] = gen_rtx_REG (DImode, regno + 4);
@@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[5];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
     {
       lane -= max / 2;
       regno += 2;
@@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[5];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
   ops[0] = operands[0];
   ops[1] = gen_rtx_REG (DImode, regno);
   ops[2] = gen_rtx_REG (DImode, regno + 2);
@@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[5];
-  if (lane < 0 || lane >= max)
-    error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
     {
       lane -= max / 2;
       regno += 2;
@@ -5045,8 +5021,6 @@ if (BYTES_BIG_ENDIAN)
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[6];
-  if (lane &
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 11 November 2015 at 11:22, Kyrill Tkachov wrote:
> Hi Charles,
>
> On 08/11/15 00:26, charles.bay...@linaro.org wrote:
>>
>> From: Charles Baylis
>>
>>         Charles Baylis
>>
>>         * config/arm/neon.md (neon_vld1_lane): Remove error for
>>         invalid lane number.
>>         (neon_vst1_lane): Likewise.
>>         (neon_vld2_lane): Likewise.
>>         (neon_vst2_lane): Likewise.
>>         (neon_vld3_lane): Likewise.
>>         (neon_vst3_lane): Likewise.
>>         (neon_vld4_lane): Likewise.
>>         (neon_vst4_lane): Likewise.
>>
> In this pattern the 'max' variable is now unused, causing a bootstrap
> -Werror failure on arm.
> I'll test a patch to fix it unless you beat me to it...

Thanks for catching this.

I have a patch, and have started a bootstrap. Unless you have
objections, I'll apply as obvious once the bootstrap is complete later
this afternoon.

gcc/ChangeLog:

2015-11-11  Charles Baylis

        * config/arm/neon.md: (neon_vld2_lane): Remove unused max
        variable.
        (neon_vst2_lane): Likewise.
        (neon_vld3_lane): Likewise.
        (neon_vst3_lane): Likewise.
        (neon_vld4_lane): Likewise.
        (neon_vst4_lane): Likewise.

From f111cb543bff0ad8756a0240f8bb1af1f19b Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Wed, 11 Nov 2015 11:59:44 +
Subject: [PATCH] [ARM] remove unused variable

gcc/ChangeLog:

2015-11-11  Charles Baylis

        * config/arm/neon.md: (neon_vld2_lane): Remove unused max
        variable.
        (neon_vst2_lane): Likewise.
        (neon_vld3_lane): Likewise.
        (neon_vst3_lane): Likewise.
        (neon_vld4_lane): Likewise.
        (neon_vst4_lane): Likewise.

Change-Id: Ifed53e2d4c5a581770848cab65cf2e8d1d9039c3
---
 gcc/config/arm/neon.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 119550c..62fb6da 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4464,7 +4464,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4579,7 +4578,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[4];
   ops[0] = operands[0];
@@ -4723,7 +4721,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N (mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[5];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4895,7 +4892,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[5];
   ops[0] = operands[0];
@@ -5045,7 +5041,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[6];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -5225,7 +5220,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[6];
   ops[0] = operands[0];
-- 
1.9.1
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 11 November 2015 at 12:10, Kyrill Tkachov wrote: > > On 11/11/15 12:08, Charles Baylis wrote: >> >> On 11 November 2015 at 11:22, Kyrill Tkachov >> wrote: >>> >>> Hi Charles, >>> >>> On 08/11/15 00:26, charles.bay...@linaro.org wrote: >>>> >>>> From: Charles Baylis >>>> >>>> Charles Baylis >>>> >>>> * config/arm/neon.md (neon_vld1_lane): Remove error for >>>> invalid >>>> lane number. >>>> (neon_vst1_lane): Likewise. >>>> (neon_vld2_lane): Likewise. >>>> (neon_vst2_lane): Likewise. >>>> (neon_vld3_lane): Likewise. >>>> (neon_vst3_lane): Likewise. >>>> (neon_vld4_lane): Likewise. >>>> (neon_vst4_lane): Likewise. >>>> >>> In this pattern the 'max' variable is now unused, causing a bootstrap >>> -Werror failure on arm. >>> I'll test a patch to fix it unless you beat me to it... >> >> Thanks for catching this. >> >> I have a patch, and have started a bootstrap. Unless you have >> objections, I'll apply as obvious once the bootstrap is complete later >> this afternoon. > > > Yes, that's the exact patch I'm testing as well. > I'll let you finish the bootstrap and commit it. >> gcc/ChangeLog: >> >> 2015-11-11 Charles Baylis >> >> * config/arm/neon.md: (neon_vld2_lane): Remove unused max >> variable. >> (neon_vst2_lane): Likewise. >> (neon_vld3_lane): Likewise. >> (neon_vst3_lane): Likewise. >> (neon_vld4_lane): Likewise. >> (neon_vst4_lane): Likewise. Applied as r230203 after successful bootstrap on arm-unknown-linux-gnueabihf.
Re: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins
On 9 November 2015 at 09:03, Ramana Radhakrishnan wrote: > > Missing comment and please prefix this with NEON_ or SIMD_ . > >> >> +#define ENDIAN_LANE_N(mode, n) \ >> + (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n) >> + > > Otherwise OK - With those changes, the attached patch was applied as r230142 From 4a05b67a1757e88e1989ce7515cd10de0a6def1c Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Wed, 17 Jun 2015 17:08:30 +0100 Subject: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins gcc/ChangeLog: Charles Baylis PR target/63870 * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator qualifier_struct_load_store_lane_index. (builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX. (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON argument qualifiers. (arm_expand_neon_builtin): Handle new NEON argument qualifier. * config/arm/arm.h (NEON_ENDIAN_LANE_N): New macro. Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647 --- gcc/config/arm/arm-builtins.c | 48 +++ gcc/config/arm/arm.c | 1 + gcc/config/arm/arm.h | 6 ++ 3 files changed, 37 insertions(+), 18 deletions(-) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index bad3dc3..d0bd777 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -67,7 +67,9 @@ enum arm_type_qualifiers /* Polynomial types. */ qualifier_poly = 0x100, /* Lane indices - must be within range of previous argument = a vector. */ - qualifier_lane_index = 0x200 + qualifier_lane_index = 0x200, + /* Lane indices for single lane structure loads and stores. 
*/ + qualifier_struct_load_store_lane_index = 0x400 }; /* The qualifier_internal allows generation of a unary builtin from @@ -1963,6 +1965,7 @@ typedef enum { NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_LANE_INDEX, + NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX, NEON_ARG_MEMORY, NEON_ARG_STOP } builtin_arg; @@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode, /* Expand a Neon builtin. */ static rtx arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, - int icode, int have_retval, tree exp, ...) + int icode, int have_retval, tree exp, + builtin_arg *args) { - va_list ap; rtx pat; tree arg[SIMD_MAX_BUILTIN_ARGS]; rtx op[SIMD_MAX_BUILTIN_ARGS]; @@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, || !(*insn_data[icode].operand[0].predicate) (target, tmode))) target = gen_reg_rtx (tmode); - va_start (ap, exp); - formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode])); for (;;) { - builtin_arg thisarg = (builtin_arg) va_arg (ap, int); + builtin_arg thisarg = args[argc]; if (thisarg == NEON_ARG_STOP) break; @@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, op[argc] = copy_to_mode_reg (mode[argc], op[argc]); break; + case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX: + gcc_assert (argc > 1); + if (CONST_INT_P (op[argc])) + { + neon_lane_bounds (op[argc], 0, +GET_MODE_NUNITS (map_mode), exp); + /* Keep to GCC-vector-extension lane indices in the RTL. */ + op[argc] = + GEN_INT (NEON_ENDIAN_LANE_N (map_mode, INTVAL (op[argc]))); + } + goto constant_arg; + case NEON_ARG_LANE_INDEX: /* Previous argument must be a vector, which this indexes. */ gcc_assert (argc > 0); @@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, } /* Fall through - if the lane index isn't a constant then the next case will error. 
*/ + case NEON_ARG_CONSTANT: +constant_arg: if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) - error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, " - "expected %", argc + 1); + { + error ("%Kargument %d must be a constant immediate", + exp, argc + 1); + return const0_rtx; + } break; + case NEON_ARG_MEMORY: /* Check if expand failed. */ if (op[argc] == const0_rtx) - { - va_end (ap); return 0; - } gcc_assert (MEM_P (op[argc])); PUT_MODE (op[argc], mode[argc]); /* ??? arm_neon.h uses the same built-in functions for signed @@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, } } - va_end (ap); - if (have_retval) switch (argc) { @@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target) if (d->qualifiers[qualifiers_k] & qualifier_lane_index) args[k] = NEON_ARG_LANE_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_stru
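The ENDIAN_LANE_N macro reviewed in this thread is small enough to check in isolation. Below is a minimal sketch, with GCC's GET_MODE_NUNITS replaced by an explicit lane-count parameter, so nothing here is the real GCC interface: on big-endian targets, architectural lane n of an NUNITS-lane vector maps to GCC-vector-extension lane NUNITS - 1 - n, and on little-endian the two numberings agree.

```cpp
#include <cassert>

// Sketch of the lane remapping implemented by ENDIAN_LANE_N /
// NEON_ENDIAN_LANE_N; 'nunits' stands in for GET_MODE_NUNITS (mode).
static int
endian_lane_n (int nunits, int n, bool bytes_big_endian)
{
  return bytes_big_endian ? nunits - 1 - n : n;
}
```

For a 4-lane vector, lane 0 becomes lane 3 under big-endian, and the mapping is its own inverse, which is why applying it once when expanding the builtin is enough.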
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 9 November 2015 at 13:35, Ramana Radhakrishnan wrote: > > > On 08/11/15 00:26, charles.bay...@linaro.org wrote: >> From: Charles Baylis >> >> Charles Baylis >> >> * config/arm/neon.md (neon_vld1_lane): Remove error for invalid >> lane number. >> (neon_vst1_lane): Likewise. >> (neon_vld2_lane): Likewise. >> (neon_vst2_lane): Likewise. >> (neon_vld3_lane): Likewise. >> (neon_vst3_lane): Likewise. >> (neon_vld4_lane): Likewise. >> (neon_vst4_lane): Likewise. >> > > The only way we can get here is through the intrinsics - we do a check for > lane numbers earlier. > > If things go horribly wrong - the assembler will complain, so it's ok to > elide this internal_error here, thus OK. Applied as r230144
[PATCH v2] [ARM] PR61551 RFC: Improve costs for NEON addressing modes
Hi

Following on from previous discussion:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03464.html
and IRC, I'm going to try once more to make the case for fixing the
worst problem for GCC 6, pending a rewrite of the address_cost
infrastructure for GCC 7. I think the rewrite you're describing is
overkill for this problem.

There is one specific problem which I would like to fix for GCC 6, and
that is the failure of the ARM backend to allow use of post-indexed
addressing for some vector modes.

Test program:

#include <arm_neon.h>

char *f(char *p, int8x8x4_t v, int r)
{
    vst4_s8(p, v);
    p += 32;
    return p;
}

Desired code:

f:
        vst4.8  {d0-d3}, [r0]!
        bx      lr

Currently generated code:

f:
        mov     r3, r0
        adds    r0, r0, #32
        vst4.8  {d0-d3}, [r3]
        bx      lr

The auto-inc-dec phase does not apply in this case, because the costs
for RTXs which use POST_INC are wrong. Using gdb to poke at this, we
can see:

$ arm-unknown-linux-gnueabihf-gcc -mfpu=neon -O3 -S /tmp/foo.c -wrapper gdb,--args
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
Reading symbols from /home/charles.baylis/tools/tools-arm-unknown-linux-gnueabihf-git/bin/../libexec/gcc/arm-unknown-linux-gnueabihf/6.0.0/cc1...done.
(gdb) b auto-inc-dec.c:473
Breakpoint 1 at 0x102c253: file /home/charles.baylis/srcarea/gcc/gcc-git/gcc/auto-inc-dec.c, line 473.
(gdb) r
(gdb) print debug_rtx(mem)
(mem:OI (reg/v/f:SI 112 [ p ]) [0 MEM[(signed char[32] *)p_2(D)]+0 S32 A8])
$1 = void
(gdb) print rtx_cost(mem, V16QImode, SET, 1, false)
$2 = 4
(gdb) print debug_rtx(mem_tmp)
(mem:OI (post_inc:SI (reg/f:SI 115 [ p ])) [0 S32 A64])
$3 = void
(gdb) print rtx_cost(mem_tmp, V16QImode, SET, 1, false)
$4 = 32

So, the cost of (mem:OI (reg/v/f:SI 112 [ p ])) is 4, while the cost
of (mem:OI (post_inc:SI (reg/f:SI 115 [ p ]))) is 32. That is a
difference equivalent to 7 insns, which has no basis in reality. It is
just a bug.

Addressing some specific review points from the previous version.
> > +{
> > +  0,
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15)
> > +} /* vec512 */
> >    }
> > };
>
> I'm curious as to the numbers here - The costs should reflect the
> relative costs of the addressing modes not the costs of the loads and
> stores - thus having high numbers here for vector modes may just
> prevent this from even triggering in auto-inc-dec code ? In my
> experience with GCC I've never satisfactorily answered the question
> whether these should be comparable to rtx_costs or not. In an ideal
> world they should be but I'm never sure. IOW I'm not sure if using
> COSTS_N_INSNS or plain numbers here is appropriate.

That's the point of the patch. These numbers give the same behaviour
as the current arm_rtx_costs code, and they are obviously wrong.

> 17:45 < ramana> My problem is that the mid-end in a number of other
> places compares the cost coming out of rtx_cost and address_cost and
> if the 2 are not in sync we get funny values.

There is already no correspondence at all between the two at present.
My patch doesn't address this, but I think it must at least make it
better. However, I don't really understand this comment - as you point
out above, address_cost and rtx_cost return values measured in
different units. I don't see how they can be made to correspond, given
that.

> Right, but this does not change arm_address_costs - so how is this
> going to work? I would like this moved into a new function
> aarch_address_costs and that replacing arm_address_costs only to be
> called from here.

I could do that, but if I did, I would have to resubmit the patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00387.html along with a
reimplementation of arm_address_costs which used a table without
changing its numerical results (pending subsequent tuning).

Since the former would already solve my problem, and the latter would
then be a pure code clean up of a separate function, why not accept
the '387 patch as is, and leave the clean up until GCC 7?

Alternatively, this is an updated patch series which changes the costs
for MEMs in arm_rtx_costs using the table. Passes make check with no
regressions for arm-unknown-linux-gnueabihf on qemu.

From d8110f141a449c62f1ba2c4f47832ee2633d3998 Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Wed, 28 Oct 2015 18:48:16 +
Subject: [PATCH 1/4] Add table-driven implemention of "case MEM:" in
 arm_rtx_costs_new.

This patch replicates the existing cost calculation using a table, so
that the costs can be tuned cleanly. The old implementation is
retained for comparison, and check is made that the same result is
obtained from both methods.

Change-Id: If349ffd7dbbe13a814be4a
Re: [PATCH] [ARM, Callgraph] Fix PR67280: function incorrectly marked as nothrow
Backported r227407 to gcc-5-branch following approval on IRC. The patch applied without conflicts. 2015-11-16 Charles Baylis Backport from mainline r227407 PR ipa/67280 * cgraphunit.c (cgraph_node::create_wrapper): Set can_throw_external in new callgraph edge. On 20 September 2015 at 23:53, Charles Baylis wrote: > On 7 September 2015 at 09:35, Charles Baylis > wrote: >>>> >gcc/ChangeLog: >>>> > >>>> >2015-08-28 Charles Baylis >>>> > >>>> > * cgraphunit.c (cgraph_node::create_wrapper): Set >>>> > can_throw_external >>>> > in new callgraph edge. >> >> Committed to trunk as r227407. >> >> Are you happy for me to backport to gcc-5-branch? > > Hi Jan, > > I'd still like to backport this patch to gcc 5. Is that OK > > Thanks > Charles
Re: Incorrect code due to indirect tail call of varargs function with hard float ABI
On 16 November 2015 at 22:24, Kugan wrote: > Please note that we have a sibcall from "broken" to "indirect". > > "direct" is variadic function so it is conforming to AAPCS base standard. > > "broken" is a non-variadic function and will return the value in > floating point register for TARGET_HARD_FLOAT. Thus we should not be > doing sibcall here. > > Attached patch fixes this. Bootstrap and regression testing is ongoing. > Is this OK if no issues with the testing? Hi Kugan, It looks like this patch should work, but I think this is an overly conservative fix, as it prevents all sibcalls for hardfloat targets. It would be better if only variadic sibcalls were prevented on hardfloat. You can check for variadic calls by checking the function_type in the call expression (exp) using stdarg_p(). As an example to show how to test for variadic function calls, this is how to test it in gdb: (gdb) b arm_function_ok_for_sibcall Breakpoint 1 at 0xdae59c: file /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c, line 6634. (gdb) r ... Breakpoint 1, arm_function_ok_for_sibcall (decl=0x0, exp=0x76104ce8) at /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c:6634 6634 if (cfun->machine->sibcall_blocked) (gdb) print debug_tree(exp) unit size align 64 symtab 0 alias set -1 canonical type 0x762835e8 precision 64 pointer_to_this > side-effects addressable fn ... (gdb) print stdarg_p((tree)0x760e9348)<--- from function_type ^ $2 = true
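The shape of the failing case can be reconstructed in portable C++; the names "indirect" and "broken" follow the report above, but the bodies here are hypothetical since the original test case is not quoted in full. "indirect" is variadic, so on ARM hardfloat it returns its result per the Base AAPCS, while non-variadic "broken" returns per the VFP variant; a sibcall from broken to indirect would therefore leave the return value in the wrong register, which is exactly the case stdarg_p() on the call's function_type detects.

```cpp
#include <cassert>
#include <cstdarg>

// Variadic callee: on ARM hardfloat this conforms to the Base AAPCS,
// returning the double in core registers.
static double
indirect (int n, ...)
{
  va_list ap;
  va_start (ap, n);
  double d = va_arg (ap, double);
  va_end (ap);
  return d;
}

typedef double (*fn_t) (int, ...);
static fn_t fp = indirect;

// Non-variadic caller: its own callers expect the VFP-variant return.
// The tail call below must NOT be emitted as a sibcall on hardfloat.
double
broken (double d)
{
  return fp (1, d); /* candidate (invalid) sibcall */
}
```

Checking stdarg_p (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)))) in arm_function_ok_for_sibcall, as sketched in the gdb session above, rejects only this variadic case rather than all hardfloat sibcalls.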
Re: [PATCH 3/4] [ARM] PR63870 Add test cases
Applied to trunk as r231077. On 26 November 2015 at 09:43, James Greenhalgh wrote: > On Thu, Nov 26, 2015 at 09:41:15AM +0000, Charles Baylis wrote: >> Hi James, >> >> Ping. This needs an ack from an AArch64 reviewer/maintainer > > Fine by me, it will considerably clean up my test results for ARM! > > Thanks, > James > >
Re: [PATCH][Aarch64] Add vectorized mersenne twister
On 6 June 2017 at 11:07, James Greenhalgh wrote: > If we don't mind that, and instead take a GCC-centric view, we could avoid > namespace polution by using the GCC-internal names for the intrinsics > (__builtin_aarch64_...). As those names don't form a guaranteed interface, > that would tie us to a GCC version. > > So we have a few solutions to choose from, each of which invokes a trade-off: > > 1 Use the current names and pollute the namespace. > 2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC > internals. > 3 Define a new set of namespace-clean names and complicate the Neon > intrinsic interface while we migrate old users to new names. > > I can see why the libstdc++ community would prefer 3) over the other options, > but I'm reticent to take that route as the cost to our intrinsic maintainers > and users looks high. I've added some of the other ARM port maintainers > for their opinion. > > Are there any other options I'm missing? If solving for C++ only is OK, then it might be feasible to do something like: namespace __arm_neon_for_ext_random { #include// like arm_neon.h, but without include guards [*] } Then the libstdc++ headers can use "using namespace __arm_neon_for_ext_random" inside the functions which use NEON intrinsics. [*] without include guards so that other header files can use the same trick in their own namespace. I'm not sure if this will work for all host compilers, with GCC I think it's OK because the intrinsics are implemented as inline functions, rather than #defines, but not all compilers have taken that approach.
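The namespace-wrapping idea above can be sketched without the real arm_neon.h. In this hedged example, a stand-in namespace body plays the role of the un-guarded intrinsics header (vadd_stub is purely illustrative, not a real intrinsic); the point is that the using-directive is confined to function scope, so the global namespace stays clean.

```cpp
#include <cassert>

namespace __arm_neon_for_ext_random
{
  // Stand-in for the pasted header body: GCC implements the NEON
  // intrinsics as inline functions, which is what makes this
  // namespace trick viable (it would break for #define-based ones).
  inline int vadd_stub (int a, int b) { return a + b; }
}

int
use_intrinsics (int a, int b)
{
  // Names from the wrapped "header" are visible only inside this
  // function, mirroring the proposed libstdc++ usage.
  using namespace __arm_neon_for_ext_random;
  return vadd_stub (a, b);
}
```

Other headers could repeat the same include-into-namespace pattern independently, which is why the hypothetical un-guarded header must omit include guards.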
Re: [AArch64, testsuite] gfortran.dg/ieee/ieee_8.f90: xfail for aarch64+ilp32
On 30 October 2017 at 08:52, Janne Blomqvist wrote: > On Tue, Oct 24, 2017 at 9:27 PM, Charles Baylis > wrote: >> The test is already marked xfail for aarch64*-*-gnu, but this needs to >> be changed to aarch64*-*-gnu* in order to match >> aarch64-linux-gnu_ilp32. >> >> Test was previously xfail'd in [1]. >> >> Shows the expected FAIL->XFAILs on aarch64-linux-gnu_ilp32. >> >> gcc/testsuite: >> Charles Baylis >> >> * gfortran.dg/ieee/ieee_8.f90: xfail for aarch64*-*-gnu* >> >> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02188.html > > Ok, thanks for the patch. Committed as r254689.
Re: [AArch64, testsuite] gcc.target/aarch64/extend.c: xfails for ilp32
On 24 October 2017 at 19:40, Andrew Pinski wrote:
> On Tue, Oct 24, 2017 at 11:27 AM, Charles Baylis wrote:
>> In ILP32, GCC fails to merge pointer arithmetic into the addressing
>> mode of a load instruction, as
>>     add     w0, w0, w1, lsl 2
>>     ldr     w0, [x0]
>> is not equivalent to:
>>     ldr     w0, [x0, w1, lsl 2]
>>
>> Shows the expected FAIL->XFAILs on aarch64-linux-gnu_ilp32, no
>> regressions on aarch64-linux-gnu.
>
> Then this is not a xfail but rather the dg-final should be skipped for
> ilp32. xfail means the failure can be fixed in the future but in this
> case, the failure is not fixable.
>
> Something like:
>     { target { !ilp32 } }
>
> or (what I used in my changes which I was going to submit but had
> higher priorities than submitting testcase fixes):
>     { target { lp64 } }

Patch updated.

gcc/testsuite/ChangeLog:

        Charles Baylis

        * gcc.target/aarch64/extend.c (ldr_uxtw): Don't scan assembler
        for ilp32.
        (ldr_uxtw0): Likewise.
        (ldr_sxtw): Likewise.
        (ldr_sxtw0): Likewise.

From a6cf8b5e0928fa2c775f4ef1c3d607cc4700305f Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Tue, 24 Oct 2017 14:22:11 +0100
Subject: [PATCH 2/4] [AArch64] gcc.target/aarch64/extend.c: Skip ldr tests
 for ilp32

gcc/testsuite/ChangeLog:

        Charles Baylis

        * gcc.target/aarch64/extend.c (ldr_uxtw): Don't scan assembler
        for ilp32.
        (ldr_uxtw0): Likewise.
        (ldr_sxtw): Likewise.
        (ldr_sxtw0): Likewise.
---
 gcc/testsuite/gcc.target/aarch64/extend.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..be89cba 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -4,28 +4,28 @@
 int
 ldr_uxtw (int *arr, unsigned int i)
 {
-  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw #?2]" } } */
+  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw #?2]" { target { ! ilp32 } } } } */
   return arr[i];
 }

 int
 ldr_uxtw0 (char *arr, unsigned int i)
 {
-  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw]" } } */
+  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw]" { target { ! ilp32 } } } } */
   return arr[i];
 }

 int
 ldr_sxtw (int *arr, int i)
 {
-  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw #?2]" } } */
+  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw #?2]" { target { ! ilp32 } } } } */
   return arr[i];
 }

 int
 ldr_sxtw0 (char *arr, int i)
 {
-  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw]" } } */
+  /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw]" { target { ! ilp32 } } } } */
   return arr[i];
 }
-- 
2.7.4
Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
On 15 September 2017 at 18:01, Kyrill Tkachov wrote: > > On 15/09/17 16:38, Charles Baylis wrote: >> >> On 13 September 2017 at 10:02, Kyrill Tkachov >> wrote: >>> >>> Hi Charles, >>> >>> On 12/09/17 09:34, charles.bay...@linaro.org wrote: >>>> >>>> From: Charles Baylis >>>> >>>> This patch moves the calculation of costs for MEM into a >>>> separate function, and reforms the calculation into two >>>> parts. Firstly any additional cost of the addressing mode >>>> is calculated, and then the cost of the memory access itself >>>> is added. >>>> >>>> In this patch, the calculation of the cost of the addressing >>>> mode is left as a placeholder, to be added in a subsequent >>>> patch. >>>> >>> Can you please mention how has this series been tested? >>> A bootstrap and test run on arm-none-linux-gnueabihf is required at >>> least. >> >> It has been tested with make check on arm-unknown-linux-gnueabihf with >> no regressions. I've successfully bootstrapped the next spin. > > > Thanks. > >>> Also, do you have any benchmarking results for this? >>> I agree that generating the addressing modes in the new tests is >>> desirable. >>> So I'm not objecting to the goal of this patch, but a check to make sure >>> that this doesn't regress SPEC >>> would be great. Further comments on the patch inline. >> >> SPEC2006 scores are unaffected by this patch on Cortex-A57. > > > Good, thanks for checking :) > > >>>> +/* Helper function for arm_rtx_costs_internal. Calculates the cost of >>>> a >>>> MEM, >>>> + considering the costs of the addressing mode and memory access >>>> + separately. */ >>>> +static bool >>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost, >>>> + int *cost, bool speed_p) >>>> +{ >>>> + machine_mode mode = GET_MODE (x); >>>> + if (flag_pic >>>> + && GET_CODE (XEXP (x, 0)) == PLUS >>>> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1))) >>>> +/* This will be split into two instructions. Add the cost of the >>>> + additional instruction here. 
The cost of the memory access is >>>> computed >>>> + below. See arm.md:calculate_pic_address. */ >>>> +*cost = COSTS_N_INSNS (1); >>>> + else >>>> +*cost = 0; >>> >>> >>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a >>> each >>> insn) >>> plus the appropriate field in extra_cost. So you should unconditionally >>> initialise the cost >>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) >>> with >>> the condition above. >> >> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p) >> part because the cost of a single bus transfer is included in that >> initial cost. >> >>>> + >>>> + /* Calculate cost of the addressing mode. */ >>>> + if (speed_p) >>>> +{ >>>> + /* TODO: Add table-driven costs for addressing modes. (See patch >>>> 2) */ >>>> +} >>> >>> >>> You mean "patch 3". I recommend you just remove this conditional from >>> this >>> patch and add the logic >>> in patch 3 entirely. >> >> OK. >> >>>> + >>>> + /* Calculate cost of memory access. */ >>>> + if (speed_p) >>>> +{ >>>> + /* data transfer is transfer size divided by bus width. */ >>>> + int bus_width_bytes = current_tune->bus_width / 4; >>> >>> >>> This should be bus_width / BITS_PER_UNIT to get the size in bytes. >>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure >>> the >>> cost calculation and generated code is still appropriate. >> >> Oops, I changed the units around and messed this up. I'll fix this. >> >>>> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes); >>>> + *cost += extra_cost->ldst.load; >>>> +} >>>> + else >>>> +{ >>>> + *cost += COSTS_N_INSNS (1); >>>> +} >>> >>> Given my first comment above this else woul
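The corrected arithmetic agreed in this exchange — access size in bytes divided by bus width converted via BITS_PER_UNIT rather than a hard-coded 4, rounded up — can be sketched standalone. The macro names mirror GCC's CEIL and BITS_PER_UNIT, but this is a hedged illustration, not the real arm_mem_costs implementation.

```cpp
#include <cassert>

#define BITS_PER_UNIT 8
#define CEIL(x, y) (((x) + (y) - 1) / (y))

// Number of bus transfers for one memory access: transfer size
// divided by bus width, rounded up.  'mode_size_bytes' stands in for
// GET_MODE_SIZE (mode), 'bus_width_bits' for current_tune->bus_width.
static int
transfer_count (int mode_size_bytes, int bus_width_bits)
{
  int bus_width_bytes = bus_width_bits / BITS_PER_UNIT;
  return CEIL (mode_size_bytes, bus_width_bytes);
}
```

With the original `bus_width / 4` a 64-bit bus would have been treated as 16 bytes wide, halving the computed transfer count for wide vector accesses; dividing by BITS_PER_UNIT gives the intended 8-byte width.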
Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
On 15 September 2017 at 18:01, Kyrill Tkachov wrote: > From what I can tell Ramana and Richard preferred to encode this attribute > as > a tuning struct property rather than an inline conditional based on > arm_arch7. > I agree that if we want to use that information, it should be encoded this > way. > What I'm not convinced about is whether we do want this parameter in the > first place. > > The cost tables already encode information about the costs of different > sized loads/stores. > In patch 2, for example, you add the cost for extra_cost->ldst.load which is > nominally just > the cost of a normal 32-bit ldr. But we also have costs for ldst.ldrd which > is the 64-bit two-register load > which should reflect any extra cost due to a narrower bus in it. We also > have costs for ldst.loadf (for 32-bit > VFP loads) and ldst.loadd (for 64-bit VFP D-register loads). So I think we > should use those cost fields > depending on the mode class and size instead of using ldst.load > unconditionally and adding a new bus_size parameter. > > So I think the way forward is to drop this patch and modify patch 2/3 to use > the extra_cost->ldst fields as described above. > > Sorry for the back-and-forth. I think this is the best approach because it > uses the existing fields more naturally and > doesn't add new parameters that partly duplicate the information encoded in > the existing fields. > Ramana, Richard: if you prefer the bus_width approach I won't block it, but > could you clarify your preference? > If we do end up adding the bus_width parameter then this patch and patch 2/3 > look ok. > Thanks, > Kyrill > > P.S. I'm going on a 4-week holiday from today, so I won't be able to do any > further review in that timeframe. > As I said, if we go with the bus_size approach then these patches are ok. If > we go with my suggestion, this would > be dropped and patch 2 would be extended to select the appropriate > extra_cost->ldst field depending on mode. 
OK, I agree with dropping this patch. I have posted an updated patch 2 which does not require it.
Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
On 20 November 2017 at 21:09, Charles Baylis wrote: > I have modified this patch accordingly. Patch 1 is no longer needed. > > Passes "make check" (with patch 3) on arm-linux-gnueabihf with no > regressions. Bootstrap is in progress. Bootstrap built successfully using qemu host. > Can I still get this in during stage 3? > > gcc/ChangeLog: > > Charles Baylis > > * config/arm/arm.c (arm_mem_costs): New function. > (arm_rtx_costs_internal): Use arm_mem_costs. > > gcc/testsuite/ChangeLog: > > Charles Baylis > > * gcc.target/arm/addr-modes-float.c: New test. > * gcc.target/arm/addr-modes-int.c: New test. > * gcc.target/arm/addr-modes.h: New header.
Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
On 23 November 2017 at 10:01, Kyrill Tkachov wrote: > > Thanks, these are ok for trunk. > They were originally posted way before stage 3 and this is just a rework, > so it's acceptable at this stage as far as I'm concerned. Thanks. Committed to trunk as r255111.
Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
On 15 September 2017 at 17:57, Kyrill Tkachov wrote: > > Thanks, this is ok once the prerequisites are sorted. Patch 1 was abandoned, and a later version of patch 2 has been committed, so this was applied to trunk as r255112.
[PATCH] ARM testsuite: force hardfp for addr-modes-float.c
Some of the new tests in addr-modes-float.c, which were introduced for the rework of addressing modes costs [1] fail when GCC is configured to default to a softfp calling convention. Fix this by annotating the test functions with __attribute__((pcs("aapcs-vfp"))). Thanks to Christophe for pointing this out. [1] https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02149.html Charles Baylis * gcc.target/arm/addr-modes-float.c (ATTR): New define. (POST_STORE): Pass ATTR as 2nd argument. (POST_LOAD): Likewise. (POST_STORE_VEC): Likewise. * gcc.target/arm/addr-modes-int.c (ATTR): New define. (PRE_STORE): Pass ATTR as 2nd argument. (POST_STORE): Likewise. (PRE_LOAD): Likewise. (POST_LOAD): Likewise. * gcc.target/arm/addr-modes.h (PRE_STORE): New parameter. (POST_STORE): Likewise. (POST_STORE_VEC): Likewise. (PRE_LOAD): Likewise. (POST_LOAD): Likewise. (POST_LOAD_VEC): Likewise. From c8743026e53429131e6677aaca7b0840ecc11e25 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Fri, 24 Nov 2017 16:24:18 + Subject: [PATCH] [ARM] testsuite: force hardfp in addr-modes-float.c gcc/testsuite/ChangeLog: Charles Baylis * gcc.target/arm/addr-modes-float.c (ATTR): New define. (POST_STORE): Pass ATTR as 2nd argument. (POST_LOAD): Likewise. (POST_STORE_VEC): Likewise. * gcc.target/arm/addr-modes-int.c (ATTR): New define. (PRE_STORE): Pass ATTR as 2nd argument. (POST_STORE): Likewise. (PRE_LOAD): Likewise. (POST_LOAD): Likewise. * gcc.target/arm/addr-modes.h (PRE_STORE): New parameter. (POST_STORE): Likewise. (POST_STORE_VEC): Likewise. (PRE_LOAD): Likewise. (POST_LOAD): Likewise. (POST_LOAD_VEC): Likewise. 
Change-Id: I7f85e811194098da8f1b7d243653d7873f132fff --- gcc/testsuite/gcc.target/arm/addr-modes-float.c | 26 +- gcc/testsuite/gcc.target/arm/addr-modes-int.c | 35 ++--- gcc/testsuite/gcc.target/arm/addr-modes.h | 30 ++--- 3 files changed, 48 insertions(+), 43 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c index 3b4235c..300a2bea 100644 --- a/gcc/testsuite/gcc.target/arm/addr-modes-float.c +++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c @@ -7,35 +7,37 @@ #include "addr-modes.h" -POST_STORE(float) +#define ATTR __attribute__((__pcs__("aapcs-vfp"))) + +POST_STORE(float, ATTR) /* { dg-final { scan-assembler "vstmia.32" } } */ -POST_STORE(double) +POST_STORE(double, ATTR) /* { dg-final { scan-assembler "vstmia.64" } } */ -POST_LOAD(float) +POST_LOAD(float, ATTR) /* { dg-final { scan-assembler "vldmia.32" } } */ -POST_LOAD(double) +POST_LOAD(double, ATTR) /* { dg-final { scan-assembler "vldmia.64" } } */ -POST_STORE_VEC (int8_t, int8x8_t, vst1_s8) +POST_STORE_VEC (int8_t, int8x8_t, vst1_s8, ATTR) /* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */ -POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8) +POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8, ATTR) /* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */ -POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8) +POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8, ATTR) /* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */ -POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8) +POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8, ATTR) /* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */ -POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8) +POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8, ATTR) /* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" 
} } */ -POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8) +POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8, ATTR) /* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */ /* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */ -POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8) +POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8, ATTR) /* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */ -POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8) +POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8, ATTR) /* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */ /* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c index e3e1e6a..90b7425 100644 --- a/gcc/testsuite/gcc.target/arm/addr-modes-int.c +++ b/gcc/testsuite/gcc.target/arm/addr-mod
Re: [PATCH] ARM testsuite: force hardfp for addr-modes-float.c
On 27 November 2017 at 17:47, Kyrill Tkachov wrote:
> Hi Charles,
>
> On 27/11/17 17:03, Charles Baylis wrote:
>>
>> Some of the new tests in addr-modes-float.c, which were introduced for
>> the rework of addressing modes costs [1] fail when GCC is configured
>> to default to a softfp calling convention. Fix this by annotating the
>> test functions with __attribute__((pcs("aapcs-vfp"))).
>
> The usual approach to this problem is to add an -mfloat-abi=hard to the
> dg-options of the test (the tests are not dg-run, so there's no
> link-time mismatch concerns).
> Any particular reason to use the pcs attribute instead?

With the way I have GCC configured, it doesn't work to do this when
including certain system headers, such as arm_neon.h.

In file included from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/features.h:447,
                 from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/bits/libc-header-start.h:33,
                 from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/stdint.h:26,
                 from /home/cbaylis/buildarea/gcc/build2/gcc/include/stdint.h:9,
                 from /home/cbaylis/buildarea/gcc/build2/gcc/include/arm_fp16.h:34,
                 from /home/cbaylis/buildarea/gcc/build2/gcc/include/arm_neon.h:41,
                 from /home/cbaylis/srcarea/gcc/gcc-git/gcc/testsuite/gcc.target/arm/addr-modes-float.c:6:
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/gnu/stubs.h:10:11: fatal error: gnu/stubs-hard.h: No such file or directory
compilation terminated.
Re: [PATCH] ARM testsuite: force hardfp for addr-modes-float.c
On 30 November 2017 at 15:56, Kyrill Tkachov wrote:
>
> So is it the case that you don't run any arm tests that include arm_neon.h
> in your configuration?

No, it is only the case that any arm test which includes arm_neon.h (in
fact, any system header) *and* uses dg-add-options -mfloat-abi=hard fails
on my configuration (and -mfloat-abi=softfp fails in my configurations
which default to hardfp). [1]

The only test which currently has -mfloat-abi=hard and #include <arm_neon.h>
is gcc.target/arm/pr51534.c, and it FAILs in my arm-unknown-linux-gnueabi
configuration.

> If so, then I would be fine with leaving this test unsupported on this
> configuration.

I don't see why, when the test can simply be fixed with attribute((pcs)),
but if you prefer I can respin the patch accordingly.

> By the way, I notice that in addr-modes-float.c the arm_neon_ok check is
> placed before the dg-add-options.
> I don't remember the arcane rules exactly, but I think the effective target
> check should go before it, so that the test gets skipped properly.

OK, I can respin the patch with that change.
[1] full details as follows:

$ arm-unknown-linux-gnueabi-gcc -v
COLLECT_GCC=/home/cbaylis/tools//tools-arm-unknown-linux-gnueabi-git/bin/arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/cbaylis/tools/tools-arm-unknown-linux-gnueabi-git/bin/../libexec/gcc/arm-unknown-linux-gnueabi/8.0.0/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/cbaylis/srcarea/gcc/gcc-git/configure --prefix=/home/cbaylis/tools//tools-arm-unknown-linux-gnueabi-git --target=arm-unknown-linux-gnueabi --enable-languages=c,c++ --with-sysroot=/home/cbaylis/tools//sysroot-arm-unknown-linux-gnueabi-git --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=softfp --with-mode=thumb
Thread model: posix
gcc version 8.0.0 20171124 (experimental) (GCC)

$ cat tn.c
#include <stdio.h>

$ arm-unknown-linux-gnueabi-gcc -mfloat-abi=hard tn.c
In file included from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/features.h:447,
                 from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/bits/libc-header-start.h:33,
                 from /home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/stdio.h:27,
                 from tn.c:2:
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/gnu/stubs.h:10:11: fatal error: gnu/stubs-hard.h: No such file or directory
 # include <gnu/stubs-hard.h>
           ^~
compilation terminated.
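To make the attribute-based fix being discussed concrete, here is an illustrative sketch (not the actual testsuite patch): the per-function pcs attribute requests the VFP calling convention for just that function, so the file compiles without -mfloat-abi=hard and therefore without needing the hardfp multilib headers. The `__arm__` guard is an assumption added here so the sketch also builds on non-ARM hosts, where the attribute does not exist.

```c
/* Sketch of the ATTR idea from the patch.  The pcs attribute only
   exists for ARM targets, so guard it (the guard is illustrative,
   not part of the original test).  */
#if defined (__arm__)
#define ATTR __attribute__ ((__pcs__ ("aapcs-vfp")))
#else
#define ATTR
#endif

/* Stands in for a POST_STORE (float, ATTR) expansion: the attribute is
   applied per function, forcing the VFP (hardfp) convention for this
   function even when the file is compiled with -mfloat-abi=softfp.  */
ATTR float
post_store_float (float *p, float x)
{
  *p = x;			/* the store the test scans for */
  return x;
}
```

Only this function's calling convention changes; the rest of the translation unit, including any system headers, is still compiled for the default (softfp) ABI, which is why no gnu/stubs-hard.h is required.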
[PATCH 0/2] [AARCH64,NEON] Improve vld[234](q?)_lane intrinsics v2
From: Charles Baylis

This patch series converts the vld[234](q?)_lane intrinsics to use builtin
functions instead of the previous inline assembler syntax.

Changes since v1:
. the type-punning to change between the array of vector types and the
  internal builtin types has been removed, as this is a separate, more
  complex problem. (patches 3&4 dropped, patch 2 reworked)
. iterator style cleanups (patch 1)
. removed broken bigendian lane number conversion. (patch 1)

Tested with make check on aarch64-oe-linux with qemu, and also passes
clyon's NEON intrinsics tests.

Charles Baylis (2):
  [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_*
    intrinsics
  [AARCH64,NEON] Convert arm_neon.h to use new builtins for
    vld[234](q?)_lane_*

 gcc/config/aarch64/aarch64-builtins.c        |   5 +
 gcc/config/aarch64/aarch64-simd-builtins.def |   4 +
 gcc/config/aarch64/aarch64-simd.md           |  95 +++
 gcc/config/aarch64/aarch64.md                |   3 +
 gcc/config/aarch64/arm_neon.h                | 377 ++-
 5 files changed, 362 insertions(+), 122 deletions(-)

-- 
1.9.1
[PATCH 1/2] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics
From: Charles Baylis

This patch adds new patterns and builtins to represent single-lane
structure load instructions, which will be used to implement the
vld[234](q?)_lane_* intrinsics.

Tested (with the rest of the patch series) with make check on
aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON
intrinsics tests.

Charles Baylis

	* config/aarch64/aarch64-builtins.c
	(aarch64_types_loadstruct_lane_qualifiers): Define.
	* config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane,
	ld4_lane): New builtins.
	* config/aarch64/aarch64-simd.md (vec_load_lanesoi_lane): New
	pattern.
	(vec_load_lanesci_lane): Likewise.
	(vec_load_lanesxi_lane): Likewise.
	(aarch64_ld2_lane): New expand.
	(aarch64_ld3_lane): Likewise.
	(aarch64_ld4_lane): Likewise.
	* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
	UNSPEC_LD2_LANE, UNSPEC_LD3_LANE, UNSPEC_LD4_LANE.
---
 gcc/config/aarch64/aarch64-builtins.c        |  5 ++
 gcc/config/aarch64/aarch64-simd-builtins.def |  4 ++
 gcc/config/aarch64/aarch64-simd.md           | 95 
 gcc/config/aarch64/aarch64.md                |  3 +
 4 files changed, 107 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 3dba1b2..368d3a7 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -201,6 +201,11 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode };
 #define TYPES_LOAD1 (aarch64_types_load1_qualifiers)
 #define TYPES_LOADSTRUCT (aarch64_types_load1_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_const_pointer_map_mode,
+      qualifier_none, qualifier_none };
+#define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers)
 
 static enum aarch64_type_qualifiers
 aarch64_types_bsl_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 2367436..348f0d2 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -83,6 +83,10 @@ BUILTIN_VQ (LOADSTRUCT, ld2, 0) BUILTIN_VQ (LOADSTRUCT, ld3, 0) BUILTIN_VQ (LOADSTRUCT, ld4, 0) + /* Implemented by aarch64_ld_lane. */ + BUILTIN_VQ (LOADSTRUCT_LANE, ld2_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld3_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld4_lane, 0) /* Implemented by aarch64_st. */ BUILTIN_VDC (STORESTRUCT, st2, 0) BUILTIN_VDC (STORESTRUCT, st3, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cab26a3..ff71291 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3991,6 +3991,18 @@ [(set_attr "type" "neon_load2_2reg")] ) +(define_insn "vec_load_lanesoi_lane" + [(set (match_operand:OI 0 "register_operand" "=w") + (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] + UNSPEC_LD2_LANE))] + "TARGET_SIMD" + "ld2\\t{%S0. - %T0.}[%3], %1" + [(set_attr "type" "neon_load2_one_lane")] +) + (define_insn "vec_store_lanesoi" [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:OI [(match_operand:OI 1 "register_operand" "w") @@ -4022,6 +4034,18 @@ [(set_attr "type" "neon_load3_3reg")] ) +(define_insn "vec_load_lanesci_lane" + [(set (match_operand:CI 0 "register_operand" "=w") + (unspec:CI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:CI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_LD3_LANE))] + "TARGET_SIMD" + "ld3\\t{%S0. 
- %U0.}[%3], %1" + [(set_attr "type" "neon_load3_one_lane")] +) + (define_insn "vec_store_lanesci" [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:CI [(match_operand:CI 1 "register_operand" "w") @@ -4053,6 +4077,18 @@ [(set_attr "type" "neon_load4_4reg")] ) +(define_insn "vec_load_lanesxi_lane" + [(set (ma
[PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
From: Charles Baylis This patch replaces the inline assembler implementations of the vld[234](q?)_lane_* intrinsics with new versions which exploit the new builtin functions added in patch 1. Tested (with the rest of the patch series) with make check on aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON intrinsics tests. Charles Baylis * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Rewrite using builtins, update uses to use new macro arguments. (__LD3_LANE_FUNC): Likewise. (__LD4_LANE_FUNC): Likewise. Change-Id: I3bd5934b5c4f6127088193c1ab12848144d5540a --- gcc/config/aarch64/arm_neon.h | 377 -- 1 file changed, 255 insertions(+), 122 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 9b1873f..19ce261 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11805,47 +11805,83 @@ __LD2R_FUNC (uint16x8x2_t, uint16x2_t, uint16_t, 8h, u16, q) __LD2R_FUNC (uint32x4x2_t, uint32x2_t, uint32_t, 4s, u32, q) __LD2R_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, 2d, u64, q) -#define __LD2_LANE_FUNC(rettype, ptrtype, regsuffix, \ - lnsuffix, funcsuffix, Q)\ - __extension__ static __inline rettype \ - __attribute__ ((__always_inline__)) \ - vld2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ -rettype b, const int c)\ - {\ -rettype result;\ -__asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t"\ -"ld2 {v16." #lnsuffix ", v17." #lnsuffix "}[%3], %2\n\t" \ -"st1 {v16." #regsuffix ", v17." 
#regsuffix "}, %0\n\t" \ -: "=Q"(result) \ -: "Q"(b), "Q"(*(const rettype *)ptr), "i"(c) \ -: "memory", "v16", "v17"); \ -return result; \ - } - -__LD2_LANE_FUNC (int8x8x2_t, uint8_t, 8b, b, s8,) -__LD2_LANE_FUNC (float32x2x2_t, float32_t, 2s, s, f32,) -__LD2_LANE_FUNC (float64x1x2_t, float64_t, 1d, d, f64,) -__LD2_LANE_FUNC (poly8x8x2_t, poly8_t, 8b, b, p8,) -__LD2_LANE_FUNC (poly16x4x2_t, poly16_t, 4h, h, p16,) -__LD2_LANE_FUNC (int16x4x2_t, int16_t, 4h, h, s16,) -__LD2_LANE_FUNC (int32x2x2_t, int32_t, 2s, s, s32,) -__LD2_LANE_FUNC (int64x1x2_t, int64_t, 1d, d, s64,) -__LD2_LANE_FUNC (uint8x8x2_t, uint8_t, 8b, b, u8,) -__LD2_LANE_FUNC (uint16x4x2_t, uint16_t, 4h, h, u16,) -__LD2_LANE_FUNC (uint32x2x2_t, uint32_t, 2s, s, u32,) -__LD2_LANE_FUNC (uint64x1x2_t, uint64_t, 1d, d, u64,) -__LD2_LANE_FUNC (float32x4x2_t, float32_t, 4s, s, f32, q) -__LD2_LANE_FUNC (float64x2x2_t, float64_t, 2d, d, f64, q) -__LD2_LANE_FUNC (poly8x16x2_t, poly8_t, 16b, b, p8, q) -__LD2_LANE_FUNC (poly16x8x2_t, poly16_t, 8h, h, p16, q) -__LD2_LANE_FUNC (int8x16x2_t, int8_t, 16b, b, s8, q) -__LD2_LANE_FUNC (int16x8x2_t, int16_t, 8h, h, s16, q) -__LD2_LANE_FUNC (int32x4x2_t, int32_t, 4s, s, s32, q) -__LD2_LANE_FUNC (int64x2x2_t, int64_t, 2d, d, s64, q) -__LD2_LANE_FUNC (uint8x16x2_t, uint8_t, 16b, b, u8, q) -__LD2_LANE_FUNC (uint16x8x2_t, uint16_t, 8h, h, u16, q) -__LD2_LANE_FUNC (uint32x4x2_t, uint32_t, 4s, s, u32, q) -__LD2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q) +#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype, \ +mode, ptrmode, funcsuffix, signedtype)\ +__extension__ static __inline intype __attribute__ ((__always_inline__)) \ +vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ +{ \ + __builtin_aarch64_simd_oi __o; \ + largetype __temp; \ + __temp.val[0] = \ +vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \ + __temp.val[1] = \ +vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \ + __o = 
__builtin_aarch64_set_qregoi##mode (__o, \ + (signedtype) __temp.val[0], \ + 0); \ + __o = __builtin_aarch64_set_qregoi##mode (__o, \ +
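A detail of the rewritten __LD2_LANE_FUNC above that is easy to miss: before the D-register inputs can be packed into the OI register tuple, each 64-bit vector is widened to 128 bits with a zeroed upper half via the vcombine_xx (__b.val[i], vcreate_xx (0)) calls. A plain-C model of that widening step (illustrative types and names, not the arm_neon.h definitions):

```c
#include <stdint.h>

/* Model of a 64-bit D register holding 2 x int32 lanes, and of the
   128-bit Q register it is widened into.  */
typedef struct { int32_t lane[2]; } d_reg_model;
typedef struct { int32_t lane[4]; } q_reg_model;

/* Models vcombine_s32 (lo, vcreate_s32 (0)): the low half keeps the
   input lanes, the high half is zero-filled.  */
static q_reg_model
combine_with_zero (d_reg_model lo)
{
  q_reg_model q = { { lo.lane[0], lo.lane[1], 0, 0 } };
  return q;
}
```

In the real macro the zero-padding is only a way to satisfy the Q-register-based builtin interface; the ld2 instruction then overwrites the requested lane, and the D-sized results are extracted again afterwards.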
Re: [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 26 September 2014 13:47, Tejas Belagod wrote:
> If we use type-punning, there are unnecessary spills that are generated
> which is also incorrect for BE because of the way we spill (st1 {v0.16b -
> v1.16b}, [sp]) and restore. The implementation without type-punning seems to
> give a more optimal result. Did your patches improve on the spills for the
> type-punning solution?

OK, this part seems too contentious, so I've respun the vldN_lane parts
without the type punning and reposted them. This issue can be resolved
separately.

Trying an example like this gives good code with type punning, and poor
code without.

void t2(int32_t *p)
{
    int32x4x4_t va = vld4q_s32(p);
    va = vld4q_lane_s32(p + 500, va, 1);
    vst4q_s32(p+1000, va);
}

With type-punning, good code:

t2:
	ld4	{v0.4s - v3.4s}, [x0]
	add	x2, x0, 2000
	add	x1, x0, 4000
	ld4	{v0.s - v3.s}[1], [x2]
	st4	{v0.4s - v3.4s}, [x1]
	ret

Without type-punning, horrible code:

t2:
	ld4	{v0.4s - v3.4s}, [x0]
	sub	sp, sp, #64
	add	x14, x0, 2000
	add	x0, x0, 4000
	umov	x12, v0.d[0]
	umov	x13, v0.d[1]
	umov	x10, v1.d[0]
	umov	x11, v1.d[1]
	umov	x8, v2.d[0]
	str	x12, [sp]
	umov	x9, v2.d[1]
	str	x13, [sp, 8]
	str	q3, [sp, 48]
	str	x10, [sp, 16]
	str	x11, [sp, 24]
	str	x8, [sp, 32]
	str	x9, [sp, 40]
	ld1	{v0.16b - v3.16b}, [sp]
	ld4	{v0.s - v3.s}[1], [x14]
	umov	x10, v0.d[0]
	umov	x11, v0.d[1]
	umov	x8, v1.d[0]
	umov	x9, v1.d[1]
	umov	x6, v2.d[0]
	str	x10, [sp]
	umov	x7, v2.d[1]
	str	x11, [sp, 8]
	str	q3, [sp, 48]
	str	x8, [sp, 16]
	str	x9, [sp, 24]
	str	x6, [sp, 32]
	str	x7, [sp, 40]
	ld1	{v0.16b - v3.16b}, [sp]
	add	sp, sp, 64
	st4	{v0.4s - v3.4s}, [x0]
	ret

>> Maybe the solution is to pass the NEON
>> intrinsic types directly to the builtins? Is there a reason that it
>> wasn't done that way before?
>
> How do you mean? Do you mean pass a loaded value int32x2x2_t into a
> __builtin? How will that work?
>
> If you mean why we don't pass an int32x2x2_t into a builtin as a structure,
> I don't think that would work as it is struct type which would correspond to
> a BLK mode, but we need RTL patterns with reg-lists to work with large int
> modes for the regalloc to allocate consecutive regs for the reglists.

OK, that makes sense. However, something needs to be done to create the
__aarch64_simd_ objects without register moves. Since the existing
mechanism causes problems because the lifetimes of the inputs overlap with
the lifetimes of the outputs, I think there are these options:

1. represent the construction/deconstruction as a single operation, to
   avoid overlapping variable liveness in the source.
2. add a pass or peephole which can combine the existing builtins into a
   single operation, so that the lifetimes are normalised.
3. teach the register allocator how to handle overlapping liveness of a
   register and a subreg of that register.

Option 1 would require a new builtin interface which somehow handled a
whole int32x2x2_t in one operation. Construction is easy
(__builtin_aarch64_simd_construct(v.val[0], v.val[1]) or similar).
Deconstruction is less obvious.

Option 2 sounds like a hack, but would probably be effective, particularly
if it can be done before inlining.

Option 3 would also help with poor code generation for ARM targets with
vget_low_*, vget_high_* and vcombine_*.

What do you think is the best approach?

Thanks

Charles
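For reference while reading the t2 example above, a plain-C model of what a lane-load such as vld4q_lane_s32 computes (illustrative types and names only; the real intrinsic maps to a single ld4 instruction and arm_neon.h defines the real int32x4x4_t):

```c
#include <stdint.h>

/* Model of int32x4x4_t: four vectors of four int32 lanes each.  */
typedef struct { int32_t val[4][4]; } int32x4x4_model;

/* vld4q_lane_s32 semantics: load four consecutive int32 values from p
   and place element v into lane `lane` of vector v; all other lanes of
   the input tuple b pass through unchanged.  */
static int32x4x4_model
ld4q_lane_model (const int32_t *p, int32x4x4_model b, int lane)
{
  for (int v = 0; v < 4; v++)
    b.val[v][lane] = p[v];	/* structure element v -> vector v */
  return b;
}
```

The pass-through of the untouched lanes is exactly why the input tuple must be both an input and an output of the underlying pattern (the "0" matching constraint in the define_insns), which is the source of the overlapping-lifetime problem discussed above.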
Re: [PATCH 1/2] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics
On 9 October 2014 16:03, Tejas Belagod wrote:
>>
>> +(define_insn "vec_load_lanesoi_lane"
>
> Best to prepend "aarch64_" to the pattern name, IMHO, else it looks like a
> standard pattern name (e.g. vec_load_lanes) at first glance.
>
> Otherwise, LGTM (but I can't approve it). Thanks for this patch.

Updated version attached. Patch #2
(https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00678.html) is needed too,
but is unchanged.

OK for trunk?

Charles

-- 
Charles Baylis

	* config/aarch64/aarch64-builtins.c
	(aarch64_types_loadstruct_lane_qualifiers): Define.
	* config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane,
	ld4_lane): New builtins.
	* config/aarch64/aarch64-simd.md (aarch64_vec_load_lanesoi_lane):
	New pattern.
	(aarch64_vec_load_lanesci_lane): Likewise.
	(aarch64_vec_load_lanesxi_lane): Likewise.
	(aarch64_ld2_lane): New expand.
	(aarch64_ld3_lane): Likewise.
	(aarch64_ld4_lane): Likewise.
	* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
	UNSPEC_LD2_LANE, UNSPEC_LD3_LANE, UNSPEC_LD4_LANE.

From fa14ca29817f3247417a8bf9e70cc8312f4c5edf Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Thu, 4 Sep 2014 14:59:23 +0100
Subject: [PATCH 1/2] [AARCH64,NEON] Add patterns + builtins for
 vld[234](q?)_lane_* intrinsics

This patch adds new patterns and builtins to represent single lane
structure loads instructions, which will be used to implement the
vld[234](q?)_lane_* intrinsics.

Tested (with the rest of the patch series) with make check on
aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON
intrinsics tests.

Charles Baylis

	* config/aarch64/aarch64-builtins.c
	(aarch64_types_loadstruct_lane_qualifiers): Define.
	* config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane,
	ld4_lane): New builtins.
	* config/aarch64/aarch64-simd.md (aarch64_vec_load_lanesoi_lane):
	New pattern.
	(aarch64_vec_load_lanesci_lane): Likewise.
	(aarch64_vec_load_lanesxi_lane): Likewise.
	(aarch64_ld2_lane): New expand.
	(aarch64_ld3_lane): Likewise.
	(aarch64_ld4_lane): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_LD2_LANE, UNSPEC_LD3_LANE, UNSPEC_LD4_LANE. Change-Id: I4c36d18072215133573e07483cfe12165201c339 --- gcc/config/aarch64/aarch64-builtins.c| 5 ++ gcc/config/aarch64/aarch64-simd-builtins.def | 4 ++ gcc/config/aarch64/aarch64-simd.md | 95 gcc/config/aarch64/aarch64.md| 3 + 4 files changed, 107 insertions(+) diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 3dba1b2..368d3a7 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -201,6 +201,11 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode }; #define TYPES_LOAD1 (aarch64_types_load1_qualifiers) #define TYPES_LOADSTRUCT (aarch64_types_load1_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_const_pointer_map_mode, + qualifier_none, qualifier_none }; +#define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers) static enum aarch64_type_qualifiers aarch64_types_bsl_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 2367436..348f0d2 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -83,6 +83,10 @@ BUILTIN_VQ (LOADSTRUCT, ld2, 0) BUILTIN_VQ (LOADSTRUCT, ld3, 0) BUILTIN_VQ (LOADSTRUCT, ld4, 0) + /* Implemented by aarch64_ld_lane. */ + BUILTIN_VQ (LOADSTRUCT_LANE, ld2_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld3_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld4_lane, 0) /* Implemented by aarch64_st. 
*/ BUILTIN_VDC (STORESTRUCT, st2, 0) BUILTIN_VDC (STORESTRUCT, st3, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cab26a3..90ab104 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3991,6 +3991,18 @@ [(set_attr "type" "neon_load2_2reg")] ) +(define_insn "aarch64_vec_load_lanesoi_lane" + [(set (match_operand:OI 0 "register_operand" "=w") + (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] + UNSPEC_LD2_LANE))] + "TARGET_SIMD" + "ld2\\t{%S0. - %T0.}[%3], %1" + [(set_attr "type" "neon_load2_one_lane&qu
Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 23 October 2014 11:14, Marcus Shawcroft wrote:
> On 8 October 2014 18:27, wrote:
>
>> +#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype, \
>
> Just above the insertion point in arm_neon.h is the comment:
>
> /* Start of temporary inline asm for vldn, vstn and friends. */
>
> This patch removes the "temporary inline asm vldn" implementation, the
> replacement implementation should be inserted in the section below,
> look for the comment that starts:

OK. The vstN_lane intrinsics are similarly misplaced, I'll do a separate
patch to move them.

> "Start of optimal implementations"
>
>> +mode, ptrmode, funcsuffix, signedtype)\
>> +__extension__ static __inline intype __attribute__ ((__always_inline__)) \
>> +vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
>> +{ \
>> +  __builtin_aarch64_simd_oi __o; \
>> +  largetype __temp; \
>> +  __temp.val[0] = \
>
> There is something odd about the white space here, space before tab?
> This is repeated in various places through the rest of the patch.

There are a few spaces before tabs, but the weird misalignment of the \'s
when reading the patch is just due to the effect of the unified diff
misaligning the code with the tabstops. I have respun the patch with those
spaces removed.

> Otherwise this and the previous 1/2 associated patch look good, can
> you respin with these tidy ups?

OK for trunk?

From a37d24c57f6c7abe4ade05c1f383e82ebd20c052 Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Wed, 10 Sep 2014 13:45:25 +0100
Subject: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins
 for vld[234](q?)_lane_*

This patch replaces the inline assembler implementations of the
vld[234](q?)_lane_* intrinsics with new versions which exploit the new
builtin functions added in patch 1.

Tested (with the rest of the patch series) with make check on
aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON
intrinsics tests.
Charles Baylis * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Rewrite using builtins, update uses to use new macro arguments. (__LD3_LANE_FUNC): Likewise. (__LD4_LANE_FUNC): Likewise. --- gcc/config/aarch64/arm_neon.h | 404 +- 1 file changed, 281 insertions(+), 123 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 9b1873f..18c6e92 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11805,47 +11805,6 @@ __LD2R_FUNC (uint16x8x2_t, uint16x2_t, uint16_t, 8h, u16, q) __LD2R_FUNC (uint32x4x2_t, uint32x2_t, uint32_t, 4s, u32, q) __LD2R_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, 2d, u64, q) -#define __LD2_LANE_FUNC(rettype, ptrtype, regsuffix, \ - lnsuffix, funcsuffix, Q) \ - __extension__ static __inline rettype \ - __attribute__ ((__always_inline__)) \ - vld2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ - rettype b, const int c) \ - { \ -rettype result; \ -__asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t" \ - "ld2 {v16." #lnsuffix ", v17." #lnsuffix "}[%3], %2\n\t" \ - "st1 {v16." #regsuffix ", v17." 
#regsuffix "}, %0\n\t" \ - : "=Q"(result) \ - : "Q"(b), "Q"(*(const rettype *)ptr), "i"(c) \ - : "memory", "v16", "v17"); \ -return result; \ - } - -__LD2_LANE_FUNC (int8x8x2_t, uint8_t, 8b, b, s8,) -__LD2_LANE_FUNC (float32x2x2_t, float32_t, 2s, s, f32,) -__LD2_LANE_FUNC (float64x1x2_t, float64_t, 1d, d, f64,) -__LD2_LANE_FUNC (poly8x8x2_t, poly8_t, 8b, b, p8,) -__LD2_LANE_FUNC (poly16x4x2_t, poly16_t, 4h, h, p16,) -__LD2_LANE_FUNC (int16x4x2_t, int16_t, 4h, h, s16,) -__LD2_LANE_FUNC (int32x2x2_t, int32_t, 2s, s, s32,) -__LD2_LANE_FUNC (int64x1x2_t, int64_t, 1d, d, s64,) -__LD2_LANE_FUNC (uint8x8x2_t, uint8_t, 8b, b, u8,) -__LD2_LANE_FUNC (uint16x4x2_t, uint16_t, 4h, h, u16,) -__LD2_LANE_FUNC (uint32x2x2_t, uint32_t, 2s, s, u32,) -__LD2_LANE_FUNC (uint64x1x2_t, uint64_t, 1d, d, u64,) -__LD2_LANE_FUNC (float32x4x2_t, float32_t, 4s, s, f32, q) -__LD2_LANE_FUNC (float64x2x2_t, float64_t, 2d, d, f64, q) -__LD2_LANE_FUNC (poly8x16x2_t, poly8_t, 16b, b, p8, q) -__LD2_LANE_FUNC (poly16x8x2_t, poly16_t, 8h, h, p16, q) -__LD2_LANE_FUNC (int8x16x2_t, int8_t, 16b, b, s8, q) -__LD2_LANE_FUNC (int16x8x2_t, int16_t, 8h, h, s16, q) -__LD2_LANE_FUNC (int32x4x2_t, int32_t, 4s, s, s32, q) -__LD2_LAN
Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 24 October 2014 11:23, Marcus Shawcroft wrote:
> On 23 October 2014 18:51, Charles Baylis wrote:
>
>>> Otherwise this and the previous 1/2 associated patch look good, can
>>> you respin with these tidy ups?
>>
>> OK for trunk?
>
> OK
> /Marcus

Committed to trunk as r216671 and r216672.
[PATCH] [ARM] Post-indexed addressing for NEON memory access
This patch adds support for post-indexed addressing for NEON structure
memory accesses. For example:

	VLD1.8	{d0}, [r0], r1

Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.

Ok for trunk?

gcc/ChangeLog:

2014-06-02  Charles Baylis

	* config/arm/arm.c (neon_vector_mem_operand): Allow register
	POST_MODIFY for neon loads and stores.
	(arm_print_operand): Output post-index register for neon loads and
	stores.

From a8e0bdbceab00d5e5b655611965d3975ba74365c Mon Sep 17 00:00:00 2001
From: Charles Baylis
Date: Tue, 6 May 2014 15:23:46 +0100
Subject: [PATCH] post-indexed addressing for vld/vst

2014-05-09  Charles Baylis

	* config/arm/arm.c (neon_vector_mem_operand): Allow register
	POST_MODIFY for neon loads and stores.
	(arm_print_operand): Output post-index register for neon loads and
	stores.
---
 gcc/config/arm/arm.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1117bd4..6ab02ef 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12786,7 +12786,11 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
       || (type == 0 && GET_CODE (ind) == PRE_DEC))
     return arm_address_register_rtx_p (XEXP (ind, 0), 0);
 
-  /* FIXME: vld1 allows register post-modify.  */
+  /* Allow post-increment by register for VLDn */
+  if (type == 2 && GET_CODE (ind) == POST_MODIFY
+      && GET_CODE (XEXP (ind, 1)) == PLUS
+      && REG_P (XEXP (XEXP (ind, 1), 1)))
+    return true;
 
   /* Match:
      (plus (reg)
@@ -21816,6 +21820,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
       {
	rtx addr;
	bool postinc = FALSE;
+	rtx postinc_reg = NULL;
	unsigned align, memsize, align_bits;

	gcc_assert (MEM_P (x));
@@ -21825,6 +21830,11 @@ arm_print_operand (FILE *stream, rtx x, int code)
	    postinc = 1;
	    addr = XEXP (addr, 0);
	  }
+	if (GET_CODE (addr) == POST_MODIFY)
+	  {
+	    postinc_reg = XEXP( XEXP (addr, 1), 1);
+	    addr = XEXP (addr, 0);
+	  }
	asm_fprintf (stream, "[%r", REGNO (addr));

	/* We know the alignment of this access, so we can emit a hint in the
@@ -21850,6 +21860,8 @@ arm_print_operand (FILE *stream, rtx x, int code)

	if (postinc)
	  fputs("!", stream);
+	if (postinc_reg)
+	  asm_fprintf (stream, ", %r", REGNO (postinc_reg));
       }
       return;
-- 
1.9.1
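The access pattern that POST_MODIFY-by-register addressing encodes can be pictured in C as "use the current base, then advance it by a runtime stride". This is an illustrative model only (the patch itself only changes RTL matching and operand printing); the function and names below are not from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Each iteration reads through the current base pointer and then
   post-modifies it by a register-held stride, which is what lets the
   loop body become a single "vld1.8 {d0}, [r0], r1" on ARM, with r1
   holding the byte stride.  */
static int32_t
sum_post_modify (const int32_t *base, ptrdiff_t stride_elems, int n)
{
  int32_t acc = 0;
  while (n-- > 0)
    {
      acc += *base;		/* access at the current address */
      base += stride_elems;	/* then post-modify by a register amount */
    }
  return acc;
}
```

Without the patch, the pointer update had to be emitted as a separate add instruction after each NEON structure access; with it, the update folds into the load/store itself.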
Re: [PATCH][AARCH64]Support full addressing modes for ldr/str in vectorization scenarios
On 3 June 2014 12:08, Marcus Shawcroft wrote:
> On 28 May 2014 08:30, Bin.Cheng wrote:
>>> So is it OK?
>>>
>>> 2014-05-28  Bin Cheng
>>>
>>>	* config/aarch64/aarch64.c (aarch64_classify_address)
>>>	(aarch64_legitimize_reload_address): Support full addressing modes
>>>	for vector modes.
>>>	* config/aarch64/aarch64.md (mov, movmisalign)
>>>	(*aarch64_simd_mov, *aarch64_simd_mov): Relax
>>>	predicates.
>
> OK
> Thanks
> /Marcus

Hi Bin,

This resolves an ICE in 4.9 in Neon intrinsics code, so I'd like to see it
backported to the branch too, please.

Thanks

Charles
Re: [PATCH][AARCH64]Support full addressing modes for ldr/str in vectorization scenarios
On 4 June 2014 03:11, Bin.Cheng wrote:
> Yes, if there is a PR, I can evaluate how this can help and ask
> release maintainer for approval.

I'll reduce the test case and create one shortly.
Re: [PATCH][AARCH64]Support full addressing modes for ldr/str in vectorization scenarios
On 4 June 2014 10:06, Charles Baylis wrote:
> On 4 June 2014 03:11, Bin.Cheng wrote:
>
>> Yes, if there is a PR, I can evaluate how this can help and ask
>> release maintainer for approval.
>
> I'll reduce the test case and create one shortly

I have created PR61411 with a reduced test case.
Re: [PATCH, ARM, v2] Improve 64 bit division performance
ping? On 22 May 2014 11:08, Charles Baylis wrote: > On 1 May 2014 16:41, Richard Earnshaw wrote: >> I think really, you've got three independent changes here: > > Version 2 of this patch series is now a 9 patch series which addresses > most of the following. Exceptions discussed below. > >> 1) Optimize the prologue/epilogue sequences when ldrd is available. >> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4 > > I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper > performs signed division and can't be directly replaced with the > unsigned division performed by __udivmoddi4. > >> 3) Optimize the code to __aeabi_ldivmod. > > Converting to call __udivmoddi4, fixing up signedness of operands and > results and optimisation are all one change. > >> Ideally, therefore, this is a three patch series, but it's then missing >> a few bits. >> >> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since >> it's a direct swap of one function for another (unless I've misread the >> changes, I think the ABI of the two helper functions are the same). > > For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not > trivial as the signedness fix-ups must be written. > >> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function >> which can be deleted. This is good because currently pulling in either >> 64-bit division function causes both these helper functions to be pulled >> in and thus the whole of the 64-bit div-mod code for both signed and >> unsigned values. That's particularly unfortunate for ARMv6m class >> devices as that's potentially a lot of redundant code. > > Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper. > > I've included two patches which do the trivial steps for the unsigned case. 
>
>> Finally, I know this was the original code, but the complete lack of
>> comments in this code made reviewing even the trivial parts a complete
>> nightmare -- it took me half an hour before I remembered that
>> __udivmoddi4 took three parameters, the third of which was on the stack:
>> thus the messing around with sp/ip in the prologue wasn't just trivial
>> padding but a necessary part of the function. Please could you add, at
>> least some short comments clarifying the register disposition on input
>> and what that prologue code is up to...
>
> Done.
>
>> Finally, how was this code tested?
>
> It has been built and "make check" has been run with no regressions on:
> arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a
> arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a
> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te
> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t
>
> I have also run a simple test harness which checks the result of
> several 64 bit division operations where gcc has been built with the
> above configurations.
>
> I am not currently set up with a way to test v6M, so those parts aren't
> tested.
>
>> Anyway, some additional comments below:
>>
>> Don't repeat the function name for multiple tweaks to the same function;
>> as mentioned above, if these are really separate changes they should be
>> in separate submissions. Mixing unrelated changes just makes the
>> reviewing step that much harder.
>
> Done.
>
>>> +	strd	ip,lr, [sp, #-16]!
>>
>> Space after comma.
>
> Done
>
>> Also, since you've essentially rewritten the entire function, can you
>> please also reformat them to follow the coding style of the rest of the
>> file: namely "OP<tab>operands".
>
> Done
>
>>> #else
>>> +	sub	sp, sp, #8
>>> 	do_push	{sp, lr}
>>> #endif
>>
>> Please add a comment that the value at *sp is the address of the
>> slot for the remainder.
>
> Done
>
>>> +#if defined(__thumb2__) && CAN_USE_LDRD
>>> +	sub	ip, sp, #8
>>> +	strd	ip,lr, [sp, #-16]!
>>
>> Space after comma.
>
> Done
>
>>> #else
>>> +	sub	sp, sp, #8
>>> 	do_push	{sp, lr}
>>> #endif
>>> +	cmp	xxh, #0
>>> +	blt	1f
>>> +	cmp	yyh, #0
>>> +	blt	2f
>>> +
>>> +98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>
>> The CFI push should really precede your conditional tests, it relates to
>> the do_push expression.
>
> Done.
>
>>> +	bl	SYM(__udivmoddi4) __PLT__
>>> +	ldr	lr, [s
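The signedness fix-ups under discussion follow the usual truncated-division rules: the quotient is negated when the operand signs differ, and the remainder takes the sign of the numerator. A plain-C sketch of what routing __aeabi_ldivmod through an unsigned divide amounts to (not part of the patch; `udivmod64` here is a hypothetical stand-in for libgcc's __udivmoddi4):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for __udivmoddi4: unsigned 64-bit divide returning the
   quotient and storing the remainder through a pointer.  */
static uint64_t
udivmod64 (uint64_t a, uint64_t b, uint64_t *rem)
{
  *rem = a % b;
  return a / b;
}

/* Sketch of the fix-ups a signed divmod needs around an unsigned one:
   negate negative inputs, divide unsigned, then negate the quotient
   when the signs differ and give the remainder the numerator's sign.  */
static int64_t
sdivmod64 (int64_t a, int64_t b, int64_t *rem)
{
  uint64_t ua = a < 0 ? -(uint64_t) a : (uint64_t) a;
  uint64_t ub = b < 0 ? -(uint64_t) b : (uint64_t) b;
  uint64_t ur;
  uint64_t uq = udivmod64 (ua, ub, &ur);
  *rem = a < 0 ? -(int64_t) ur : (int64_t) ur;
  return (a < 0) != (b < 0) ? -(int64_t) uq : (int64_t) uq;
}
```

This matches C's truncated division: sdivmod64 (-7, 2, &r) yields quotient -3 with remainder -1.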
[PATCH 1/9] Whitespace
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace.
	(__aeabi_ldivmod): Fix whitespace.
---
 libgcc/config/arm/bpabi.S | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index 7772301..f47d715 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -124,20 +124,20 @@ ARM_FUNC_START aeabi_ulcmp
 ARM_FUNC_START aeabi_ldivmod
 	cfi_start	__aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
-	test_div_by_zero signed
+	test_div_by_zero	signed
 
-	sub sp, sp, #8
+	sub	sp, sp, #8
 #if defined(__thumb2__)
-	mov ip, sp
-	push {ip, lr}
+	mov	ip, sp
+	push	{ip, lr}
 #else
-	do_push {sp, lr}
+	do_push	{sp, lr}
 #endif
 98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
-	bl SYM(__gnu_ldivmod_helper) __PLT__
-	ldr lr, [sp, #4]
-	add sp, sp, #8
-	do_pop {r2, r3}
+	bl	SYM(__gnu_ldivmod_helper) __PLT__
+	ldr	lr, [sp, #4]
+	add	sp, sp, #8
+	do_pop	{r2, r3}
 	RET
 	cfi_end	LSYM(Lend_aeabi_ldivmod)
 
@@ -147,20 +147,20 @@ ARM_FUNC_START aeabi_ldivmod
 ARM_FUNC_START aeabi_uldivmod
 	cfi_start	__aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod)
-	test_div_by_zero unsigned
+	test_div_by_zero	unsigned
 
-	sub sp, sp, #8
+	sub	sp, sp, #8
 #if defined(__thumb2__)
-	mov ip, sp
-	push {ip, lr}
+	mov	ip, sp
+	push	{ip, lr}
 #else
-	do_push {sp, lr}
+	do_push	{sp, lr}
 #endif
 98:	cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10
-	bl SYM(__gnu_uldivmod_helper) __PLT__
-	ldr lr, [sp, #4]
-	add sp, sp, #8
-	do_pop {r2, r3}
+	bl	SYM(__gnu_uldivmod_helper) __PLT__
+	ldr	lr, [sp, #4]
+	add	sp, sp, #8
+	do_pop	{r2, r3}
 	RET
 	cfi_end	LSYM(Lend_aeabi_uldivmod)
-- 
1.9.1
[PATCH 3/9] Optimise __aeabi_uldivmod (stack manipulation)
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer
	manipulation.
---
 libgcc/config/arm/bpabi.S | 54 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index ae76cd3..67246b0 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -120,6 +120,46 @@ ARM_FUNC_START aeabi_ulcmp
 #endif
 .endm
 
+/* we can use STRD/LDRD on v5TE and later, and any Thumb-2 architecture. */
+#if (defined(__ARM_EABI__) \
+     && (defined(__thumb2__) \
+	 || (__ARM_ARCH >= 5 && defined(__TARGET_FEATURE_DSP))))
+#define CAN_USE_LDRD 1
+#else
+#define CAN_USE_LDRD 0
+#endif
+
+/* set up stack frame for call to __udivmoddi4. At the end of the macro the
+   stack is arranged as follows:
+		sp+12	/ space for remainder
+		sp+8	\ (written by __udivmoddi4)
+		sp+4	lr
+		sp+0	sp+8 [rp (remainder pointer) argument for __udivmoddi4]
+ */
+.macro push_for_divide fname
+#if defined(__thumb2__) && CAN_USE_LDRD
+	sub	ip, sp, #8
+	strd	ip, lr, [sp, #-16]!
+#else
+	sub	sp, sp, #8
+	do_push	{sp, lr}
+#endif
+98:	cfi_push	98b - \fname, 0xe, -0xc, 0x10
+.endm
+
+/* restore stack */
+.macro pop_for_divide
+	ldr	lr, [sp, #4]
+#if CAN_USE_LDRD
+	ldrd	r2, r3, [sp, #8]
+	add	sp, sp, #16
+#else
+	add	sp, sp, #8
+	do_pop	{r2, r3}
+#endif
+.endm
+
 #ifdef L_aeabi_ldivmod
 
 /* Perform 64 bit signed division.
@@ -165,18 +205,10 @@ ARM_FUNC_START aeabi_uldivmod
 	cfi_start	__aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod)
 	test_div_by_zero	unsigned
 
-	sub	sp, sp, #8
-#if defined(__thumb2__)
-	mov	ip, sp
-	push	{ip, lr}
-#else
-	do_push	{sp, lr}
-#endif
-98:	cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10
+	push_for_divide	__aeabi_uldivmod
+	/* arguments in (r0:r1), (r2:r3) and *sp */
 	bl	SYM(__gnu_uldivmod_helper) __PLT__
-	ldr	lr, [sp, #4]
-	add	sp, sp, #8
-	do_pop	{r2, r3}
+	pop_for_divide
 	RET
 	cfi_end	LSYM(Lend_aeabi_uldivmod)
-- 
1.9.1
[PATCH 2/9] Add comments
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add
	comment describing register usage on function entry and exit.
---
 libgcc/config/arm/bpabi.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index f47d715..ae76cd3 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -122,6 +122,14 @@ ARM_FUNC_START aeabi_ulcmp
 
 #ifdef L_aeabi_ldivmod
 
+/* Perform 64 bit signed division.
+   Inputs:
+	r0:r1	numerator
+	r2:r3	denominator
+   Outputs:
+	r0:r1	quotient
+	r2:r3	remainder
+ */
 ARM_FUNC_START aeabi_ldivmod
 	cfi_start	__aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
 	test_div_by_zero	signed
@@ -145,6 +153,14 @@ ARM_FUNC_START aeabi_ldivmod
 
 #ifdef L_aeabi_uldivmod
 
+/* Perform 64 bit unsigned division.
+   Inputs:
+	r0:r1	numerator
+	r2:r3	denominator
+   Outputs:
+	r0:r1	quotient
+	r2:r3	remainder
+ */
 ARM_FUNC_START aeabi_uldivmod
 	cfi_start	__aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod)
 	test_div_by_zero	unsigned
-- 
1.9.1
[PATCH 4/9] Optimise __aeabi_uldivmod
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_uldivmod): Perform division using
	call to __udivmoddi4.
---
 libgcc/config/arm/bpabi.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index 67246b0..927e37f 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -207,7 +207,7 @@ ARM_FUNC_START aeabi_uldivmod
 
 	push_for_divide	__aeabi_uldivmod
 	/* arguments in (r0:r1), (r2:r3) and *sp */
-	bl	SYM(__gnu_uldivmod_helper) __PLT__
+	bl	SYM(__udivmoddi4) __PLT__
 	pop_for_divide
 	RET
 	cfi_end	LSYM(Lend_aeabi_uldivmod)
-- 
1.9.1
[PATCH 5/9] Optimise __aeabi_ldivmod (stack manipulation)
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation.
---
 libgcc/config/arm/bpabi.S | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index 927e37f..3f9ece5 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -174,18 +174,10 @@ ARM_FUNC_START aeabi_ldivmod
 	cfi_start	__aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
 	test_div_by_zero	signed
 
-	sub	sp, sp, #8
-#if defined(__thumb2__)
-	mov	ip, sp
-	push	{ip, lr}
-#else
-	do_push	{sp, lr}
-#endif
-98:	cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
+	push_for_divide	__aeabi_ldivmod
+	/* arguments in (r0:r1), (r2:r3) and *sp */
 	bl	SYM(__gnu_ldivmod_helper) __PLT__
-	ldr	lr, [sp, #4]
-	add	sp, sp, #8
-	do_pop	{r2, r3}
+	pop_for_divide
 	RET
 	cfi_end	LSYM(Lend_aeabi_ldivmod)
-- 
1.9.1
[PATCH 6/9] Optimise __aeabi_ldivmod
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod): Perform division using
	__udivmoddi4, and fixups for negative operands.
---
 libgcc/config/arm/bpabi.S | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index 3f9ece5..c044167 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -175,10 +175,49 @@ ARM_FUNC_START aeabi_ldivmod
 	test_div_by_zero	signed
 
 	push_for_divide	__aeabi_ldivmod
+	cmp	xxh, #0
+	blt	1f
+	cmp	yyh, #0
+	blt	2f
+	/* arguments in (r0:r1), (r2:r3) and *sp */
+	bl	SYM(__udivmoddi4) __PLT__
+	pop_for_divide
+	RET
+
+1: /* xxh:xxl is negative */
+	negs	xxl, xxl
+	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
+	cmp	yyh, #0
+	blt	3f
+	/* arguments in (r0:r1), (r2:r3) and *sp */
+	bl	SYM(__udivmoddi4) __PLT__
+	pop_for_divide
+	negs	xxl, xxl
+	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
+	negs	yyl, yyl
+	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
+	RET
+
+2: /* only yyh:yyl is negative */
+	negs	yyl, yyl
+	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
+	/* arguments in (r0:r1), (r2:r3) and *sp */
+	bl	SYM(__udivmoddi4) __PLT__
+	pop_for_divide
+	negs	xxl, xxl
+	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
+	RET
+
+3: /* both xxh:xxl and yyh:yyl are negative */
+	negs	yyl, yyl
+	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	/* arguments in (r0:r1), (r2:r3) and *sp */
-	bl	SYM(__gnu_ldivmod_helper) __PLT__
+	bl	SYM(__udivmoddi4) __PLT__
 	pop_for_divide
+	negs	yyl, yyl
+	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	RET
+
 	cfi_end	LSYM(Lend_aeabi_ldivmod)
 
 #endif /* L_aeabi_ldivmod */
-- 
1.9.1
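The repeated NEGS/SBC pair above is a two-instruction 64-bit negation: NEGS negates the low word and leaves carry set exactly when no borrow was needed (low word zero), and since X - 2X = -X, the `sbc xxh, xxh, xxh, lsl #1` produces -xxh minus the borrow. A plain-C model of the sequence (illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Thumb-2-friendly 64-bit negation:
     negs  xxl, xxl               @ low word; carry set iff no borrow
     sbc   xxh, xxh, xxh, lsl #1  @ hi - 2*hi - borrow == -hi - borrow
   RSC is unavailable in Thumb-2, hence the X - 2X form.  */
static uint64_t
neg64_model (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t nlo = 0u - lo;                  /* negs xxl, xxl */
  uint32_t borrow = lo != 0;               /* carry clear when a borrow occurred */
  uint32_t nhi = hi - (hi << 1) - borrow;  /* sbc xxh, xxh, xxh, lsl #1 */
  return ((uint64_t) nhi << 32) | nlo;
}
```

For every input this agrees with two's-complement negation of the full 64-bit value.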
[PATCH 8/9] Use __udivmoddi4 for v6M aeabi_uldivmod
2014-05-22  Charles Baylis

	* config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using
	__udivmoddi4.
---
 libgcc/config/arm/bpabi-v6m.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S
index 0bf2e55..d549fa6 100644
--- a/libgcc/config/arm/bpabi-v6m.S
+++ b/libgcc/config/arm/bpabi-v6m.S
@@ -148,7 +148,7 @@ FUNC_START aeabi_uldivmod
 	mov	r0, sp
 	push	{r0, lr}
 	ldr	r0, [sp, #8]
-	bl	SYM(__gnu_uldivmod_helper)
+	bl	SYM(__udivmoddi4)
 	ldr	r3, [sp, #4]
 	mov	lr, r3
 	add	sp, sp, #8
-- 
1.9.1
[PATCH 9/9] Remove __gnu_uldivmod_helper
2014-05-22  Charles Baylis

	* config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.
---
 libgcc/config/arm/bpabi.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c
index 7b155cc..e90d044 100644
--- a/libgcc/config/arm/bpabi.c
+++ b/libgcc/config/arm/bpabi.c
@@ -26,9 +26,6 @@
 extern long long __divdi3 (long long, long long);
 extern unsigned long long __udivdi3 (unsigned long long, unsigned long long);
 extern long long __gnu_ldivmod_helper (long long, long long, long long *);
-extern unsigned long long __gnu_uldivmod_helper (unsigned long long,
-						 unsigned long long,
-						 unsigned long long *);
 
 
 long long
@@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a,
   return quotient;
 }
 
-unsigned long long
-__gnu_uldivmod_helper (unsigned long long a,
-		       unsigned long long b,
-		       unsigned long long *remainder)
-{
-  unsigned long long quotient;
-
-  quotient = __udivdi3 (a, b);
-  *remainder = a - b * quotient;
-  return quotient;
-}
-- 
1.9.1
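The deleted helper rebuilt the remainder with a multiply-subtract around a single call to __udivdi3; __udivmoddi4 hands back both results in one call, which is why the helper can go. A plain-C sketch of the identity the old helper relied on (using `/` in place of __udivdi3):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the removed __gnu_uldivmod_helper: one unsigned divide,
   then remainder = a - b * quotient.  __udivmoddi4 returns the same
   quotient/remainder pair directly, making this wrapper redundant.  */
static uint64_t
uldivmod_model (uint64_t a, uint64_t b, uint64_t *remainder)
{
  uint64_t quotient = a / b;      /* stands in for __udivdi3 (a, b) */
  *remainder = a - b * quotient;
  return quotient;
}
```

By construction the stored remainder always equals a % b, including at the extremes of the unsigned 64-bit range.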
[PATCH 7/9] Fix cfi annotations
2014-05-22  Charles Baylis

	* config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod,
	push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF
	annotations. Fix DWARF information.
---
 libgcc/config/arm/bpabi.S | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index c044167..959ecb1 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -22,6 +22,8 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+	.cfi_sections .debug_frame
+
 #ifdef __ARM_EABI__
 /* Some attributes that are common to all routines in this file.  */
 /* Tag_ABI_align_needed: This code does not require 8-byte
@@ -145,7 +147,8 @@ ARM_FUNC_START aeabi_ulcmp
 	sub	sp, sp, #8
 	do_push	{sp, lr}
 #endif
-98:	cfi_push	98b - \fname, 0xe, -0xc, 0x10
+	.cfi_adjust_cfa_offset 16
+	.cfi_offset 14, -12
 .endm
 
 /* restore stack */
@@ -158,6 +161,8 @@ ARM_FUNC_START aeabi_ulcmp
 	add	sp, sp, #8
 	do_pop	{r2, r3}
 #endif
+	.cfi_restore 14
+	.cfi_adjust_cfa_offset 0
 .endm
 
 #ifdef L_aeabi_ldivmod
@@ -171,7 +176,7 @@ ARM_FUNC_START aeabi_ulcmp
 	r2:r3	remainder
  */
 ARM_FUNC_START aeabi_ldivmod
-	cfi_start	__aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
+	.cfi_startproc
 	test_div_by_zero	signed
 
 	push_for_divide	__aeabi_ldivmod
@@ -181,16 +186,19 @@ ARM_FUNC_START aeabi_ldivmod
 	blt	2f
 	/* arguments in (r0:r1), (r2:r3) and *sp */
 	bl	SYM(__udivmoddi4) __PLT__
+	.cfi_remember_state
 	pop_for_divide
 	RET
 
 1: /* xxh:xxl is negative */
+	.cfi_restore_state
 	negs	xxl, xxl
 	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	cmp	yyh, #0
 	blt	3f
 	/* arguments in (r0:r1), (r2:r3) and *sp */
 	bl	SYM(__udivmoddi4) __PLT__
+	.cfi_remember_state
 	pop_for_divide
 	negs	xxl, xxl
 	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
@@ -199,16 +207,19 @@ ARM_FUNC_START aeabi_ldivmod
 	RET
 
 2: /* only yyh:yyl is negative */
+	.cfi_restore_state
 	negs	yyl, yyl
 	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	/* arguments in (r0:r1), (r2:r3) and *sp */
 	bl	SYM(__udivmoddi4) __PLT__
+	.cfi_remember_state
 	pop_for_divide
 	negs	xxl, xxl
 	sbc	xxh, xxh, xxh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	RET
 
 3: /* both xxh:xxl and yyh:yyl are negative */
+	.cfi_restore_state
 	negs	yyl, yyl
 	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	/* arguments in (r0:r1), (r2:r3) and *sp */
@@ -218,7 +229,7 @@ ARM_FUNC_START aeabi_ldivmod
 	sbc	yyh, yyh, yyh, lsl #1	/* Thumb-2 has no RSC, so use X - 2X */
 	RET
 
-	cfi_end	LSYM(Lend_aeabi_ldivmod)
+	.cfi_endproc
 
 #endif /* L_aeabi_ldivmod */
 
@@ -233,7 +244,7 @@ ARM_FUNC_START aeabi_ldivmod
 	r2:r3	remainder
  */
 ARM_FUNC_START aeabi_uldivmod
-	cfi_start	__aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod)
+	.cfi_startproc
 	test_div_by_zero	unsigned
 
 	push_for_divide	__aeabi_uldivmod
@@ -241,7 +252,7 @@ ARM_FUNC_START aeabi_uldivmod
 	bl	SYM(__udivmoddi4) __PLT__
 	pop_for_divide
 	RET
-	cfi_end	LSYM(Lend_aeabi_uldivmod)
+	.cfi_endproc
 
 #endif /* L_aeabi_divmod */
-- 
1.9.1
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
Ping?

On 6 May 2014 17:05, Charles Baylis wrote:
> Ping?
>
> At this stage looking for general feedback on whether the define_split
> approach in this patch is appropriate. If it is, I'll do a clean patch
> for full review.
>
> Archive link: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00078.html
>
> On 2 April 2014 14:29, Charles Baylis wrote:
>> Hi
>>
>> This patch fixes the push_minipool_fix ICE, which occurs when the ARM
>> backend encounters a zero/sign extending load from a constant pool.
>>
>> I don't have a current test case for trunk, lp1296601 has a test case
>> which affects the linaro-4.8 branch. As far as I know, there has been
>> no fix for this on trunk.
>>
>> The approach taken in this patch is to extend each pattern where this
>> can occur, so that it triggers a define_split to synthesise a
>> constant move instead. Some but not all extend patterns have
>> previously added pool_range attributes to work-around this problem,
>> this patch removes those, and also fixes the remaining patterns. Some
>> patterns have slightly more complex workarounds, which I have not yet
>> analysed, but it seems worth posting the patch at this stage to get
>> feedback on the general approach.
>>
>> Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress.
>>
>> If this looks good, I'll clean it up for a more detailed review.
>>
>> Thanks
>> Charles
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 5 June 2014 07:27, Ramana Radhakrishnan wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This looks like a reasonable start but this work doesn't look complete
> to me yet.
>
> Can you also look at the impact on performance of a range of
> benchmarks especially a popular embedded one to see how this behaves
> unless you have already done so ?

I ran a popular suite of embedded benchmarks, and there is no impact at
all on Chromebook (including with the additional attached patch).

The patch was developed to address a performance issue with a new
version of libvpx which uses intrinsics instead of NEON assembler. The
patch results in a 3% improvement for VP8 decode.

> POST_INC, POST_MODIFY usually have a funny way of biting you with
> either ivopts or the way in which address costs work. I think there
> maybe further tweaks needed but for a first step I'd like to know what
> the performance impact is.
>
> I would also suggest running this through clyon's neon intrinsics
> testsuite to see if that catches any issues especially with the large
> vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the patch
has the desired result with them. In fact, the costs are estimated
incorrectly, which means the post_modify pattern is not used. The
attached patch fixes that (used in combination with my original patch).

2014-06-15  Charles Baylis

	* config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
	embedded side effects.

0002-Adjust-costs-for-mem-with-post_modify.patch
Description: application/download
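The access pattern that benefits is a vector load whose base pointer advances by a run-time stride each iteration, as in libvpx-style row processing. A minimal, architecture-neutral sketch (illustrative only, not from the patch; on ARM with NEON the 8-byte inner block is the natural candidate for `VLD1.8 {d0}, [r0], r1`, which loads and then post-increments r0 by r1 in one instruction):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum an 8-byte block from each row of a buffer whose rows are
   `stride` bytes apart.  The load-then-bump-pointer shape is exactly
   what post-indexed (POST_MODIFY) addressing captures: the base
   register update is folded into the load itself.  */
static unsigned
sum_row_blocks (const uint8_t *p, ptrdiff_t stride, int rows)
{
  unsigned total = 0;
  for (int r = 0; r < rows; r++)
    {
      for (int i = 0; i < 8; i++)
        total += p[i];
      p += stride;  /* candidate for post-indexed addressing */
    }
  return total;
}
```

Whether the compiler actually emits the post-indexed form depends on the address-cost model, which is what the follow-up costs patch adjusts.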
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:01, Ramana Radhakrishnan wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This is OK.

Committed as r211783.
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:06, Ramana Radhakrishnan wrote:
>> 2014-06-15  Charles Baylis
>>
>> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>> embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created PR61551 and CC'd.
Re: [PATCH 1/9] Whitespace
On 11 June 2014 13:55, Richard Earnshaw wrote:
> On 11/06/14 11:19, Charles Baylis wrote:
>> 2014-05-22  Charles Baylis
>>
>> * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace.
>> (__aeabi_ldivmod): Fix whitespace.
>
> This is OK, but please wait until the others are ready to go in.

The series is now committed as r211789-r211797.
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On 18 June 2014 00:02, Ramana Radhakrishnan wrote:
>
> Interesting workaround but can we investigate further how to fix this
> at the source rather than working around in the backend in this form.
> It's still a kludge that we carry in the backend rather than fix the
> problem at its source. I'd rather try to fix the problem at the
> source rather than working around this in the backend.

I still think this is a back-end bug. Unless I've missed something, it
looks like the compiler has generated an insn which meets its
constraints (we'd see an "insn does not satisfy its constraints" ICE if
not) but the back-end generates an ICE much later, when trying to emit
code for it. The problem with trying to fix the bug "at source" in
reload is that this inconsistency will remain as a potential latent
bug.

I see two options to fix it - one is to teach the back-end to
successfully generate code for this insn, and the other is to teach
the back-end that such an insn is not valid. My proposed patch does
the former. The latter can presumably be achieved by providing a
different kind of memory constraint which disallows constant pool
references for these insns, although I haven't tried this yet.

Charles
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On 30 June 2014 14:26, Richard Earnshaw wrote:
> On 30/06/14 13:53, Charles Baylis wrote:
>> I see two options to fix it - one is to teach the back-end to
>> successfully generate code for this insn, and the other is to teach
>> the back-end that such an insn is not valid. My proposed patch does
>> the former. The latter can presumably be achieved by providing a
>> different kind of memory constraint which disallows constant pool
>> references for these insns although I haven't tried this yet.
>
> I think we should be doing the latter (not permitting these operations).
> If we wanted to do the former, we could just add an offset range for
> the insn.
>
> The reason we don't want the former is that the offset ranges are too
> small and overly constrain literal pool placement.

The attached patch adds a 'Uh' constraint, which means the same as 'm',
except that literal pool references are not allowed. Patterns which
generate ldr[s]b or ldr[s]h have been updated to use it, and the
pool_range attributes have been removed from those patterns.

Bootstrapped and make-checked with no regressions on qemu for
arm-unknown-linux-gnueabihf.

Charles Baylis

	PR target/49423
	* config/arm/arm-protos.h (arm_legitimate_address_p,
	arm_is_constant_pool_ref): Add prototypes.
	* config/arm/arm.c (arm_legitimate_address_p): Remove static.
	(arm_is_constant_pool_ref) New function.
	* config/arm/arm.md (unaligned_loadhis, arm_zero_extendhisi2_v6,
	arm_zero_extendqisi2_v6): Use Uh constraint for memory operand.
	(arm_extendhisi2, arm_extendhisi2_v6): Use Uh constraint for
	memory operand and remove pool_range and neg_pool_range attributes.
	(arm_extendqihi_insn, arm_extendqisi, arm_extendqisi_v6): Remove
	pool_range and neg_pool_range attributes.
	* config/arm/constraints.md (Uh): New constraint.
	(Uq): Don't allow constant pool references.

OK for trunk?

0001-Fix-push_minipool_fix-ICE.patch
Description: application/download
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On 3 July 2014 15:26, Richard Earnshaw wrote:
> So OK, but if you're considering back-ports, I suggest you let it bake a
> while on trunk first.

Committed as r212303.
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
On 19 November 2014 16:42, Alan Lawrence wrote:
> Of the calls to aarch64_simd_lane_bounds that remain in aarch64-simd.md:
> aarch64_get_lanedi
> aarch64_im_lane_boundsi
> aarch64_ld{2,3,4}_lane
>
> I'll handle the first two. Do we have a plan for ld2/3/4 ?

I'm happy to do those.

Charles
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
On 19 November 2014 16:51, Marcus Shawcroft wrote:
>
> In the meantime could you respin the patch to drop the default args
> and pass explicit NULL ?

Done.

Charles Baylis

	PR target/63870
	* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Pass
	expression to aarch64_simd_lane_bounds.
	* config/aarch64/aarch64-protos.h (aarch64_simd_lane_bounds): Update
	prototype.
	* config/aarch64/aarch64.c (aarch64_simd_lane_bounds): Add exp
	parameter. Report calling function in error message if exp is
	non-NULL.

0001-Aarch64-Report-inline-site-for-SIMD-builtins.patch
Description: application/download
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
On 20 November 2014 07:49, Marcus Shawcroft wrote:
> On 19 November 2014 19:05, Charles Baylis wrote:
>
>> PR target/63870
>> * config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Pass
>> expression to aarch64_simd_lane_bounds.
>> * config/aarch64/aarch64-protos.h (aarch64_simd_lane_bounds): Update
>> prototype.
>> * config/aarch64/aarch64.c (aarch64_simd_lane_bounds): Add exp
>> parameter. Report calling function in error message if exp is
>> non-NULL.
>
> These need to be updated to reflect the changes in the last revision
> of the patch where NULL is passed explicitly. Otherwise OK, commit it
> with a fixed ChangeLog.

Sorry... more haste, less speed.

Committed as r217885, with the following ChangeLog:

2014-11-20  Charles Baylis

	PR target/63870
	* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Pass
	expression to aarch64_simd_lane_bounds.
	* config/aarch64/aarch64-protos.h (aarch64_simd_lane_bounds): Update
	prototype.
	* config/aarch64/aarch64-simd.md: (aarch64_combinez): Update call to
	aarch64_simd_lane_bounds.
	(aarch64_get_lanedi): Likewise.
	(aarch64_ld2_lane): Likewise.
	(aarch64_ld3_lane): Likewise.
	(aarch64_ld4_lane): Likewise.
	(aarch64_im_lane_boundsi): Likewise.
	* config/aarch64/aarch64.c (aarch64_simd_lane_bounds): Add exp
	parameter. Report calling function in error message if exp is
	non-NULL.
Re: [PATCH 1/4] vldN_lane error message enhancements (Q registers)
On 14 April 2015 at 14:45, Alan Lawrence wrote:
> Assuming/hoping that this patch is proposed for new stage 1 ;),

IIRC the approach of using __builtin_aarch64_im_lane_boundsi doesn't
work (results in double error messages), and so the patch needs to be
rewritten to avoid it.

However, thanks for your comments, I'll reflect those in the next
version of the patch.

Thanks
Charles
Re: [PATCH] [AArch64] PR63870 Improve error messages for NEON single lane memory access intrinsics
On 8 June 2015 at 10:33, Alan Lawrence wrote:
> Thanks for working on this!
>
> I'd been fiddling around with a patch with some similar elements to this,
> but many trials with union types, subregs, etc., all worsened the register
> allocation and led to more unnecessary shuffling / moves.

Kugan has been looking into this at Linaro. We should avoid duplicating
effort here.

> The only real
> thing I tried which you don't do here, was to introduce a set_dreg expander
> to clean up some of those macro definitions in arm_neon.h. That could easily
> follow in a separate patch if desired!

I'd prefer that to be a separate step.

> So your patch looks good to me.
>
> A couple of style nits:
>
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -128,7 +128,9 @@ enum aarch64_type_qualifiers
>    /* Polynomial types.  */
>    qualifier_poly = 0x100,
>    /* Lane indices - must be in range, and flipped for bigendian.  */
> -  qualifier_lane_index = 0x200
> +  qualifier_lane_index = 0x200,
> +  /* Lane indices for single lane structure loads and stores */
> +  qualifier_struct_load_store_lane_index = 0x400
> };
>
> should be ...'loads and stores.  */'
>
> also the dg-error messages in the testsuite, do not need to be on the same
> line as the statement generating the error, because the trailing 0 tells dg
> that the position/line number doesn't matter (i.e. dg should allow the error
> to be reported at any line); so these could be brought under 80 chars.

OK, thanks. I'll re-spin once I've tested on big endian.

> Oh, have you tested bigendian?

I have started a bigendian build on our validation infrastructure here.

Thanks for the review
Charles
Re: [PATCH] [AArch64] PR63870 Improve error messages for NEON single lane memory access intrinsics
Ping?

On 11 June 2015 at 00:42, Charles Baylis wrote:
> [resending, as previous version was rejected from the list for html]
>
> On 11 June 2015 at 00:38, Charles Baylis wrote:
>>
>> On 8 June 2015 at 10:44, Alan Lawrence wrote:
>>> Oh, have you tested bigendian?
>>
>> No regressions on aarch64_be-none-elf.
>>
>> I re-spinned the patch with the cosmetic changes Alan suggested (comment
>> punctuation, fix >80 column lines in test cases)
>>
>> ChangeLog remains as before.
>>
>> Ok for trunk?
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 at 11:06, Ramana Radhakrishnan wrote:
> On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis wrote:
>> Your mention of larger vector modes prompted me to check that the
>> patch has the desired result with them. In fact, the costs are
>> estimated incorrectly which means the post_modify pattern is not used.
>> The attached patch fixes that. (used in combination with my original
>> patch)
>>
>> 2014-06-15  Charles Baylis
>>
>> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>> embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551 at the
time. I've come back to look at this again and would like to fix it in
this release cycle.

I still don't really understand what you mean by table-driven in this
context. Do you still hold this view, and if so, could you describe
what you'd like to see instead of this patch?
[PATCH 1/3] [ARM] PR63870 NEON error messages
gcc/ChangeLog:

	Charles Baylis

	* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerators
	qualifier_lane_index, qualifier_struct_load_store_lane_index.
	(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
	argument qualifiers.
	(arm_expand_neon_builtin): Handle NEON argument qualifiers.
	* config/arm/arm-protos.h: (arm_neon_lane_bounds) New prototype.
	* config/arm/arm.c (arm_neon_lane_bounds): New function.
	* config/arm/arm.h (ENDIAN_LANE_N): New macro.

Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647
---
 gcc/config/arm/arm-builtins.c | 65 +++++++++++++++++++++++++++-----------
 gcc/config/arm/arm-protos.h   |  4 +++
 gcc/config/arm/arm.c          | 20 +++++++++++
 gcc/config/arm/arm.h          |  3 ++
 4 files changed, 75 insertions(+), 17 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index f960e0a..8f1253e 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -77,7 +77,11 @@ enum arm_type_qualifiers
   /* qualifier_const_pointer | qualifier_map_mode */
   qualifier_const_pointer_map_mode = 0x86,
   /* Polynomial types.  */
-  qualifier_poly = 0x100
+  qualifier_poly = 0x100,
+  /* Lane indices - must be in range, and flipped for bigendian.  */
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores.  */
+  qualifier_struct_load_store_lane_index = 0x400
 };

 /* The qualifier_internal allows generation of a unary builtin from
@@ -1927,6 +1931,8 @@ arm_expand_unop_builtin (enum insn_code icode,
 typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
+  NEON_ARG_LANE_INDEX,
+  NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -1984,9 +1990,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 /* Expand a Neon builtin. */
 static rtx
 arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
-		      int icode, int have_retval, tree exp, ...)
+		      int icode, int have_retval, tree exp,
+		      builtin_arg *args)
 {
-  va_list ap;
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
   rtx op[SIMD_MAX_BUILTIN_ARGS];
@@ -2001,13 +2007,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
       || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
     target = gen_reg_rtx (tmode);

-  va_start (ap, exp);
-
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));

   for (;;)
     {
-      builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+      builtin_arg thisarg = args[argc];

       if (thisarg == NEON_ARG_STOP)
	break;
@@ -2043,17 +2047,46 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
	    op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
	  break;

+	case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+	  gcc_assert (argc > 1);
+	  if (CONST_INT_P (op[argc]))
+	    {
+	      arm_neon_lane_bounds (op[argc], 0,
+				    GET_MODE_NUNITS (map_mode), exp);
+	      /* Keep to GCC-vector-extension lane indices in the RTL.  */
+	      op[argc] =
+		GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc])));
+	    }
+	  goto constant_arg;
+
+	case NEON_ARG_LANE_INDEX:
+	  /* Must be a previous operand into which this is an index.  */
+	  gcc_assert (argc > 0);
+	  if (CONST_INT_P (op[argc]))
+	    {
+	      machine_mode vmode = insn_data[icode].operand[argc - 1].mode;
+	      arm_neon_lane_bounds (op[argc],
+				    0, GET_MODE_NUNITS (vmode), exp);
+	      /* Keep to GCC-vector-extension lane indices in the RTL.  */
+	      op[argc] = GEN_INT (ENDIAN_LANE_N (vmode, INTVAL (op[argc])));
+	    }
+	  /* Fall through - if the lane index isn't a constant then
+	     the next case will error.  */
	case NEON_ARG_CONSTANT:
+constant_arg:
	  if (!(*insn_data[icode].operand[opno].predicate)
	      (op[argc], mode[argc]))
-	    error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
-		      "expected %<const int%>", argc + 1);
+	    {
+	      error ("%Kargument %d must be a constant immediate",
+		     exp, argc + 1);
+	      return const0_rtx;
+	    }
	  break;
+
	case NEON_ARG_MEMORY:
	  /* Check if expand failed.  */
	  if (op[argc] == const0_rtx)
	    {
-	      va_end (ap);
	      return 0;
	    }
	  gcc_assert (MEM_P (op[argc]));
@@ -2076,8 +2
[PATCH 2/3] [ARM] PR63870 NEON error messages
gcc/ChangeLog: Charles Baylis * config/arm/arm-builtins.c: (arm_load1_qualifiers) Use qualifier_struct_load_store_lane_index. (arm_storestruct_lane_qualifiers) Likewise. * config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for big-endian. (neon_vst1_lane) Likewise. (neon_vld2_lane) Likewise. (neon_vst2_lane) Likewise. (neon_vld3_lane) Likewise. (neon_vst3_lane) Likewise. (neon_vld4_lane) Likewise. (neon_vst4_lane) Likewise. Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2 --- gcc/config/arm/arm-builtins.c | 4 ++-- gcc/config/arm/neon.md| 49 +++ 2 files changed, 28 insertions(+), 25 deletions(-) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 8f1253e..b7b7b12 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -145,7 +145,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode, - qualifier_none, qualifier_immediate }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers) /* The first argument (return type) of a store should be void type, @@ -164,7 +164,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_pointer_map_mode, - qualifier_none, qualifier_immediate }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers) #define v8qi_UP V8QImode diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 654d9d5..dbd5852 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -4277,8 +4277,9 @@ UNSPEC_VLD1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); + operands[3] = GEN_INT (lane); 
if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) @@ -4297,8 +4298,9 @@ UNSPEC_VLD1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); + operands[3] = GEN_INT (lane); int regno = REGNO (operands[0]); if (lane < 0 || lane >= max) error ("lane out of range"); @@ -4383,8 +4385,9 @@ UNSPEC_VST1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); + operands[2] = GEN_INT (lane); if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) @@ -4403,7 +4406,7 @@ UNSPEC_VST1_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[1]); if (lane < 0 || lane >= max) @@ -4412,8 +4415,8 @@ { lane -= max / 2; regno += 2; - operands[2] = GEN_INT (lane); } + operands[2] = GEN_INT (lane); operands[1] = gen_rtx_REG (mode, regno); if (max == 2) return "vst1.\t{%P1}, %A0"; @@ -4473,7 +4476,7 @@ UNSPEC_VLD2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[0]); rtx ops[4]; @@ -4482,7 +4485,7 @@ ops[0] = gen_rtx_REG (DImode, regno); ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = operands[1]; - ops[3] = operands[3]; + ops[3] = GEN_INT (lane); output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); return ""; } @@ -4498,7 +4501,7 @@ UNSPEC_VLD2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[0]); rtx ops[4]; @@ -4588,7 +4591,7 @@ 
UNSPEC_VST2_LANE))] "TARGET_NEON" { - HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); HOST_WIDE_INT max = GET_MODE_NUNITS (mode); int regno = REGNO (operands[1]); rtx ops[4]; @@ -4597,7 +4600,7 @@ ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno +
[PATCH 3/3] [ARM] PR63870 NEON error messages
gcc/testsuite/ChangeLog:

Charles Baylis

* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_s8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_u8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_p8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_s8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_u8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_p8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_s16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_s32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_s64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_s8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_u16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_u32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_u64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_u8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_p8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_s16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_s32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_s64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_s8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_u16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_u32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_u64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_u8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_p8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_s16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_s32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_s64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_s8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_u16_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_u32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_u64_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_u8_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f32_indices_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f64_indices_1.c: New test.
* gcc.target/aarch64/advsimd
[PATCH 0/3] [ARM] PR63870 improve error messages for NEON vldN_lane/vstN_lane
These patches are a port of the changes that do the same thing for AArch64 (see https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01984.html). The first patch ports over some infrastructure, and the second converts the vldN_lane and vstN_lane intrinsics. The changes required for vget_lane and vset_lane will be done in a future patch.

The third patch includes the test cases from the AArch64 version, except that the xfails for arm targets have been removed. If this series gets approved before the AArch64 patch, I will commit the tests with xfail for aarch64 targets.

OK for trunk?

Charles Baylis (3):
  [ARM] PR63870 Add qualifiers for NEON builtins
  [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
  [ARM] PR63870 Add test cases

 gcc/config/arm/arm-builtins.c | 69 --
 gcc/config/arm/arm-protos.h | 4 ++
 gcc/config/arm/arm.c | 20 +++
 gcc/config/arm/arm.h | 3 +
 gcc/config/arm/neon.md | 49 +++
 .../advsimd-intrinsics/vld2_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2_lane_p8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2_lane_s8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_u16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_u32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2_lane_u64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2_lane_u8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2q_lane_p8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2q_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2q_lane_s8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2q_lane_u16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_u32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld2q_lane_u64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld2q_lane_u8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3_lane_p8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3_lane_s8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_u16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_u32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3_lane_u64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3_lane_u8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3q_lane_p8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3q_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3q_lane_s8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3q_lane_u16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_u32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld3q_lane_u64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld3q_lane_u8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4_lane_p8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4_lane_s8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_u16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_u32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4_lane_u64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4_lane_u8_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4q_lane_f32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4q_lane_f64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4q_lane_p8_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4q_lane_s16_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4q_lane_s32_indices_1.c | 15 +
 .../advsimd-intrinsics/vld4q_lane_s64_indices_1.c | 16 +
 .../advsimd-intrinsics/vld4q_lane_s8_indices_1.c | 16
Re: [PATCH 1/3] [ARM] PR63870 NEON error messages
On 6 July 2015 at 11:18, Alan Lawrence wrote:
> I note some parts of this duplicate my
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has
> been pinged a couple of times. Both Charles' patch, and my two, contain
> parts the other does not...

To resolve the conflicts, I suggest that Alan's patches should be applied as-is first, and I'll rebase mine afterwards.

...and...

> Further to that - the main difference/conflict between Charles' patch
> and mine looks to be that I added the const_tree parameter to the
> existing neon_lane_bounds method, whereas Charles' patch adds a new
> method arm_neon_lane_bounds.

... I'll clean up this duplication when I do.
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
Resending as text/plain

On 11 November 2014 15:14, Charles Baylis wrote:
> On 6 November 2014 10:19, Alan Lawrence wrote:
>> This generates out-of-range errors at compile- (rather than
>> assemble-)time for the vqdm*_lane intrinsics, and also provides a
>> single place to do big-endian lane-swapping for all those intrinsics
>> (and others to follow in later patches). This allows us to remove many
>> define_expands that just do a range-check and endian-swap before
>> outputting the RTL for a corresponding "_internal" insn.
>>
>> Changes to aarch64-simd.md are not as big as they look; they are
>> highly repetitive, like the code they are removing! Testcases are also
>> repetitive, as unfortunately dg-error doesn't care *how many* errors
>> there were matching its pattern, as long as there is at least 1, hence
>> having to separate each into its own file - the last "0" in the
>> dg-error disables the line-number checking, as the line numbers in our
>> error messages refer to lines within arm_neon.h rather than within the
>> test case. (They do at least mention the user function containing the
>> call to the intrinsic.)
>>
>> Ok for trunk?
>
> It looks like there are a few places where you have 8 spaces where a tab
> ought to be. Other than that, it looks good to me (but I can't approve).
>
> I am looking at making errors found in arm_neon.h a bit more
> user-friendly, which depends on checking bounds on constant int
> parameters as you've done here.
>
> Do you plan to do similar changes for loads/stores/shifts, and also for
> the ARM back-end? I can help out if you don't already have patches in
> development.
>
> Charles
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
On 11 November 2014 15:25, Alan Lawrence wrote:
> [Resending in gcc-patches-accepted form]
>
> I'm working on a patch for vget_lane (that removes the
> be_checked_get_lane thing which isn't an intrinsic). Other than that, no
> not yet - loads and stores I was thinking to wait until David Sherwood +
> Alan Hayward's patches have been settled, but there's still ARM, indeed.
>
> If you have any way/ideas to get better error messages (i.e. line
> numbers), that'd be particularly good, tho :)

This is the best idea I have at the moment... The attached patch starts to improve the error messages for NEON intrinsics by adding %K to the error message string, as suggested by Jakub a while ago.

== error message without this patch:

In file included from /home/cbaylis/srcarea/gcc/gcc-git/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16_indices_1.c:3:0:
/tmp/arm_neon.h: In function ‘main’:
/tmp/arm_neon.h:12019:10: error: lane -1 out of range 0 - 7
   return __builtin_aarch64_sqrdmulh_laneqv8hi (__a, __b, __c);

== error message with this patch (gives the point of use of the NEON intrinsic):

In file included from /home/cbaylis/srcarea/gcc/gcc-git/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16_indices_1.c:3:0:
In function ‘vqrdmulhq_laneq_s16’,
    inlined from ‘main’ at /home/cbaylis/srcarea/gcc/gcc-git/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16_indices_1.c:17:3:
/tmp/arm_neon.h:12019:10: error: lane -1 out of range 0 - 7
   return __builtin_aarch64_sqrdmulh_laneqv8hi (__a, __b, __c);

This patch depends on Alan's patch (upthread). If conflicts were resolved, it could be applied without it, but it would have no effect.

OK for trunk?

Further clean-up (more work like Alan's patch) is needed to address the other intrinsics which have arguments with constant integer range constraints (vget_lane/vset_lane/vldN_lane/vstN_lane/shifts).

Richard, Marcus: is such clean-up suitable for after stage 1 closes?
Charles Baylis

* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Pass
expression to aarch64_simd_lane_bounds.
* config/aarch64/aarch64-protos.h (aarch64_simd_lane_bounds): Update
prototype.
* config/aarch64/aarch64.c (aarch64_simd_lane_bounds): Add exp
parameter.  Report calling function in error message if exp is
non-NULL.

0001-Aarch64-Report-inline-site-for-SIMD-builtins.patch
Description: application/download
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
On 12 November 2014 15:35, Alan Lawrence wrote:
> Nice! One nit - can the extra "tree" argument be a "const_tree"? - I'll
> defer to the maintainers on the use of C++ default arguments in the
> AArch64 backend. But LGTM.

Thanks, good catch. The default parameter will go away once all of the calls in the machine description are removed. I've respun the patch with const_tree.

0001-Aarch64-Report-inline-site-for-SIMD-builtins.patch
Description: application/download
[AArch64, testsuite] gcc.target/aarch64/fmul_fcvt_1.c: ilp32 fixes
This test includes the implicit assumption that the 'long' type on AArch64 is a 64 bit type. This is not the case for ILP32, so use 'long long' instead. Shows the expected new PASSes on aarch64-linux-gnu_ilp32, no regressions on aarch64-linux-gnu, gcc/testsuite: Charles Baylis * gcc.target/aarch64/fmul_fcvt_1.c (lsffoo##__a): Rename to... (llsffoo##__a): ... and make return type long long. (ulsffoo##__a): Rename to... (ullsffoo##__a): ... and make return type unsigned long long. From 42632399661326b850e40ededc61bb105421b828 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Mon, 23 Oct 2017 17:08:36 +0100 Subject: [PATCH 1/4] [AArch64] gcc.target/aarch64/fmul_fcvt_1.c: ilp32 fixes Charles Baylis * gcc.target/aarch64/fmul_fcvt_1.c (lsffoo##__a): Rename to... (llsffoo##__a): ... and make return type long long. (ulsffoo##__a): Rename to... (ullsffoo##__a): ... and make return type unsigned long long. --- gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c index f78f6ee..3be182d 100644 --- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c +++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c @@ -14,14 +14,14 @@ usffoo##__a (float x) \ return x * __a##.0f; \ } \ \ -long \ -lsffoo##__a (float x) \ +long long \ +llsffoo##__a (float x) \ { \ return x * __a##.0f; \ } \ \ -unsigned long \ -ulsffoo##__a (float x) \ +unsigned long long \ +ullsffoo##__a (float x) \ { \ return x * __a##.0f; \ } @@ -101,9 +101,9 @@ do\ __builtin_abort (); \ if (usffoo##__a (__b) != (unsigned int)(__b * __a)) \ __builtin_abort (); \ -if (lsffoo##__a (__b) != (long long)(__b * __a)) \ +if (llsffoo##__a (__b) != (long long)(__b * __a)) \ __builtin_abort (); \ -if (ulsffoo##__a (__b) != (unsigned long long)(__b * __a)) \ +if (ullsffoo##__a (__b) != (unsigned long long)(__b * __a)) \ __builtin_abort (); \ } while (0) -- 2.7.4
[AArch64, testsuite] gcc.target/aarch64/extend.c: xfails for ilp32
In ILP32, GCC fails to merge pointer arithmetic into the addressing mode of a load instruction, as add w0, w0, w1, lsl 2 ldr w0, [x0] is not equivalent to: ldr w0, [x0, w1, lsl 2] Shows the expected FAIL->XFAILs on aarch64-linux-gnu_ilp32, no regressions on aarch64-linux-gnu. gcc/testsuite: Charles Baylis * gcc.target/aarch64/extend.c (ldr_uxtw): Add xfail for ilp32. (ldr_uxtw0): Likewise. (ldr_sxtw): Likewise. (ldr_sxtw0): Likewise. From 70d43eb4f783d434e7996ebdde40b4ffea4a4a20 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 24 Oct 2017 14:22:11 +0100 Subject: [PATCH 2/4] [AArch64] gcc.target/aarch64/extend.c: xfails for ilp32 Charles Baylis * gcc.target/aarch64/extend.c (ldr_uxtw): Add xfail for ilp32. (ldr_uxtw0): Likewise. (ldr_sxtw): Likewise. (ldr_sxtw0): Likewise. --- gcc/testsuite/gcc.target/aarch64/extend.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c index f399e55..a9eb852 100644 --- a/gcc/testsuite/gcc.target/aarch64/extend.c +++ b/gcc/testsuite/gcc.target/aarch64/extend.c @@ -4,28 +4,28 @@ int ldr_uxtw (int *arr, unsigned int i) { - /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw #?2]" } } */ + /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw #?2]" { xfail { aarch64*-*-* && ilp32 } } } } */ return arr[i]; } int ldr_uxtw0 (char *arr, unsigned int i) { - /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw]" } } */ + /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*uxtw]" { xfail { aarch64*-*-* && ilp32 } } } } */ return arr[i]; } int ldr_sxtw (int *arr, int i) { - /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw #?2]" } } */ + /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw #?2]" { xfail { aarch64*-*-* && ilp32 } } } } */ return arr[i]; } int ldr_sxtw0 (char *arr, int i) { - /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw]" } } */ + /* { dg-final { scan-assembler "ldr\tw\[0-9\]+,.*sxtw]" { xfail { aarch64*-*-* && 
ilp32 } } } } */ return arr[i]; } -- 2.7.4
[AArch64, testsuite] gfortran.dg/ieee/ieee_8.f90: xfail for aarch64+ilp32
The test is already marked xfail for aarch64*-*-gnu, but this needs to be changed to aarch64*-*-gnu* in order to match aarch64-linux-gnu_ilp32. Test was previously xfail'd in [1]. Shows the expected FAIL->XFAILs on aarch64-linux-gnu_ilp32. gcc/testsuite: Charles Baylis * gfortran.dg/ieee/ieee_8.f90: xfail for aarch64*-*-gnu* [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02188.html From 5e877e35665eb37488da407d1b45d23b84d3803d Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 24 Oct 2017 14:46:42 +0100 Subject: [PATCH 3/4] [AArch64] gfortran.dg/ieee/ieee_8.f90: xfail for aarch64+ilp32 2017-10-24 Charles Baylis * gfortran.dg/ieee/ieee_8.f90: xfail for aarch64*-*-gnu* --- gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 b/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 index a47f9c1..0a3213a 100644 --- a/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 +++ b/gcc/testsuite/gfortran.dg/ieee/ieee_8.f90 @@ -1,4 +1,4 @@ -! { dg-do run { xfail aarch64*-*-gnu arm*-*-gnueabi arm*-*-gnueabihf } } +! { dg-do run { xfail aarch64*-*-gnu* arm*-*-gnueabi arm*-*-gnueabihf } } ! XFAIL because of PR libfortran/78449. module foo -- 2.7.4
[AArch64, testsuite] gcc.target/aarch64/symbol-range.c: skip for ilp32
This test relies on an object size >4GB, so cannot be compiled for ILP32. Shows the expected FAIL->UNSUPPORTED on aarch64-linux-gnu_ilp32, no regressions for aarch64-linux-gnu. Charles Baylis * gcc.target/aarch64/symbol-range.c: Add dg-skip-if for ilp32 targets. From 7ac10484589cdee75a1224e559055f7cbf98b4e2 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 24 Oct 2017 15:25:32 +0100 Subject: [PATCH 4/4] [AArch64] gcc.target/aarch64/symbol-range.c: skip for ilp32 Charles Baylis * gcc.target/aarch64/symbol-range.c: Add dg-skip-if for ilp32 targets. --- gcc/testsuite/gcc.target/aarch64/symbol-range.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c b/gcc/testsuite/gcc.target/aarch64/symbol-range.c index 6574cf4..4379dbb 100644 --- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O3 -save-temps -mcmodel=small" } */ +/* { dg-skip-if "" { ilp32 } } */ int fixed_regs[0x2ULL]; -- 2.7.4
Re: [PATCH 1/2] [ARM] Refactor costs calculation for MEM.
On 9 June 2017 at 14:59, Richard Earnshaw (lists) wrote: > On 21/02/17 16:54, charles.bay...@linaro.org wrote: >> From: Charles Baylis >> >> This patch moves the calculation of costs for MEM into a >> separate function, and reforms the calculation into two >> parts. Firstly any additional cost of the addressing mode >> is calculated, and then the cost of the memory access itself >> is added. >> >> In this patch, the calculation of the cost of the addressing >> mode is left as a placeholder, to be added in a subsequent >> patch. >> >> gcc/ChangeLog: >> >> Charles Baylis >> >> * config/arm/arm.c (arm_mem_costs): New function. >> (arm_rtx_costs_internal): Use arm_mem_costs. > > I like the idea of this patch, but it needs further work... > > Comments inline. > > R. > >> >> Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e >> --- >> gcc/config/arm/arm.c | 66 >> +--- >> 1 file changed, 42 insertions(+), 24 deletions(-) >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 6cae178..7f002f1 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -9072,6 +9072,47 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code >> */, bool speed_p, int *cost) >> } \ >> while (0); >> >> +/* Helper function for arm_rtx_costs_internal. Calculates the cost of a MEM, >> + considering the costs of the addressing mode and memory access >> + separately. */ >> +static bool >> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost, >> +int *cost, bool speed_p) >> +{ >> + machine_mode mode = GET_MODE (x); >> + if (flag_pic >> + && GET_CODE (XEXP (x, 0)) == PLUS >> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1))) >> +/* This will be split into two instructions. Add the cost of the >> + additional instruction here. The cost of the memory access is >> computed >> + below. See arm.md:calculate_pic_address. */ >> +*cost = COSTS_N_INSNS (1); >> + else >> +*cost = 0; >> + >> + /* Calculate cost of the addressing mode. 
*/ >> + if (speed_p) >> + { > > This patch needs to be reformatted in the GNU style (indentation of > braces, braces and else clauses on separate lines etc). Done. >> +/* TODO: Add table-driven costs for addressing modes. */ > > You need to sort out the comment. What's missing here? What's missing is patch 2... I've updated the comment for clarity. >> + } >> + >> + /* cost of memory access */ >> + if (speed_p) >> + { >> +/* data transfer is transfer size divided by bus width. */ >> +int bus_width = arm_arch7 ? 8 : 4; > > Basing bus width on the architecture is a bit too simplistic. Instead > this should be a parameter that comes from the CPU cost tables, based on > the current tune target. This was actually Ramana's suggestion, so I've left it as-is in this patch. If necessary, I think it's better to move this to a table in a separate patch, as I'll need to guess the correct bus width for a number of CPUs and will probably get some wrong. >> +*cost += COSTS_N_INSNS((GET_MODE_SIZE (mode) + bus_width - 1) / >> bus_width); > > Use CEIL (from system.h) Done. Updated patch attached. From 18629835ba12fdfa693e2f9492a5fc23d95ef165 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Wed, 8 Feb 2017 16:52:10 + Subject: [PATCH 1/3] [ARM] Refactor costs calculation for MEM. This patch moves the calculation of costs for MEM into a separate function, and reforms the calculation into two parts. Firstly any additional cost of the addressing mode is calculated, and then the cost of the memory access itself is added. In this patch, the calculation of the cost of the addressing mode is left as a placeholder, to be added in a subsequent patch. gcc/ChangeLog: Charles Baylis * config/arm/arm.c (arm_mem_costs): New function. (arm_rtx_costs_internal): Use arm_mem_costs. 
Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e --- gcc/config/arm/arm.c | 67 1 file changed, 42 insertions(+), 25 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fa3e2fa..13cd421 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9198,8 +9198,48 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost) }\ while (0); +
Re: [PATCH 2/2] [ARM] Add table of costs for AAarch32 addressing modes.
On 9 June 2017 at 15:13, Richard Earnshaw (lists) wrote: > On 21/02/17 16:54, charles.bay...@linaro.org wrote: >> From: Charles Baylis >> >> This patch adds support for modelling the varying costs of >> different addressing modes. The generic cost table treats >> all addressing modes as having equal cost. The cost table >> for Cortex-A57 is derived from >> http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf >> and treats addressing modes with write-back as having a >> cost equal to one additional instruction. >> >> gcc/ChangeLog: >> >> Charles Baylis >> >> * config/arm/aarch-common-protos.h (enum arm_addr_mode_op): New. >> (struct addr_mode_cost_table): New. >> (struct cpu_cost_table): Add pointer to an addr_mode_cost_table. >> * config/arm/aarch-cost-tables.h: (generic_addr_mode_costs): New. >> (generic_extra_costs) Initialise aarch32_addr_mode. >> (cortexa53_extra_costs) Likewise. >> (addr_mode_costs_cortexa57) New. >> (cortexa57_extra_costs) Initialise aarch32_addr_mode. >> (exynosm1_extra_costs) Likewise. >> (xgene1_extra_costs) Likewise. >> (qdf24xx_extra_costs) Likewise. >> * config/arm/arm.c (cortexa9_extra_costs) Initialise aarch32_addr_mode. >> (cortexa9_extra_costs) Likewise. >> (cortexa8_extra_costs) Likewise. >> (cortexa5_extra_costs) Likewise. >> (cortexa7_extra_costs) Likewise. >> (cortexa12_extra_costs) Likewise. >> (cortexv7m_extra_costs) Likewise. >> (arm_mem_costs): Use table lookup to calculate cost of addressing >> mode. 
>> >> Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e >> --- >> gcc/config/arm/aarch-common-protos.h | 16 +++ >> gcc/config/arm/aarch-cost-tables.h | 54 ++ >> gcc/config/arm/arm.c | 56 >> ++-- >> 3 files changed, 113 insertions(+), 13 deletions(-) >> >> diff --git a/gcc/config/arm/aarch-common-protos.h >> b/gcc/config/arm/aarch-common-protos.h >> index 7c2bb4c..f6fcc94 100644 >> --- a/gcc/config/arm/aarch-common-protos.h >> +++ b/gcc/config/arm/aarch-common-protos.h >> @@ -130,6 +130,21 @@ struct vector_cost_table >>const int alu; >> }; >> >> +enum arm_addr_mode_op >> +{ >> + AMO_DEFAULT, >> + AMO_NO_WB, >> + AMO_WB, >> + AMO_MAX /* for array size */ > > Comment style: Capital letter at start, full stop and two spaces at the end. Done. > The enum and structure below should have some comments explaining what > they are for. Done. >> +const struct addr_mode_cost_table generic_addr_mode_costs = >> +{ >> + /* int */ >> + { 0, 0, 0 }, > > Elements need to be commented, otherwise minor changes to the contents > of the arrays is hard to update and maintain. Done. >> + /* Addressing mode */ > > Comment style. Done. >> -/* TODO: Add table-driven costs for addressing modes. */ >> +arm_addr_mode_op op_type; >> +switch (GET_CODE(XEXP (x, 0))) >> +{ >> +case REG: >> +default: >> + op_type = AMO_DEFAULT; > > Why would REG and default share the same cost? Presumably default is > too complex to recognize, but then it's unlikely to be cheap. Default covers literals in various forms of RTL, for which the cost is the same as regular, and PIC, which is handled in the original code above this section. >> + break; >> +case PLUS: >> +case MINUS: >> + op_type = AMO_NO_WB; > > GCC doesn't support MINUS in addresses, even though the architecture > could in theory handle this. I've noted that in a comment, but kept the "case MINUS:" in place. > Also, I think you need a different cost for scaled offset addressing, > and possibly even for different scaling factors and types of scaling. 
... see below: >> + break; >> +case PRE_INC: >> +case PRE_DEC: >> +case POST_INC: >> +case POST_DEC: >> +case PRE_MODIFY: >> +case POST_MODIFY: >> + op_type = AMO_WB; > > Pre and post might also need separate entries (plus further entries for > scaling). A post operation might happen in parallel with the data > fetch, while a pre operation must happen before the address can be sent > to the load/store pipe. The {DEFAULT, NO_WB, WB} range is also Ramana's requested design. I think this is OK because it is sufficient to describe the currently