On Thu, 12 Aug 2021 at 19:04, Christophe Lyon <christophe.lyon....@gmail.com> wrote: > > > > On Thu, Aug 12, 2021 at 1:54 PM Prathamesh Kulkarni > <prathamesh.kulka...@linaro.org> wrote: >> >> On Wed, 11 Aug 2021 at 22:23, Christophe Lyon >> <christophe.lyon....@gmail.com> wrote: >> > >> > >> > >> > On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches >> > <gcc-patches@gcc.gnu.org> wrote: >> >> >> >> >> >> >> >> > -----Original Message----- >> >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> >> >> > Sent: 24 June 2021 12:11 >> >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov >> >> > <kyrylo.tkac...@arm.com> >> >> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n >> >> > intrinsics >> >> > >> >> > Hi, >> >> > This patch replaces builtins for vdup_n and vmov_n. >> >> > The patch results in regression for pr51534.c. >> >> > Consider following function: >> >> > >> >> > uint8x8_t f1 (uint8x8_t a) { >> >> > return vcgt_u8(a, vdup_n_u8(0)); >> >> > } >> >> > >> >> > code-gen before patch: >> >> > f1: >> >> > vmov.i32 d16, #0 @ v8qi >> >> > vcgt.u8 d0, d0, d16 >> >> > bx lr >> >> > >> >> > code-gen after patch: >> >> > f1: >> >> > vceq.i8 d0, d0, #0 >> >> > vmvn d0, d0 >> >> > bx lr >> >> > >> >> > I am not sure which one is better tho ? >> >> >> > >> > Hi Prathamesh, >> > >> > This patch introduces a regression on non-hardfp configs (eg >> > arm-linux-gnueabi or arm-eabi): >> > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c >> > scan-assembler-times vmov.i32[ \t]+[dD][0-9]+, #0xffffffff 3 >> > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c >> > scan-assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3 >> > >> > Can you fix this? >> The issue is, for following test: >> >> #include <arm_neon.h> >> >> uint8x8_t f1 (uint8x8_t a) { >> return vcge_u8(a, vdup_n_u8(0)); >> } >> >> armhf code-gen: >> f1: >> vmov.i32 d0, #0xffffffff @ v8qi >> bx lr >> >> arm softfp code-gen: >> f1: >> mov r0, #-1 >> mov r1, #-1 >> bx lr >> >> The code-gen for both is same upto split2 pass: >> >> (insn 10 6 11 2 (set (reg/i:V8QI 16 s0) >> (const_vector:V8QI [ >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> ])) "foo.c":5:1 1052 {*neon_movv8qi} >> (expr_list:REG_EQUAL (const_vector:V8QI [ >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> ]) >> (nil))) >> (insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1 >> (nil)) >> >> and for softfp target, split2 pass splits the assignment to r0 and r1: >> >> (insn 15 6 16 2 (set (reg:SI 0 r0) >> (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 >> {*thumb2_movsi_vfp} >> (nil)) >> (insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ]) >> (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 >> {*thumb2_movsi_vfp} >> (nil)) >> (insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1 >> (nil)) >> >> I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp targets ? >> > Yes, probably, or try with check-function-bodies. Hi, Sorry for the late response. Does the attached patch look OK ?
Thanks, Prathamesh > > Christophe > >> Thanks, >> Prathamesh >> > >> > Thanks >> > >> > Christophe >> > >> > >> >> >> >> I think they're equivalent in practice, in any case the patch itself is >> >> good (move away from RTL builtins). >> >> Ok. >> >> Thanks, >> >> Kyrill >> >> >> >> > >> >> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi, >> >> > which is due to a missed opt in lowering. I had filed it as >> >> > PR98435, and posted a fix for it here: >> >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html >> >> > >> >> > Thanks, >> >> > Prathamesh
diff --git a/gcc/testsuite/gcc.target/arm/pr51534.c b/gcc/testsuite/gcc.target/arm/pr51534.c index ac7f1ea4722..5e121f5fb99 100644 --- a/gcc/testsuite/gcc.target/arm/pr51534.c +++ b/gcc/testsuite/gcc.target/arm/pr51534.c @@ -64,8 +64,9 @@ GEN_COND_TESTS(vceq) /* { dg-final { scan-assembler-times "vceq\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" 4 } } */ /* { dg-final { scan-assembler-times "vceq\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" 4 } } */ /* { dg-final { scan-assembler-times "vceq\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" 4 } } */ -/* { dg-final { scan-assembler-times "vmov\.i32\[ \]+\[dD\]\[0-9\]+, #0xffffffff" 3 } } */ -/* { dg-final { scan-assembler-times "vmov\.i32\[ \]+\[qQ\]\[0-9\]+, #4294967295" 3 } } */ +/* { dg-final { scan-assembler-times "vmov\.i32\[ \]+\[dD\]\[0-9\]+, #0xffffffff" 3 { target { arm_hard_ok } } } } */ +/* { dg-final { scan-assembler-times "vmov\.i32\[ \]+\[qQ\]\[0-9\]+, #4294967295" 3 { target { arm_hard_ok } } } } */ +/* { dg-final { scan-assembler-times "mov\[ \]+r\[0-9\]+, #-1" 6 { target { arm_softfp_ok } } } } */ /* And ensure we don't have unexpected output too. */ /* { dg-final { scan-assembler-not "vc\[gl\]\[te\]\.u\[0-9\]+\[ \]+\[qQdD\]\[0-9\]+, \[qQdD\]\[0-9\]+, #0" } } */