https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
Bug ID: 98730
Summary: vceqzq_p64 does not generate vceq with immediate 0
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: clyon at gcc dot gnu.org
Target Milestone: ---

The vceqzq_p64 intrinsic was introduced with commit r11-6719
(g:63999d751df9bcde4ab9107edb4c635d274b248d) and is defined as:

vceqzq_p64 (poly64x2_t __a)
{
  poly64x2_t __b = vreinterpretq_p64_u32 (vdupq_n_u32 (0));
  return vceqq_p64 (__a, __b);
}

which is similar to what vceqz_p64 does:

vceqz_p64 (poly64x1_t __a)
{
  poly64x1_t __b = vreinterpret_p64_u32 (vdup_n_u32 (0));
  return vceq_p64 (__a, __b);
}

vceqzq_p64 uses vceqq_p64, which is defined as:

vceqq_p64 (poly64x2_t __a, poly64x2_t __b)
{
  poly64_t __high_a = vget_high_p64 (__a);
  poly64_t __high_b = vget_high_p64 (__b);
  uint64x1_t __high = vceq_p64 (__high_a, __high_b);

  poly64_t __low_a = vget_low_p64 (__a);
  poly64_t __low_b = vget_low_p64 (__b);
  uint64x1_t __low = vceq_p64 (__low_a, __low_b);
  return vcombine_u64 (__low, __high);
}

Unlike vceqz_p64, vceqzq_p64 does not use the vceq alternative with an
immediate, as shown by the vceqzq_p64.c testcase, which generates:

        ldr     r3, .L3
        vmov.i32        q10, #0  @ v4si
        vld1.64 {d16-d17}, [r3:64]
        vceq.i32        d18, d17, d21
        vceq.i32        d16, d16, d21
        vpmin.u32       d18, d18, d18
        vpmin.u32       d16, d16, d16
        vmov.f64        d17, d18  @ int
        vstr    d16, [r3, #16]
        vstr    d17, [r3, #24]
        bx      lr

By comparison, vceqz_p64 generates:

        ldr     r3, .L3
        vldr.64 d16, [r3]       @ int
        vceq.i32        d16, d16, #0
        vpmin.u32       d16, d16, d16
        vstr.64 d16, [r3, #8]   @ int
        bx      lr

The reload trace for vceqzq_p64 says:

         Choosing alt 0 in insn 19:  (0) =w  (1) w  (2) w {neon_vceqv2si_insn}
          alt=0,overall=0,losers=0,rld_nregs=0
         Choosing alt 0 in insn 15:  (0) =w  (1) w  (2) w {neon_vceqv2si_insn}
          alt=0,overall=0,losers=0,rld_nregs=0

(insn 19 8 15 2 (set (reg:V2SI 48 d16 [orig:128 _18 ] [128])
        (neg:V2SI (eq:V2SI (reg:V2SI 48 d16 [orig:139 v1 ] [139])
                (reg:V2SI 54 d19 [ _5+8 ]))))
"/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22
1650 {neon_vceqv2si_insn}
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 48 d16 [orig:139 v1 ] [139]) 0)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))
(insn 15 19 20 2 (set (reg:V2SI 50 d17 [orig:121 _11 ] [121])
        (neg:V2SI (eq:V2SI (reg:V2SI 50 d17 [orig:141 v2 ] [141])
                (reg:V2SI 54 d19 [ _5+8 ]))))
"/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22
1650 {neon_vceqv2si_insn}
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 50 d17 [orig:141 v2 ] [141]) 0)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))
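
For readers unfamiliar with the vceq.i32 / vpmin.u32 pair in the listings
above, here is a minimal scalar sketch in portable C of what that sequence
computes for one 64-bit element. It is not GCC code; the model_* function
names are hypothetical and only illustrate the instruction semantics. Each
32-bit half is compared against zero (all-ones on equality), and the pairwise
minimum is all-ones only when both halves matched. The comparison itself is
the same in both listings; the only difference is whether the zero operand is
the #0 immediate or a register loaded by vmov.i32.

```c
#include <stdint.h>

/* Scalar model of "vceq.i32 dN, dN, #0" (assumed helper name):
   each 32-bit lane becomes all-ones if it equals zero, else 0. */
static void model_vceq_i32_zero(const uint32_t in[2], uint32_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (in[i] == 0) ? UINT32_MAX : 0;
}

/* Scalar model of "vpmin.u32 dN, dM, dM" with both sources identical:
   both result lanes hold min(lane0, lane1). */
static void model_vpmin_u32_self(const uint32_t in[2], uint32_t out[2])
{
    uint32_t m = (in[0] < in[1]) ? in[0] : in[1];
    out[0] = m;
    out[1] = m;
}

/* 64-bit compare-with-zero built from the two 32-bit steps above.
   Returns all-ones if a == 0, otherwise 0. */
uint64_t model_vceqz_p64(uint64_t a)
{
    uint32_t lanes[2] = { (uint32_t)a, (uint32_t)(a >> 32) };
    uint32_t eq[2], mn[2];

    model_vceq_i32_zero(lanes, eq);
    model_vpmin_u32_self(eq, mn);
    return ((uint64_t)mn[1] << 32) | mn[0];
}

/* The q variant applies the same operation to each 64-bit half. */
void model_vceqzq_p64(const uint64_t a[2], uint64_t out[2])
{
    out[0] = model_vceqz_p64(a[0]);
    out[1] = model_vceqz_p64(a[1]);
}
```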