Hi Prathamesh,

> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> Sent: 05 May 2021 09:35
> To: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PR97903][ARM] Missed optimization in lowering to vtst
>
> On Fri, 5 Feb 2021 at 15:42, Kyrylo Tkachov <kyrylo.tkac...@arm.com> wrote:
> >
> > Hi Prathamesh,
> >
> > > -----Original Message-----
> > > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > > Sent: 05 February 2021 09:53
> > > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> > > Subject: [PR97903][ARM] Missed optimization in lowering to vtst
> > >
> > > Hi,
> > > For the following test-case:
> > >
> > > #include <arm_neon.h>
> > >
> > > uint8x8_t f1(int8x8_t a, int8x8_t b) {
> > >   return (uint8x8_t) ((a & b) != 0);
> > > }
> > >
> > > gcc fails to lower test operation to vtst, and instead emits:
> > > f1:
> > >         vand    d0, d0, d1
> > >         vceq.i8 d0, d0, #0
> > >         vmvn    d0, d0
> > >         bx      lr
> > >
> > > The attached patch tries to fix this by adding a pattern to match this combine:
> > > Trying 7, 8 -> 9:
> > >     7: r120:V8QI=r123:V8QI&r124:V8QI
> > >       REG_DEAD r124:V8QI
> > >       REG_DEAD r123:V8QI
> > >     8: r122:V8QI=-r120:V8QI==const_vector
> > >       REG_DEAD r120:V8QI
> > >     9: r121:V8QI=~r122:V8QI
> > >       REG_DEAD r122:V8QI
> > > Failed to match this instruction:
> > > (set (reg:V8QI 121)
> > >     (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
> > >                 (reg:V8QI 124))
> > >             (const_vector:V8QI [
> > >                     (const_int 0 [0]) repeated x8
> > >                 ]))
> > >         (const_vector:V8QI [
> > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > >             ])))
> > >
> > > Essentially it converts:
> > > r120 = (and r123 r124)
> > > r122 = (neg (eq r120 0))
> > > r121 = (not r122)
> > > -->
> > > r121 = vtst r123, r124
> > >
> > > (I guess it simplifies (not (neg X)) to (plus X -1) above).
> > >
> > > Code-gen after patch:
> > > f1:
> > >         vtst.8  d0, d0, d1
> > >         bx      lr
> > >
> >
> > +(define_insn "neon_vtst_combine<mode>"
> > +  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
> > +    (plus:VDQIW
> > +      (eq:VDQIW
> > +        (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
> > +                   (match_operand:VDQIW 2 "s_register_operand" "w"))
> > +        (match_operand:VDQIW 3 "zero_operand" "i"))
> > +      (match_operand:VDQIW 4 "minus_one_operand" "i")))]
> > +  "TARGET_NEON"
> > +  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > +)
> >
> > This will need a type attribute for scheduling.
> >
> > > Bootstrapped + tested on arm-linux-gnueabihf, and
> > > cross tested on arm*-*-*.
> > > Does it look OK for next stage-1 ?
> >
> > It looks sensible to me for stage 1.
> Hi Kyrill,
> Would it be OK to commit the attached patch after testing passes ?
Ok.
Thanks,
Kyrill

>
> Thanks,
> Prathamesh
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Prathamesh
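
For readers following the thread, the equivalence behind the new pattern can be checked lane by lane in portable C. The sketch below is only an illustration and is not part of the patch: the helper names (lane_and_ceq_mvn, lane_vtst, lane_plus_eq_minus_one, all_lanes_agree) are made up here, it models a single 8-bit lane rather than a real NEON register, and it assumes that eq in the failed combine pattern yields 1 for a true lane, as the neg in insn 8 of the dump suggests.

```c
#include <assert.h>   /* for callers that want to assert the result */
#include <stdint.h>

/* One 8-bit lane of the sequence emitted before the patch:
   vand, then vceq.i8 against #0, then vmvn.  */
static uint8_t lane_and_ceq_mvn(uint8_t a, uint8_t b)
{
    uint8_t t = a & b;                    /* vand */
    uint8_t eq = (t == 0) ? 0xFF : 0x00;  /* vceq.i8 ..., #0 */
    return (uint8_t)~eq;                  /* vmvn */
}

/* One 8-bit lane of vtst: all ones when (a & b) is non-zero.  */
static uint8_t lane_vtst(uint8_t a, uint8_t b)
{
    return (a & b) ? 0xFF : 0x00;
}

/* The shape combine produced: (plus (eq (and a b) 0) -1).
   Assumption: eq gives 1 for true and 0 for false.  */
static uint8_t lane_plus_eq_minus_one(uint8_t a, uint8_t b)
{
    uint8_t eq01 = ((a & b) == 0) ? 1 : 0;
    return (uint8_t)(eq01 + 0xFF);        /* adding -1 modulo 256 */
}

/* Returns 1 if all three forms agree for every pair of lane values,
   0 on the first mismatch.  */
int all_lanes_agree(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t want = lane_vtst((uint8_t)a, (uint8_t)b);
            if (lane_and_ceq_mvn((uint8_t)a, (uint8_t)b) != want)
                return 0;
            if (lane_plus_eq_minus_one((uint8_t)a, (uint8_t)b) != want)
                return 0;
        }
    }
    return 1;
}
```

Calling all_lanes_agree() from a small test program and asserting that it returns 1 covers all 65536 input pairs for one lane. It says nothing about scheduling or the type attribute mentioned in the review.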