Hi Prathamesh,

> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> Sent: 05 May 2021 09:35
> To: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PR97903][ARM] Missed optimization in lowering to vtst
> 
> On Fri, 5 Feb 2021 at 15:42, Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> wrote:
> >
> > Hi Prathamesh,
> >
> > > -----Original Message-----
> > > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > > Sent: 05 February 2021 09:53
> > > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > > <kyrylo.tkac...@arm.com>
> > > Subject: [PR97903][ARM] Missed optimization in lowering to vtst
> > >
> > > Hi,
> > > For the following test-case:
> > >
> > > #include <arm_neon.h>
> > >
> > > uint8x8_t f1(int8x8_t a, int8x8_t b) {
> > >   return (uint8x8_t) ((a & b) != 0);
> > > }
> > >
> > > > GCC fails to lower the test operation to vtst, and instead emits:
> > > f1:
> > >         vand    d0, d0, d1
> > >         vceq.i8 d0, d0, #0
> > >         vmvn    d0, d0
> > >         bx      lr
> > >
> > > > The attached patch tries to fix this by adding a pattern to match this
> > > > combine:
> > > Trying 7, 8 -> 9:
> > >     7: r120:V8QI=r123:V8QI&r124:V8QI
> > >       REG_DEAD r124:V8QI
> > >       REG_DEAD r123:V8QI
> > >     8: r122:V8QI=-r120:V8QI==const_vector
> > >       REG_DEAD r120:V8QI
> > >     9: r121:V8QI=~r122:V8QI
> > >       REG_DEAD r122:V8QI
> > > Failed to match this instruction:
> > > (set (reg:V8QI 121)
> > >     (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
> > >                 (reg:V8QI 124))
> > >             (const_vector:V8QI [
> > >                     (const_int 0 [0]) repeated x8
> > >                 ]))
> > >         (const_vector:V8QI [
> > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > >             ])))
> > >
> > > Essentially it converts:
> > > r120 = (and r123 r124)
> > > r122 = (neg (eq r120 0))
> > > r121 = (not r122)
> > > -->
> > > r121 = vtst r123, r124
> > >
> > > (I guess it simplifies (not (neg X)) to (plus X -1) above).
> > >
> > > Code-gen after patch:
> > > f1:
> > >         vtst.8  d0, d0, d1
> > >         bx      lr
> > >
> >
> > +(define_insn "neon_vtst_combine<mode>"
> > +  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
> > +        (plus:VDQIW
> > +         (eq:VDQIW
> > +           (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
> > +                      (match_operand:VDQIW 2 "s_register_operand" "w"))
> > +           (match_operand:VDQIW 3 "zero_operand" "i"))
> > +         (match_operand:VDQIW 4 "minus_one_operand" "i")))]
> > +  "TARGET_NEON"
> > +  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > +)
> >
> > This will need a type attribute for scheduling.
> >
> > > Bootstrapped + tested on arm-linux-gnueabihf, and
> > > cross tested on arm*-*-*.
> > > > Does it look OK for the next stage 1?
> >
> > It looks sensible to me for stage 1.
> Hi Kyrill,
> Would it be OK to commit the attached patch after testing passes?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Prathamesh
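
For readers following the thread, the two facts the pattern relies on can be checked with a scalar sketch in plain C. It is not part of the patch and does not use arm_neon.h: it models a single 8-bit lane, and the helper names (vtst_lane, and_ceq_mvn_lane, lanes_agree, not_neg_is_plus_minus_one) are hypothetical, chosen only for illustration. It assumes vtst sets a lane to all ones when (a & b) is non-zero, and checks that ~(-x) equals x + (-1) in 8-bit arithmetic, which is the rewrite guessed at in the quoted message.

```c
#include <stdint.h>

/* Scalar model of one 8-bit lane of vtst (assumed semantics):
   all ones if (a & b) != 0, otherwise zero. */
static uint8_t vtst_lane(uint8_t a, uint8_t b)
{
    return (a & b) != 0 ? 0xff : 0x00;
}

/* Scalar model of one lane of the pre-patch sequence:
   vand, then vceq.i8 against #0, then vmvn. */
static uint8_t and_ceq_mvn_lane(uint8_t a, uint8_t b)
{
    uint8_t t = a & b;                     /* vand */
    uint8_t eq = (t == 0) ? 0xff : 0x00;   /* vceq.i8 ..., #0 */
    return (uint8_t)~eq;                   /* vmvn */
}

/* Returns 1 if both lane models agree on all 65536 input pairs. */
static int lanes_agree(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (vtst_lane((uint8_t)a, (uint8_t)b)
                != and_ceq_mvn_lane((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}

/* Returns 1 if ~(-x) == x + (-1) for every 8-bit x, i.e. the
   (not (neg X)) -> (plus X -1) rewrite holds modulo 256. */
static int not_neg_is_plus_minus_one(void)
{
    for (unsigned x = 0; x < 256; x++) {
        uint8_t lhs = (uint8_t)~(uint8_t)(0u - x);  /* not (neg x) */
        uint8_t rhs = (uint8_t)(x + 0xffu);         /* x + (-1)    */
        if (lhs != rhs)
            return 0;
    }
    return 1;
}
```

Calling lanes_agree() and not_neg_is_plus_minus_one() from a small main and checking that both return 1 is enough to exercise the sketch on any host; no NEON hardware is needed.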
