On Wed, Dec 28, 2022 at 1:22 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Many thanks for your reviews.
>
> > On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle
> > <ro...@nextmovesoftware.com> wrote:
> > >
> > >
> > > Back in September, the review of my patch for PR
> > > rtl-optimization/106594,
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> > > suggested that I submit the x86 backend bits, independently and first.
> > >
> > > The executive summary is that the middle-end doesn't have a preferred
> > > canonical form for expressing zero-extension, sometimes using an AND
> > > and sometimes using zero_extend.  Pending changes to RTL
> > > simplification will/may alter some of these representations, so a few
> > > additional patterns are required to recognize these alternate
> > > representations and avoid any testsuite regressions.
> > >
> > > As an example, *popcountsi2_zext is currently represented as:
> > >   [(set (match_operand:DI 0 "register_operand" "=r")
> > >         (and:DI
> > >           (subreg:DI
> > >             (popcount:SI
> > >               (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
> > >           (const_int 63)))
> > >    (clobber (reg:CC FLAGS_REG))]
> > >
> > > this patch adds an alternate/equivalent pattern that matches:
> > >   [(set (match_operand:DI 0 "register_operand" "=r")
> > >        (zero_extend:DI
> > >          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
> > >    (clobber (reg:CC FLAGS_REG))]
> > >
> > > Another example is *popcounthi2 which is currently represented as:
> > >   [(set (match_operand:SI 0 "register_operand")
> > >         (popcount:SI
> > >           (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
> > >    (clobber (reg:CC FLAGS_REG))]
> > >
> > > this patch adds an alternate/equivalent pattern that matches:
> > >   [(set (match_operand:SI 0 "register_operand")
> > >         (zero_extend:SI
> > >           (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
> > >    (clobber (reg:CC FLAGS_REG))]
> > >
> > > The contents of the machine description definitions remain the same,
> > > it's just the expected RTL is slightly different but equivalent.
> > > Providing both forms makes the backend more robust to middle-end
> > > changes [and possibly catches some missed optimizations].
> >
> > It would be nice to have a canonical representation of zero-extended 
> > patterns,
> > but this is what we have now. Unfortunately, a certain HW limitation 
> > requires
> > several patterns for one insn, so the canonical representation is even more
> > desirable here.  Hopefully, a "future"
> > patch will allow us some cleanups in this area.
> >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> >
> > OK, but please split out HImode popcount&1 pattern to a separate patch to 
> > not
> > mix separate topics in one patch.
>
> The peephole2 is the exact same topic; it's not a new optimization or feature,
> but identical fall-out from the representational changes to ZERO_EXTEND.
>
> The existing peephole2 (line 18305 of i386.md) matches this sequence:
>
>    (parallel [(set (match_operand:HI 1 "register_operand")
>                    (popcount:HI
>                     (match_operand:HI 2 "nonimmediate_operand")))
>               (clobber (reg:CC FLAGS_REG))])
>    (set (match_operand 3 "register_operand")
>         (zero_extend (match_dup 1)))
>    (set (reg:CCZ FLAGS_REG)
>         (compare:CCZ (and:QI (match_operand:QI 4 "register_operand")
>                              (const_int 1))
>                      (const_int 0)))
>    (set (pc) (if_then_else (match_operator 5 "bt_comparison_operator"
>                             [(reg:CCZ FLAGS_REG)
>                              (const_int 0)])
>                            (label_ref (match_operand 6))
>                            (pc)))]
>
> but with pending tweaks/changes to middle-end zero_extend handing, the
> peephole2 pass now sees the equivalent (apologies for the renumbering):
>
>    (parallel [(set (match_operand:HI 1 "register_operand")
>                   (popcount:HI
>                    (match_operand:HI 2 "nonimmediate_operand")))
>              (clobber (reg:CC FLAGS_REG))])
>    (set (reg:CCZ FLAGS_REG)
>         (compare:CCZ (and:QI (match_operand:QI 3 "register_operand")
>                             (const_int 1))
>                     (const_int 0)))
>    (set (pc) (if_then_else (match_operator 4 "bt_comparison_operator"
>                            [(reg:CCZ FLAGS_REG)
>                             (const_int 0)])
>                            (label_ref (match_operand 5))
>                            (pc)))]
>
> Notice that the zero_extension has been eliminated, as match_dup 3
> and match_dup 4 are the same register, and we only need to inspect
> the lowest bit.  i.e. the current transform matches a redundant
> instruction, that the RTL optimizers will soon be able to eliminate.
>
> Hence, IMHO, the issue is identical, the RTL that the backend is expecting
> is sensitive the representations and transformations performed upstream.
>
> All the new patterns appear in the MD file immediately after their originals,
> so this peephole2 appears immediately after the peephole2 that used to
> match.
>
> I'm happy to perform two commits, if that might make bisecting issues
> easier, but these changes really are just a single topic (already split into
> three pieces from a serious P1 regression fix).  It may be possible to
> remove the current patterns, once they become "dead", perhaps when
> we return to stage1 (after the middle-end pieces have been approved).

Thanks for the explanation, please go ahead with your original patch.

Thanks,
Uros.

>
> >
> > Thanks,
> > Uros.
> >
> > >
> > >
> > > 2022-12-28  Roger Sayle  <ro...@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386.md (*clzsi2_lzcnt_zext_2): 
> > > define_insn_and_split
> > >         to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
> > >         (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to 
> > > match
> > >         ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
> > >         (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
> > >         ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
> > >         (*popcountsi2_zext_2): Likewise, new define_insn_and_split to 
> > > match
> > >         ZERO_EXTEND form of *popcountsi2_zext.
> > >         (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
> > >         ZERO_EXTEND form of *popcountsi2_zext_falsedep.
> > >         (*popcounthi2_2): Likewise, new define_insn_and_split to match
> > >         ZERO_EXTEND form of *popcounthi2.
> > >         (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
> > >         parity flag peephole2.
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
>

Reply via email to