Hi Uros,
Many thanks for your reviews.

> On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle
> <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Back in September, the review of my patch for PR
> > rtl-optimization/106594,
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> > suggested that I submit the x86 backend bits, independently and first.
> >
> > The executive summary is that the middle-end doesn't have a preferred
> > canonical form for expressing zero-extension, sometimes using an AND
> > and sometimes using zero_extend.  Pending changes to RTL
> > simplification will/may alter some of these representations, so a few
> > additional patterns are required to recognize these alternate
> > representations and avoid any testsuite regressions.
> >
> > As an example, *popcountsi2_zext is currently represented as:
> >   [(set (match_operand:DI 0 "register_operand" "=r")
> >         (and:DI
> >           (subreg:DI
> >             (popcount:SI
> >               (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
> >           (const_int 63)))
> >    (clobber (reg:CC FLAGS_REG))]
> >
> > this patch adds an alternate/equivalent pattern that matches:
> >   [(set (match_operand:DI 0 "register_operand" "=r")
> >        (zero_extend:DI
> >          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
> >    (clobber (reg:CC FLAGS_REG))]
> >
> > Another example is *popcounthi2 which is currently represented as:
> >   [(set (match_operand:SI 0 "register_operand")
> >         (popcount:SI
> >           (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
> >    (clobber (reg:CC FLAGS_REG))]
> >
> > this patch adds an alternate/equivalent pattern that matches:
> >   [(set (match_operand:SI 0 "register_operand")
> >         (zero_extend:SI
> >           (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
> >    (clobber (reg:CC FLAGS_REG))]
> >
> > The contents of the machine description definitions remain the same,
> > it's just the expected RTL is slightly different but equivalent.
> > Providing both forms makes the backend more robust to middle-end
> > changes [and possibly catches some missed optimizations].
> 
> It would be nice to have a canonical representation of zero-extended patterns,
> but this is what we have now. Unfortunately, a certain HW limitation requires
> several patterns for one insn, so the canonical representation is even more
> desirable here.  Hopefully, a "future"
> patch will allow us some cleanups in this area.
> 
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> 
> OK, but please split out HImode popcount&1 pattern to a separate patch to not
> mix separate topics in one patch.

The peephole2 is the exact same topic; it's not a new optimization or feature, 
but identical fall-out from the representational changes to ZERO_EXTEND.

The existing peephole2 (line 18305 of i386.md) matches this sequence:

   (parallel [(set (match_operand:HI 1 "register_operand")
                   (popcount:HI
                    (match_operand:HI 2 "nonimmediate_operand")))
              (clobber (reg:CC FLAGS_REG))])
   (set (match_operand 3 "register_operand")
        (zero_extend (match_dup 1)))
   (set (reg:CCZ FLAGS_REG)
        (compare:CCZ (and:QI (match_operand:QI 4 "register_operand")
                             (const_int 1))
                     (const_int 0)))
   (set (pc) (if_then_else (match_operator 5 "bt_comparison_operator"
                            [(reg:CCZ FLAGS_REG)
                             (const_int 0)])
                           (label_ref (match_operand 6))
                           (pc)))]

but with pending tweaks/changes to middle-end zero_extend handing, the
peephole2 pass now sees the equivalent (apologies for the renumbering):

   (parallel [(set (match_operand:HI 1 "register_operand")
                  (popcount:HI
                   (match_operand:HI 2 "nonimmediate_operand")))
             (clobber (reg:CC FLAGS_REG))])
   (set (reg:CCZ FLAGS_REG)
        (compare:CCZ (and:QI (match_operand:QI 3 "register_operand")
                            (const_int 1))
                    (const_int 0)))
   (set (pc) (if_then_else (match_operator 4 "bt_comparison_operator"
                           [(reg:CCZ FLAGS_REG)
                            (const_int 0)])
                           (label_ref (match_operand 5))
                           (pc)))]

Notice that the zero_extension has been eliminated, as match_dup 3
and match_dup 4 are the same register, and we only need to inspect
the lowest bit.  i.e. the current transform matches a redundant
instruction, that the RTL optimizers will soon be able to eliminate.

Hence, IMHO, the issue is identical, the RTL that the backend is expecting
is sensitive the representations and transformations performed upstream.

All the new patterns appear in the MD file immediately after their originals,
so this peephole2 appears immediately after the peephole2 that used to
match.

I'm happy to perform two commits, if that might make bisecting issues
easier, but these changes really are just a single topic (already split into 
three pieces from a serious P1 regression fix).  It may be possible to
remove the current patterns, once they become "dead", perhaps when
we return to stage1 (after the middle-end pieces have been approved).

> 
> Thanks,
> Uros.
> 
> >
> >
> > 2022-12-28  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386.md (*clzsi2_lzcnt_zext_2): define_insn_and_split
> >         to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
> >         (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
> >         ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
> >         (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
> >         ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
> >         (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
> >         ZERO_EXTEND form of *popcountsi2_zext.
> >         (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
> >         ZERO_EXTEND form of *popcountsi2_zext_falsedep.
> >         (*popcounthi2_2): Likewise, new define_insn_and_split to match
> >         ZERO_EXTEND form of *popcounthi2.
> >         (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
> >         parity flag peephole2.
> >
> > Thanks in advance,
> > Roger
> > --
> >

Reply via email to