On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Back in September, the review of my patch for PR rtl-optimization/106594,
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> suggested that I submit the x86 backend bits, independently and first.
>
> The executive summary is that the middle-end doesn't have a preferred
> canonical form for expressing zero-extension, sometimes using an AND
> and sometimes using zero_extend.  Pending changes to RTL simplification
> will/may alter some of these representations, so a few additional
> patterns are required to recognize these alternate representations
> and avoid any testsuite regressions.
>
> As an example, *popcountsi2_zext is currently represented as:
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (and:DI
>           (subreg:DI
>             (popcount:SI
>               (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
>           (const_int 63)))
>    (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:DI 0 "register_operand" "=r")
>        (zero_extend:DI
>          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> Another example is *popcounthi2 which is currently represented as:
>   [(set (match_operand:SI 0 "register_operand")
>         (popcount:SI
>           (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:SI 0 "register_operand")
>         (zero_extend:SI
>           (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> The contents of the machine description definitions remain the same,
> it's just the expected RTL is slightly different but equivalent.
> Providing both forms makes the backend more robust to middle-end
> changes [and possibly catches some missed optimizations].

It would be nice to have a canonical representation of zero-extended
patterns, but this is what we have now. Unfortunately, a certain HW
limitation requires several patterns for one insn, so the canonical
representation is even more desirable here.  Hopefully, a "future"
patch will allow us some cleanups in this area.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

OK, but please split out HImode popcount&1 pattern to a separate patch
to not mix separate topics in one patch.

Thanks,
Uros.

>
>
> 2022-12-28  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*clzsi2_lzcnt_zext_2): define_insn_and_split
>         to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
>         (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
>         ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
>         (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
>         ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
>         (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
>         ZERO_EXTEND form of *popcountsi2_zext.
>         (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
>         ZERO_EXTEND form of *popcountsi2_zext_falsedep.
>         (*popcounthi2_2): Likewise, new define_insn_and_split to match
>         ZERO_EXTEND form of *popcounthi2.
>         (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
>         parity flag peephole2.
>
> Thanks in advance,
> Roger
> --
>

Reply via email to