On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle <[email protected]> wrote:
>
>
> Back in September, the review of my patch for PR rtl-optimization/106594,
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> suggested that I submit the x86 backend bits, independently and first.
>
> The executive summary is that the middle-end doesn't have a preferred
> canonical form for expressing zero-extension, sometimes using an AND
> and sometimes using zero_extend. Pending changes to RTL simplification
> will/may alter some of these representations, so a few additional
> patterns are required to recognize these alternate representations
> and avoid any testsuite regressions.
>
> As an example, *popcountsi2_zext is currently represented as:
> [(set (match_operand:DI 0 "register_operand" "=r")
>       (and:DI
>         (subreg:DI
>           (popcount:SI
>             (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
>         (const_int 63)))
>  (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
> [(set (match_operand:DI 0 "register_operand" "=r")
>       (zero_extend:DI
>         (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
>  (clobber (reg:CC FLAGS_REG))]
>
> Another example is *popcounthi2 which is currently represented as:
> [(set (match_operand:SI 0 "register_operand")
>       (popcount:SI
>         (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
>  (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
> [(set (match_operand:SI 0 "register_operand")
>       (zero_extend:SI
>         (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
>  (clobber (reg:CC FLAGS_REG))]
>
> The contents of the machine description definitions remain the same;
> only the expected RTL is slightly different, but equivalent.
> Providing both forms makes the backend more robust to middle-end
> changes [and possibly catches some missed optimizations].

It would be nice to have a canonical representation of zero-extended
patterns, but this is what we have now. Unfortunately, a certain HW
limitation requires several patterns for one insn, so the canonical
representation is even more desirable here. Hopefully, a "future"
patch will allow us some cleanups in this area.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?

OK, but please split out the HImode popcount&1 pattern into a separate
patch, so as not to mix separate topics in one patch.

Thanks,
Uros.

>
>
> 2022-12-28 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> * config/i386/i386.md (*clzsi2_lzcnt_zext_2): New define_insn_and_split
> to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
> (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
> ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
> (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
> ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
> (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
> ZERO_EXTEND form of *popcountsi2_zext.
> (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
> ZERO_EXTEND form of *popcountsi2_zext_falsedep.
> (*popcounthi2_2): Likewise, new define_insn_and_split to match
> ZERO_EXTEND form of *popcounthi2.
> (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
> parity flag peephole2.
>
> Thanks in advance,
> Roger
> --
>
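
The equivalence claimed for the popcount patterns above can be checked with a small C model. This is only an illustrative sketch, not code from the patch: the function names are hypothetical, a portable loop stands in for the popcnt instruction, and the `high_garbage` argument is an assumption used to model the undefined upper bits of the paradoxical `subreg:DI`. The point is that a 32-bit popcount is at most 32, so it fits in the low 6 bits, and masking with 63 gives the same value as a plain zero-extension. Likewise, counting the bits of a zero-extended HImode value gives the same result as zero-extending an HImode count.

```c
#include <assert.h>
#include <stdint.h>

/* Portable bit count for a 32-bit value; stands in for popcnt. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;                 /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Same loop on a 16-bit value, modelling (popcount:HI ...). */
static unsigned popcount16(uint16_t x)
{
    unsigned n = 0;
    while (x) {
        x = (uint16_t)(x & (x - 1)); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* AND form: (and:DI (subreg:DI (popcount:SI x) 0) (const_int 63)).
   high_garbage models the undefined upper 32 bits of the subreg. */
static uint64_t popcount_and_form(uint32_t x, uint64_t high_garbage)
{
    uint64_t wide = (high_garbage & 0xFFFFFFFF00000000ull) | popcount32(x);
    return wide & 63;
}

/* ZERO_EXTEND form: (zero_extend:DI (popcount:SI x)). */
static uint64_t popcount_zext_form(uint32_t x)
{
    return (uint64_t)popcount32(x);
}

/* (popcount:SI (zero_extend:SI h)) */
static uint32_t popcounthi_zext_inside(uint16_t h)
{
    return popcount32((uint32_t)h);
}

/* (zero_extend:SI (popcount:HI h)) */
static uint32_t popcounthi_zext_outside(uint16_t h)
{
    return (uint32_t)popcount16(h);
}
```

For any input, `popcount_and_form` and `popcount_zext_form` should agree regardless of `high_garbage`, and the two HImode variants should agree as well, which is why one insn can serve both RTL shapes.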