On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Back in September, the review of my patch for PR rtl-optimization/106594,
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> suggested that I submit the x86 backend bits, independently and first.
>
> The executive summary is that the middle-end doesn't have a preferred
> canonical form for expressing zero-extension, sometimes using an AND
> and sometimes using zero_extend. Pending changes to RTL simplification
> will/may alter some of these representations, so a few additional
> patterns are required to recognize these alternate representations
> and avoid any testsuite regressions.
>
> As an example, *popcountsi2_zext is currently represented as:
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (and:DI
>           (subreg:DI
>             (popcount:SI
>               (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
>           (const_int 63)))
>    (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (zero_extend:DI
>           (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> Another example is *popcounthi2 which is currently represented as:
>   [(set (match_operand:SI 0 "register_operand")
>         (popcount:SI
>           (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:SI 0 "register_operand")
>         (zero_extend:SI
>           (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
>    (clobber (reg:CC FLAGS_REG))]
>
> The contents of the machine description definitions remain the same,
> it's just the expected RTL is slightly different but equivalent.
> Providing both forms makes the backend more robust to middle-end
> changes [and possibly catches some missed optimizations].

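For anyone following along, here is a small C sketch (not part of the
patch) of why each pair of RTL forms quoted above computes the same
value. All function names are made up for illustration, and modelling
the upper half of the paradoxical subreg as arbitrary junk is an
assumption of the sketch. The point is only that a popcount of a 32-bit
value is at most 32, so masking with 63 and zero-extending agree, and
that zero-extending a 16-bit value adds no set bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits in the low `width` bits of v. */
static unsigned popcount_bits(uint64_t v, int width)
{
    unsigned n = 0;
    for (int i = 0; i < width; i++)
        n += (unsigned)((v >> i) & 1u);
    return n;
}

/* Models (and:DI (subreg:DI (popcount:SI x) 0) (const_int 63)).
   `junk` stands in for the unspecified upper 32 bits (an assumption). */
static uint64_t popcountsi_and_form(uint32_t x, uint32_t junk)
{
    uint64_t wide = ((uint64_t)junk << 32) | popcount_bits(x, 32);
    return wide & 63;
}

/* Models (zero_extend:DI (popcount:SI x)). */
static uint64_t popcountsi_zext_form(uint32_t x)
{
    return (uint64_t)(uint32_t)popcount_bits(x, 32);
}

/* Models (popcount:SI (zero_extend:SI x)) for an HImode x. */
static uint32_t popcounthi_outer_popcount(uint16_t x)
{
    return popcount_bits((uint32_t)x, 32);
}

/* Models (zero_extend:SI (popcount:HI x)). */
static uint32_t popcounthi_outer_zext(uint16_t x)
{
    return (uint32_t)(uint16_t)popcount_bits(x, 16);
}

/* Checks both equivalences; aborts via assert on any mismatch. */
static int check_equivalences(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x8000u, 0xFFFFu, 0x12345678u, 0xFFFFFFFFu
    };
    static const uint32_t junk[] = { 0u, 0xDEADBEEFu, 0xFFFFFFFFu };

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned j = 0; j < sizeof junk / sizeof junk[0]; j++)
            assert(popcountsi_and_form(samples[i], junk[j])
                   == popcountsi_zext_form(samples[i]));

    /* HImode is small enough to check exhaustively. */
    for (uint32_t h = 0; h <= 0xFFFFu; h++)
        assert(popcounthi_outer_popcount((uint16_t)h)
               == popcounthi_outer_zext((uint16_t)h));

    return 1;
}
```

Calling check_equivalences() from a test driver exercises both pairs.
This only demonstrates value equivalence; it says nothing about which
form combine or the simplifiers will actually produce.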

It would be nice to have a canonical representation of zero-extended
patterns, but this is what we have now. Unfortunately, a certain HW
limitation requires several patterns for one insn, so the canonical
representation is even more desirable here. Hopefully, a "future"
patch will allow us some cleanups in this area.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?

OK, but please split out HImode popcount&1 pattern to a separate patch
to not mix separate topics in one patch.

Thanks,
Uros.

>
>
> 2022-12-28 Roger Sayle <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*clzsi2_lzcnt_zext_2): define_insn_and_split
>         to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
>         (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
>         ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
>         (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
>         ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
>         (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
>         ZERO_EXTEND form of *popcountsi2_zext.
>         (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
>         ZERO_EXTEND form of *popcountsi2_zext_falsedep.
>         (*popcounthi2_2): Likewise, new define_insn_and_split to match
>         ZERO_EXTEND form of *popcounthi2.
>         (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
>         parity flag peephole2.
>
> Thanks in advance,
> Roger
> --
>