On Wed, Dec 28, 2022 at 1:22 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Many thanks for your reviews. > > > On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle > > <ro...@nextmovesoftware.com> wrote: > > > > > > > > > Back in September, the review of my patch for PR > > > rtl-optimization/106594, > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html > > > suggested that I submit the x86 backend bits, independently and first. > > > > > > The executive summary is that the middle-end doesn't have a preferred > > > canonical form for expressing zero-extension, sometimes using an AND > > > and sometimes using zero_extend. Pending changes to RTL > > > simplification will/may alter some of these representations, so a few > > > additional patterns are required to recognize these alternate > > > representations and avoid any testsuite regressions. > > > > > > As an example, *popcountsi2_zext is currently represented as: > > > [(set (match_operand:DI 0 "register_operand" "=r") > > > (and:DI > > > (subreg:DI > > > (popcount:SI > > > (match_operand:SI 1 "nonimmediate_operand" "rm")) 0) > > > (const_int 63))) > > > (clobber (reg:CC FLAGS_REG))] > > > > > > this patch adds an alternate/equivalent pattern that matches: > > > [(set (match_operand:DI 0 "register_operand" "=r") > > > (zero_extend:DI > > > (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm")))) > > > (clobber (reg:CC FLAGS_REG))] > > > > > > Another example is *popcounthi2 which is currently represented as: > > > [(set (match_operand:SI 0 "register_operand") > > > (popcount:SI > > > (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand")))) > > > (clobber (reg:CC FLAGS_REG))] > > > > > > this patch adds an alternate/equivalent pattern that matches: > > > [(set (match_operand:SI 0 "register_operand") > > > (zero_extend:SI > > > (popcount:HI (match_operand:HI 1 "nonimmediate_operand")))) > > > (clobber (reg:CC FLAGS_REG))] > > > > > > The contents of the machine description definitions remain the same, > > > it's just the expected RTL is slightly different but equivalent. > > > Providing both forms makes the backend more robust to middle-end > > > changes [and possibly catches some missed optimizations]. > > > > It would be nice to have a canonical representation of zero-extended > > patterns, > > but this is what we have now. Unfortunately, a certain HW limitation > > requires > > several patterns for one insn, so the canonical representation is even more > > desirable here. Hopefully, a "future" > > patch will allow us some cleanups in this area. > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32}, > > > with no new failures. Ok for mainline? > > > > OK, but please split out HImode popcount&1 pattern to a separate patch to > > not > > mix separate topics in one patch. > > The peephole2 is the exact same topic; it's not a new optimization or feature, > but identical fall-out from the representational changes to ZERO_EXTEND. > > The existing peephole2 (line 18305 of i386.md) matches this sequence: > > (parallel [(set (match_operand:HI 1 "register_operand") > (popcount:HI > (match_operand:HI 2 "nonimmediate_operand"))) > (clobber (reg:CC FLAGS_REG))]) > (set (match_operand 3 "register_operand") > (zero_extend (match_dup 1))) > (set (reg:CCZ FLAGS_REG) > (compare:CCZ (and:QI (match_operand:QI 4 "register_operand") > (const_int 1)) > (const_int 0))) > (set (pc) (if_then_else (match_operator 5 "bt_comparison_operator" > [(reg:CCZ FLAGS_REG) > (const_int 0)]) > (label_ref (match_operand 6)) > (pc)))] > > but with pending tweaks/changes to middle-end zero_extend handing, the > peephole2 pass now sees the equivalent (apologies for the renumbering): > > (parallel [(set (match_operand:HI 1 "register_operand") > (popcount:HI > (match_operand:HI 2 "nonimmediate_operand"))) > (clobber (reg:CC FLAGS_REG))]) > (set (reg:CCZ FLAGS_REG) > (compare:CCZ (and:QI (match_operand:QI 3 "register_operand") > (const_int 1)) > (const_int 0))) > (set (pc) (if_then_else (match_operator 4 "bt_comparison_operator" > [(reg:CCZ FLAGS_REG) > (const_int 0)]) > (label_ref (match_operand 5)) > (pc)))] > > Notice that the zero_extension has been eliminated, as match_dup 3 > and match_dup 4 are the same register, and we only need to inspect > the lowest bit. i.e. the current transform matches a redundant > instruction, that the RTL optimizers will soon be able to eliminate. > > Hence, IMHO, the issue is identical, the RTL that the backend is expecting > is sensitive the representations and transformations performed upstream. > > All the new patterns appear in the MD file immediately after their originals, > so this peephole2 appears immediately after the peephole2 that used to > match. > > I'm happy to perform two commits, if that might make bisecting issues > easier, but these changes really are just a single topic (already split into > three pieces from a serious P1 regression fix). It may be possible to > remove the current patterns, once they become "dead", perhaps when > we return to stage1 (after the middle-end pieces have been approved).
Thanks for the explanation, please go ahead with your original patch. Thanks, Uros. > > > > > Thanks, > > Uros. > > > > > > > > > > > 2022-12-28 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > * config/i386/i386.md (*clzsi2_lzcnt_zext_2): > > > define_insn_and_split > > > to match ZERO_EXTEND form of *clzsi2_lzcnt_zext. > > > (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to > > > match > > > ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep. > > > (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match > > > ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi. > > > (*popcountsi2_zext_2): Likewise, new define_insn_and_split to > > > match > > > ZERO_EXTEND form of *popcountsi2_zext. > > > (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match > > > ZERO_EXTEND form of *popcountsi2_zext_falsedep. > > > (*popcounthi2_2): Likewise, new define_insn_and_split to match > > > ZERO_EXTEND form of *popcounthi2. > > > (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using > > > parity flag peephole2. > > > > > > Thanks in advance, > > > Roger > > > -- > > > >