On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Thanks for the speedy review. > > > From: Uros Bizjak <ubiz...@gmail.com> > > Sent: 17 October 2023 17:38 > > > > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > This patch is the backend piece of a solution to PRs 101955 and > > > 106245, that adds a define_insn_and_split to the i386 backend, to > > > perform sign extension of a single (least significant) bit using AND $1 > > > then NEG. > > > > > > Previously, (x<<31)>>31 would be generated as > > > > > > sall $31, %eax // 3 bytes > > > sarl $31, %eax // 3 bytes > > > > > > with this patch the backend now generates: > > > > > > andl $1, %eax // 3 bytes > > > negl %eax // 2 bytes > > > > > > Not only is this smaller in size, but microbenchmarking confirms that > > > it's a performance win on both Intel and AMD; Intel sees only a 2% > > > improvement (perhaps just a size effect), but AMD sees a 7% win. > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32} > > > with no new failures. Ok for mainline? > > > > > > > > > 2023-10-17 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > PR middle-end/101955 > > > PR tree-optimization/106245 > > > * config/i386/i386.md (*extv<mode>_1_0): New > > > define_insn_and_split. > > > > > > gcc/testsuite/ChangeLog > > > PR middle-end/101955 > > > PR tree-optimization/106245 > > > * gcc.target/i386/pr106245-2.c: New test case. > > > * gcc.target/i386/pr106245-3.c: New 32-bit test case. > > > * gcc.target/i386/pr106245-4.c: New 64-bit test case. > > > * gcc.target/i386/pr106245-5.c: Likewise. > > > > +;; Split sign-extension of single least significant bit as and x,$1;neg > > +x (define_insn_and_split "*extv<mode>_1_0" > > + [(set (match_operand:SWI48 0 "register_operand" "=r") > > + (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0") > > + (const_int 1) > > + (const_int 0))) > > + (clobber (reg:CC FLAGS_REG))] > > + "" > > + "#" > > + "&& 1" > > > > No need to use "&&" for an empty insn constraint. Just use > > "reload_completed" in > > this case. > > > > + [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1))) > > + (clobber (reg:CC FLAGS_REG))]) > > + (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0))) > > + (clobber (reg:CC FLAGS_REG))])]) > > > > Did you intend to split this after reload? If this is the case, then > > reload_completed > > is missing. > > Because this splitter neither required the allocation of a new pseudo, nor a > hard register assignment, i.e. it's a splitter that can be run before or after > reload, it's written to split "whenever". If you'd prefer it to only split > after > reload, I agree a "reload_completed" can be added (alternatively, adding > "ix86_pre_reload_split ()" would also work).
No, this part is OK. I just forgot that we have universal splitters ;) > I now see from "*load_tp_<mode>" that "" is perhaps preferred over "&& 1" > In these cases. Please let me know which you prefer. "" please for an empty insn constraint. OK otherwise. Thanks, Uros.