Hi Uros, Many thanks for the speedy review. Here's the revised version of the patch incorporating all of your suggested fixes and improvements.
This has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-04-13 Roger Sayle <ro...@nextmovesoftware.com> Uroš Bizjak <ubiz...@gmail.com> gcc/ChangeLog * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into *andsi_2 with -Os when the instruction encoding is shorter. gcc/testsuite/ChangeLog * gcc.target/i386/and-1.c: New test case. Thanks again, Roger -- > -----Original Message----- > From: Uros Bizjak <ubiz...@gmail.com> > Sent: 12 April 2022 17:37 > To: Roger Sayle <ro...@nextmovesoftware.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size. > > On Tue, Apr 12, 2022 at 5:42 PM Roger Sayle <ro...@nextmovesoftware.com> > wrote: > > > > > > This stand-alone patch avoids (-Os) regressions from a fix for PR 70321. > > > > The simple test case below has the unfortunate property that on x86_64 > > it is larger when compiled with -Os than when compiled with -O2. > > > > int foo(char x) > > { > > return (x & 123) != 0; > > } > > > > The issue is x86's complex instruction encoding, where andb $XX,%dil > > requires more bytes than andl $XX,%edi. This patch provides a > > peephole2 to convert *and_qi_2_maybe_si into *andsi_2 for the problematic > cases. > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check with no new failures. Ok for mainline? > > > > > > 2022-04-12 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into > > *andsi_2 with -Os when the instruction encoding is shorter. > > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/and-1.c: New test case. > > +(define_peephole2 > + [(parallel > + [(set (match_operand 0 "flags_reg_operand") > + (match_operator 1 "compare_operator" > + [(and:QI (match_operand:QI 2 "register_operand") > + (match_operand:QI 3 "immediate_operand")) > > const_int_operand here, you have INTVAL accessor for this operand below. > > + (const_int 0)])) > + (set (match_dup 2) > + (and:QI (match_dup 2) (match_dup 3)))])] "TARGET_64BIT > + && !TARGET_PARTIAL_REG_STALL > + && optimize_insn_for_size_p () > + && ix86_match_ccmode (insn, CCNOmode) > + && (INTVAL (operands[3]) & 0x80) == 0 > > val_signbit_known_clear_p (QImode, INTVAL (operands[3])) > > + /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */ > + && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO > (operands[2])) > + && peep2_reg_dead_p (1, operands[2])" > > Do we really need the above condition? Operand 2 is effectively inout operand, > and when ANDed in SImode with zero-extended immediate, it will result in the > same QImode value. The value outside QImode is not guaranteed without using > strict_low_part. > > + [(parallel > + [(set (match_dup 0) > + (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3)) > + (const_int 0)])) > + (set (match_dup 2) > + (and:SI (match_dup 2) (match_dup 3)))])] { > + operands[2] = gen_rtx_REG (SImode, REGNO (operands[2])); > > gen_lowpart (Simode, operands[2]) should work here. > > + operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode); > > No need to convert the immediate to SImode. The insn condition also > guarantees it to be unsigned. > > +}) > > Uros. > > > Thanks in advance, > > Roger > > -- > >
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index c74edd1..4ad3223 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10140,6 +10140,35 @@ [(set_attr "type" "alu") (set_attr "mode" "<MODE>")]) +;; On TARGET_64BIT, andb $XX,%dil is larger than andl $XX,%edi so +;; convert *and_qi_2_maybe_si into *andsi_2 when optimizing for size. + +(define_peephole2 + [(parallel + [(set (match_operand 0 "flags_reg_operand") + (match_operator 1 "compare_operator" + [(and:QI (match_operand:QI 2 "register_operand") + (match_operand:QI 3 "const_int_operand")) + (const_int 0)])) + (set (match_dup 2) + (and:QI (match_dup 2) (match_dup 3)))])] + "TARGET_64BIT + && !TARGET_PARTIAL_REG_STALL + && optimize_insn_for_size_p () + && ix86_match_ccmode (insn, CCNOmode) + && val_signbit_known_clear_p (QImode, INTVAL (operands[3])) + /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */ + && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO (operands[2]))" + [(parallel + [(set (match_dup 0) + (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3)) + (const_int 0)])) + (set (match_dup 2) + (and:SI (match_dup 2) (match_dup 3)))])] +{ + operands[2] = gen_lowpart (SImode, operands[2]); +}) + (define_expand "andqi_ext_1" [(parallel [(set (zero_extract:HI (match_operand:HI 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/i386/and-1.c b/gcc/testsuite/gcc.target/i386/and-1.c new file mode 100644 index 0000000..11890d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/and-1.c @@ -0,0 +1,9 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-Os" } */ + +int foo(char x) +{ + return (x & 123) != 0; +} + +/* { dg-final { scan-assembler-not "%dil" } } */