Hi Uros,
Many thanks for the speedy review.  Here's the revised version of the
patch incorporating all of your suggested fixes and improvements.

This has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-04-13  Roger Sayle  <ro...@nextmovesoftware.com>
            Uroš Bizjak  <ubiz...@gmail.com>

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
        *andsi_2 with -Os when the instruction encoding is shorter.

gcc/testsuite/ChangeLog
        * gcc.target/i386/and-1.c: New test case.


Thanks again,
Roger
--

> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: 12 April 2022 17:37
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
> 
> On Tue, Apr 12, 2022 at 5:42 PM Roger Sayle <ro...@nextmovesoftware.com>
> wrote:
> >
> >
> > This stand-alone patch avoids (-Os) regressions from a fix for PR 70321.
> >
> > The simple test case below has the unfortunate property that on x86_64
> > it is larger when compiled with -Os than when compiled with -O2.
> >
> > int foo(char x)
> > {
> >   return (x & 123) != 0;
> > }
> >
> > The issue is x86's complex instruction encoding, where andb $XX,%dil
> > requires more bytes than andl $XX,%edi.  This patch provides a
> > peephole2 to convert *and_qi_2_maybe_si into *andsi_2 for the problematic
> cases.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
> >
> >
> > 2022-04-12  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
> >         *andsi_2 with -Os when the instruction encoding is shorter.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/and-1.c: New test case.
> 
> +(define_peephole2
> +  [(parallel
> +    [(set (match_operand 0 "flags_reg_operand")
> +  (match_operator 1 "compare_operator"
> +    [(and:QI (match_operand:QI 2 "register_operand")
> +     (match_operand:QI 3 "immediate_operand"))
> 
> const_int_operand here, you have INTVAL accessor for this operand below.
> 
> +     (const_int 0)]))
> +      (set (match_dup 2)
> +   (and:QI (match_dup 2) (match_dup 3)))])]  "TARGET_64BIT
> +   && !TARGET_PARTIAL_REG_STALL
> +   && optimize_insn_for_size_p ()
> +   && ix86_match_ccmode (insn, CCNOmode)
> +   && (INTVAL (operands[3]) & 0x80) == 0
> 
> val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
> 
> +   /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl.  */
> +   && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO
> (operands[2]))
> +   && peep2_reg_dead_p (1, operands[2])"
> 
> Do we really need the above condition? Operand 2 is effectively inout operand,
> and when ANDed in SImode with zero-extended immediate, it will result in the
> same QImode value. The value outside QImode is not guaranteed without using
> strict_low_part.
> 
> +  [(parallel
> +     [(set (match_dup 0)
> +   (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
> +    (const_int 0)]))
> +      (set (match_dup 2)
> +   (and:SI (match_dup 2) (match_dup 3)))])] {
> +  operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
> 
> gen_lowpart (Simode, operands[2]) should work here.
> 
> +  operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode);
> 
> No need to convert the immediate to SImode. The insn condition also
> guarantees it to be unsigned.
> 
> +})
> 
> Uros.
> 
> > Thanks in advance,
> > Roger
> > --
> >
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c74edd1..4ad3223 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10140,6 +10140,35 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+;; On TARGET_64BIT, andb $XX,%dil is larger than andl $XX,%edi so
+;; convert *and_qi_2_maybe_si into *andsi_2 when optimizing for size.
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand 0 "flags_reg_operand")
+         (match_operator 1 "compare_operator"
+           [(and:QI (match_operand:QI 2 "register_operand")
+                    (match_operand:QI 3 "const_int_operand"))
+            (const_int 0)]))
+      (set (match_dup 2)
+          (and:QI (match_dup 2) (match_dup 3)))])]
+  "TARGET_64BIT
+   && !TARGET_PARTIAL_REG_STALL
+   && optimize_insn_for_size_p ()
+   && ix86_match_ccmode (insn, CCNOmode)
+   && val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
+   /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl.  */
+   && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO (operands[2]))"
+  [(parallel
+     [(set (match_dup 0)
+          (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
+                           (const_int 0)]))
+      (set (match_dup 2)
+          (and:SI (match_dup 2) (match_dup 3)))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+})
+
 (define_expand "andqi_ext_1"
   [(parallel
      [(set (zero_extract:HI (match_operand:HI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/and-1.c 
b/gcc/testsuite/gcc.target/i386/and-1.c
new file mode 100644
index 0000000..11890d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/and-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Os" } */
+
+int foo(char x)
+{
+  return (x & 123) != 0;
+}
+
+/* { dg-final { scan-assembler-not "%dil" } } */

Reply via email to