Hi Uros,
Many thanks for the review.  As requested here is a revised version of the
patch that's slightly more aggressive in its clean-ups (duplicating 6 lines
of code allowed me to eliminate 9 lines of code), but most significantly
also includes support for the andn for TARGET_BMI, and allows the NOT
operand of andn/pandn to appear as either the first or second operand
(combine/reload/output will swap them as appropriate).

Once again this patch has been tested on x86_64-pc-linux-gnu with 
make bootstrap and make -k check, with and without -m32, with no
new failures.  Ok for mainline?

2022-05-20  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
        Multi-word binary logic operations require two instructions.
        For vector integer modes, AND with a NOT operand requires only
        a single instruction (pandn).  Likewise, integer andn with -mbmi.
        [NOT]: Vector integer NOT requires more than 1 instruction (pxor).
        [NEG]: Multi-word negation requires 3 instructions.


Thanks in advance,
Roger

> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: 19 May 2022 07:29
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND
> and pandn.
> 
> On Wed, May 18, 2022 at 7:30 PM Roger Sayle
> <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> > Very many thanks for the speedy review and approval of my
> > ix86_rtx_costs patch to correct the cost of multi-word
> > multiplications.  In the same spirit, this patch tidies up a few additional 
> > nits I
> noticed while there:
> > Multi-word NOT requires two operations, but multi-word NEG requires
> > three operations.  Using SSE, vector NOT requires a pxor with -1, but
> > AND of NOT is cheap thanks to the existence of pandn.  There's also
> > some legacy (aka incorrect) logic explicitly testing for DImode
> > [independently of TARGET_64BIT] in determining the cost of logic
> > operations that's not required.
> >
> > There should be no behavioural changes from this patch (as confirmed
> > by
> > testing) but hopefully vectorization and other middle-end passes can
> > now rely on more sensible "cost" estimates.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-05-18  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
> >         Multi-word binary logic operations require two instructions.
> >         For vector integer modes, AND with a NOT operand requires only
> >         a single instruction (pandn).
> >         [NOT]: Vector integer NOT requires more than 1 instruction (pxor).
> >         [NEG]: Multi-word negation requires 3 instructions.
> 
> -    case AND:
>      case IOR:
>      case XOR:
>        if (GET_MODE_CLASS (mode) == MODE_INT
>        && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>      {
> -      *total = (cost->add * 2
> -            + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
> -               << (GET_MODE (XEXP (x, 0)) != DImode))
> -            + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
> -                   << (GET_MODE (XEXP (x, 1)) != DImode)));
> +      *total = cost->add * 2
> +           + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
> +           + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
> +      return true;
> +    }
> +      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +    *total = ix86_vec_cost (mode, cost->sse_op);
> +      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +    *total = cost->add * 2;
> +      else
> +    *total = cost->add;
> +      return false;
> 
> Shouldn't this be just:
> 
> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>   *total = ix86_vec_cost (mode, cost->sse_op); else if (GET_MODE_SIZE (mode)
> > UNITS_PER_WORD)
>   *total = cost->add * 2;
> else
>   *total = cost->add;
> return false;
> 
> When returning false, subexpressions will be scanned automatically.
> 
> +      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +    {
> +      /* pandn is a single instruction.  */
> +      if (GET_CODE (XEXP (x, 0)) == NOT)
> 
> pandn is also a single instruction with BMI.  IMO, this should be also 
> reflected in
> RTX costs.
> 
> Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 30a9cd0..daa60ac 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20738,70 +20738,125 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
        }
 
       if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
-       {
-         *total = cost->addss;
-         return false;
-       }
+       *total = cost->addss;
       else if (X87_FLOAT_MODE_P (mode))
-       {
-         *total = cost->fadd;
-         return false;
-       }
+       *total = cost->fadd;
       else if (FLOAT_MODE_P (mode))
-       {
-         *total = ix86_vec_cost (mode, cost->addss);
-         return false;
-       }
-      /* FALLTHRU */
+       *total = ix86_vec_cost (mode, cost->addss);
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       *total = ix86_vec_cost (mode, cost->sse_op);
+      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+       *total = cost->add * 2;
+      else
+       *total = cost->add;
+      return false;
 
-    case AND:
     case IOR:
     case XOR:
-      if (GET_MODE_CLASS (mode) == MODE_INT
-         && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       *total = ix86_vec_cost (mode, cost->sse_op);
+      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+       *total = cost->add * 2;
+      else
+       *total = cost->add;
+      return false;
+
+    case AND:
+      if (address_no_seg_operand (x, mode))
        {
-         *total = (cost->add * 2
-                   + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
-                      << (GET_MODE (XEXP (x, 0)) != DImode))
-                   + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
-                      << (GET_MODE (XEXP (x, 1)) != DImode)));
+         *total = cost->lea;
          return true;
        }
-      else if (code == AND
-              && address_no_seg_operand (x, mode))
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
        {
-         *total = cost->lea;
-         return true;
+         /* pandn is a single instruction.  */
+         if (GET_CODE (XEXP (x, 0)) == NOT)
+           {
+             *total = ix86_vec_cost (mode, cost->sse_op)
+                      + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+                                  outer_code, opno, speed)
+                      + rtx_cost (XEXP (x, 1), mode,
+                                  outer_code, opno, speed);
+             return true;
+           }
+         else if (GET_CODE (XEXP (x, 1)) == NOT)
+           {
+             *total = ix86_vec_cost (mode, cost->sse_op)
+                      + rtx_cost (XEXP (x, 0), mode,
+                                  outer_code, opno, speed)
+                      + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+                                  outer_code, opno, speed);
+             return true;
+           }
+         *total = ix86_vec_cost (mode, cost->sse_op);
        }
-      /* FALLTHRU */
-
-    case NEG:
-      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
+      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
        {
-         *total = cost->sse_op;
-         return false;
+         if (TARGET_BMI && GET_CODE (XEXP (x,0)) == NOT)
+           {
+             *total = cost->add * 2
+                      + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+                                  outer_code, opno, speed)
+                      + rtx_cost (XEXP (x, 1), mode,
+                                  outer_code, opno, speed);
+             return true;
+           }
+         else if (TARGET_BMI && GET_CODE (XEXP (x, 1)) == NOT)
+           {
+             *total = cost->add * 2
+                      + rtx_cost (XEXP (x, 0), mode,
+                                  outer_code, opno, speed)
+                      + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+                                  outer_code, opno, speed);
+             return true;
+           }
+         *total = cost->add * 2;
        }
-      else if (X87_FLOAT_MODE_P (mode))
+      else if (TARGET_BMI && GET_CODE (XEXP (x,0)) == NOT)
        {
-         *total = cost->fchs;
-         return false;
+         *total = cost->add
+                  + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+                              outer_code, opno, speed)
+                  + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
+         return true;
        }
-      else if (FLOAT_MODE_P (mode))
+      else if (TARGET_BMI && GET_CODE (XEXP (x,1)) == NOT)
        {
-         *total = ix86_vec_cost (mode, cost->sse_op);
-         return false;
+         *total = cost->add
+                  + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+                  + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+                              outer_code, opno, speed);
+         return true;
        }
-      /* FALLTHRU */
+      else
+       *total = cost->add;
+      return false;
 
     case NOT:
       if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-       *total = ix86_vec_cost (mode, cost->sse_op);
+       // vnot is pxor -1.
+       *total = ix86_vec_cost (mode, cost->sse_op) + 1;
       else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
        *total = cost->add * 2;
       else
        *total = cost->add;
       return false;
 
+    case NEG:
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
+       *total = cost->sse_op;
+      else if (X87_FLOAT_MODE_P (mode))
+       *total = cost->fchs;
+      else if (FLOAT_MODE_P (mode))
+       *total = ix86_vec_cost (mode, cost->sse_op);
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       *total = ix86_vec_cost (mode, cost->sse_op);
+      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+       *total = cost->add * 3;
+      else
+       *total = cost->add;
+      return false;
+
     case COMPARE:
       rtx op0, op1;
       op0 = XEXP (x, 0);

Reply via email to