On Wed, May 18, 2022 at 7:30 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Very many thanks for the speedy review and approval of my ix86_rtx_costs
> patch to correct the cost of multi-word multiplications.  In the same
> spirit, this patch tidies up a few additional nits I noticed while there:
> Multi-word NOT requires two operations, but multi-word NEG requires
> three operations.  Using SSE, vector NOT requires a pxor with -1, but
> AND of NOT is cheap thanks to the existence of pandn.  There's also some
> legacy (aka incorrect) logic explicitly testing for DImode [independently
> of TARGET_64BIT] in determining the cost of logic operations that's not
> required.
>
> There should be no behavioural changes from this patch (as confirmed by
> testing) but hopefully vectorization and other middle-end passes can now
> rely on more sensible "cost" estimates.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?
>
>
> 2022-05-18  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
>         Multi-word binary logic operations require two instructions.
>         For vector integer modes, AND with a NOT operand requires only
>         a single instruction (pandn).
>         [NOT]: Vector integer NOT requires more than 1 instruction (pxor).
>         [NEG]: Multi-word negation requires 3 instructions.

-    case AND:
     case IOR:
     case XOR:
       if (GET_MODE_CLASS (mode) == MODE_INT
       && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
     {
-      *total = (cost->add * 2
-            + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
-               << (GET_MODE (XEXP (x, 0)) != DImode))
-            + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
-                   << (GET_MODE (XEXP (x, 1)) != DImode)));
+      *total = cost->add * 2
+           + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+           + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
+      return true;
+    }
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    *total = ix86_vec_cost (mode, cost->sse_op);
+      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+    *total = cost->add * 2;
+      else
+    *total = cost->add;
+      return false;

Shouldn't this be just:

if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
  *total = ix86_vec_cost (mode, cost->sse_op);
else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
  *total = cost->add * 2;
else
  *total = cost->add;
return false;

When returning false, subexpressions will be scanned automatically.

+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      /* pandn is a single instruction.  */
+      if (GET_CODE (XEXP (x, 0)) == NOT)

pandn is also a single instruction with BMI.  IMO, this should be also
reflected in RTX costs.

Uros.

Reply via email to