Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG

Kyrill Tkachov Tue, 01 Sep 2015 01:37:26 -0700

Ping.

Thanks,
Kyrill


On 20/08/15 09:24, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00710.html

Thanks,
Kyrill

On 03/08/15 14:01, James Greenhalgh wrote:

On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:

Hi all,

This patch implements an aarch64-specific expansion of the signed modulo

by a power of 2.

The proposed sequence makes use of the conditional negate instruction

CSNEG.

For a power of N, x % N can be calculated with:
negs   x1, x0
and    x0, x0, #(N - 1)
and    x1, x1, #(N - 1)
csneg  x0, x0, x1, mi

So, for N == 256 this would be:
negs   x1, x0
and    x0, x0, #255
and    x1, x1, #255
csneg  x0, x0, x1, mi

For comparison, the existing sequence emitted by expand_smod_pow2 in

expmed.c is:

asr     x1, x0, 63
lsr     x1, x1, 56
add     x0, x0, x1
and     x0, x0, 255
sub     x0, x0, x1

Note that the CSNEG sequence is one instruction shorter and that the two

and operations

are independent, compared to the existing sequence where all instructions

are dependent

on the preceeding instructions.

For the special case of N == 2 we can do even better:
cmp     x0, xzr
and     x0, x0, 1
csneg   x0, x0, x0, ge

I first tried implementing this in the generic code in expmed.c but that

didn't work

out for a few reasons:

* This relies on having a conditional-negate instruction. We could gate

it on

HAVE_conditional_move and the combiner is capable of merging the final

negate into

the conditional move if a conditional negate is available (like on

aarch64) but on

targets without a conditional negate this would end up emitting a

separate negate.

* The first negs has to be a negs for the sequence to be a win i.e.

having a separate

negate and compare makes the sequence slower than the existing one (at

least in my

microbenchmarking) and I couldn't get subsequent passes to combine the

negate and combine

into the negs (presumably due to the use of the negated result in one of

the ands).

Doing it in the aarch64 backend where I could just call the exact gen_*

functions that

I need worked much more cleanly.

The costing logic is updated to reflect this sequence during the

intialisation of

expmed.c where it calculates the smod_pow2_cheap metric.

The tests will come in patch 3 of the series which are partly shared with

the equivalent

arm implementation.

Bootstrapped and tested on aarch64.
Ok for trunk?

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9d88a60..7bb4a55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6639,8 +6639,26 @@ cost_plus:
          if (VECTOR_MODE_P (mode))
            *cost += extra_cost->vect.alu;
          else if (GET_MODE_CLASS (mode) == MODE_INT)
-           *cost += (extra_cost->mult[mode == DImode].add
-                     + extra_cost->mult[mode == DImode].idiv);
+           {
+             /* We can expand signed mod by power of 2 using a
+                NEGS, two parallel ANDs and a CSNEG.  Assume here
+                that CSNEG is COSTS_N_INSNS (1).  This case should
+                only ever be reached through the set_smod_pow2_cheap check
+                in expmed.c.  */
+             if (code == MOD
+                 && CONST_INT_P (XEXP (x, 1))
+                 && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+                 && (mode == SImode || mode == DImode))
+               {
+                 *cost += COSTS_N_INSNS (3)
+                          + 2 * extra_cost->alu.logical
+                          + extra_cost->alu.arith;
+                 return true;
+               }
+
+             *cost += (extra_cost->mult[mode == DImode].add
+                       + extra_cost->mult[mode == DImode].idiv);
+           }
          else if (mode == DFmode)
            *cost += (extra_cost->fp[1].mult
                      + extra_cost->fp[1].div);

This looks like it calculates the wrong cost for !speed. I think we will
still expand through mod<mode>3 when compiling for size, so we probably
still want to cost the multiple instructions.

Have I misunderstood?

You're right, the logic needs a bit of wiggling to do the right
thing. I've moved this case into a separate MOD case and added
a gate on speed for the extra costs addition.

Ok?

Thanks,
Kyrill

2015-08-13  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
      (*neg<mode>2_compare0): Rename to...
      (neg<mode>2_compare0): ... This.
      * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
      Move check for speed inside the if-then-elses.  Reflect
      CSNEG sequence in MOD by power of 2 case.

Thanks,
James


aarch64-mod-2.patch

commit de67e5fba716ce835f595c4167f57ec4faf61607
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Wed Jul 15 17:01:13 2015 +0100

     [AArch64][1/3] Expand signed mod by power of 2 using CSNEG

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1394ed7..c8bd8d2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6652,6 +6652,25 @@ cost_plus:
        return true;

case MOD:

+    /* We can expand signed mod by power of 2 using a
+       NEGS, two parallel ANDs and a CSNEG.  Assume here
+       that CSNEG is COSTS_N_INSNS (1).  This case should
+       only ever be reached through the set_smod_pow2_cheap check
+       in expmed.c.  */
+      if (CONST_INT_P (XEXP (x, 1))
+         && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+         && (mode == SImode || mode == DImode))
+       {
+         *cost += COSTS_N_INSNS (3);
+
+         if (speed)
+           *cost += 2 * extra_cost->alu.logical
+                    + extra_cost->alu.arith;
+
+         return true;
+       }
+
+    /* Fall-through.  */
      case UMOD:
        if (speed)
        {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7b04c4..a515573 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -302,6 +302,62 @@ (define_expand "cmp<mode>"
    }
  )

+;; AArch64-specific expansion of signed mod by power of 2 using CSNEG.

+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (INTVAL (operands[2])) <= 0
+       || !aarch64_bitmask_imm (INTVAL (operands[2]) - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+       cmp     x0, xzr
+       and     x0, x0, 1
+       csneg   x0, x0, x0, ge.  */
+    if (val == 2)
+      {
+       rtx masked = gen_reg_rtx (<MODE>mode);
+       rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+       emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+       rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+       emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+       DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op,
operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+                                      masked_neg, masked_pos));
+    DONE;
+  }
+)
+
  (define_insn "*condjump"
    [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
                            [(match_operand 1 "cc_register" "") (const_int
0)])
@@ -2392,7 +2448,7 @@ (define_insn "*ngcsi_uxtw"
    [(set_attr "type" "adc_reg")]
  )

-(define_insn "*neg<mode>2_compare0"

+(define_insn "neg<mode>2_compare0"
    [(set (reg:CC_NZ CC_REGNUM)
        (compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand"
"r"))
                       (const_int 0)))

Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG

Reply via email to