[RFC] Improving GCSE to reduce constant splits on ARM

Dmitry Melnik Thu, 23 Dec 2010 08:03:20 -0800

Hi,

We've found that constant splitting on ARM can be very inefficient if it's done inside a loop.

For example, the expression


  a = a & 0xff00ff00;

will be translated into the following code (on ARM, only 8-bit values rotated right by an even number of bits can be used as immediate operands):


  bic     r0, r0, #16711680
  bic     r0, r0, #255
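To make the encoding rule concrete, here is a minimal C sketch. The helper names arm_immediate_p and split_and_as_bics are hypothetical, not GCC functions. The first checks whether a 32-bit value is an 8-bit value rotated right by an even amount. The second greedily splits an AND mask into BIC immediates. The greedy split is a simplification for illustration, not GCC's arm_gen_constant algorithm.

```c
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N < 32.  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: nonzero if V is an 8-bit value rotated right by an
   even number of bits, i.e. a valid ARM-mode data-processing immediate.  */
int arm_immediate_p(uint32_t v)
{
  /* Undo every possible even right rotation and look for an 8-bit value.  */
  for (unsigned rot = 0; rot < 32; rot += 2)
    if ((rotl32(v, rot) & ~0xffu) == 0)
      return 1;
  return 0;
}

/* Hypothetical helper: split "x & MASK" into BICs.  Each chunk is the part
   of ~MASK inside an 8-bit window starting at an even bit position, so each
   chunk is encodable.  Returns the number of chunks written to CHUNKS.  */
int split_and_as_bics(uint32_t mask, uint32_t chunks[], int max)
{
  uint32_t clear = ~mask;   /* bits the AND has to clear */
  int n = 0;

  while (clear != 0 && n < max)
    {
      unsigned p = 0;
      while (((clear >> p) & 1u) == 0)   /* lowest bit still to clear */
        p++;
      p &= ~1u;                          /* round down to an even position */
      uint32_t chunk = clear & (0xffu << p);
      chunks[n++] = chunk;
      clear &= ~chunk;
    }
  return n;
}
```

For 0xff00ff00 the sketch reports the mask itself as not encodable and yields the two BIC operands 255 and 16711680, the same pair as in the listing above, in the opposite order.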

This makes perfect sense unless the code is in a loop and there are many instructions using the same bit mask. In that case, we would want to put the 0xff00ff00 constant into a register, let pass_rtl_move_loop_invariants hoist it out of the loop, and reuse it for every appropriate bitwise AND inside the loop.
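A rough instruction count shows why the loop case differs. The sketch below is a toy model under stated assumptions: every instruction costs 1, memory and latency effects are ignored, a split AND takes two BICs as above, and materializing the mask once takes two instructions (e.g. a movw/movt pair). The function names are hypothetical.

```c
#include <stdint.h>

/* Toy cost model (assumption: every instruction costs 1).
   k     - ANDs with the same mask in the loop body
   n     - loop iterations
   split - instructions per AND when the mask is split (2 BICs here)
   load  - instructions to build the mask in a register once  */

/* Mask split at every use, inside the loop.  */
uint64_t cost_split_in_loop(uint64_t k, uint64_t n, uint64_t split)
{
  return k * n * split;
}

/* Mask loaded once before the loop, then one AND per use.  */
uint64_t cost_hoisted(uint64_t k, uint64_t n, uint64_t load)
{
  return load + k * n;
}
```

With four ANDs on the same mask in a 1000-iteration loop, splitting costs 8000 instructions against 4002 for the hoisted form, while a single AND executed once is cheaper split (2 versus 3). The model ignores the register that the hoisted constant occupies, which is why register pressure can still make splitting the better choice.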

This is a real-life example (from the evas rasterization library), where fixing this issue speeds up the expedite test suite by 6% on average and by up to 20% on several tests.


Why does the splitting happen?

In GCC 4.4, the only problem was GCSE, which propagated a constant held in a separate pseudo register into its consumer insn, i.e.

  r123 = 0xff00ff00; r124 = r125 & r123
was transformed into
  r124 = r125 & 0xff00ff00

After that, the constant within the AND expression is no longer considered loop invariant and is not moved out of the loop. This can be fixed by checking whether the insn transformed by GCSE will require splitting and, if it does, not performing the transformation in the earlier GCSE passes. We can check this by comparing the rtx_cost of the constant GCSE is about to propagate with the rtx_cost of const_int 1. If moving the loop invariant fails (e.g. due to register pressure), pass_combine can still propagate the constant into the AND, and in that case the result is the same code as before.
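The proposed GCSE check can be sketched as a small predicate. This is a toy model, not GCC code: toy_const_cost is a hypothetical stand-in for rtx_cost on a CONST_INT (ARM's real cost function may assign different values), and gcse_should_propagate mirrors the comparison against const_int 1 that the patch adds to constprop_register.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if V is an 8-bit value rotated right by an
   even amount (a valid ARM-mode immediate).  */
int arm_immediate_p(uint32_t v)
{
  for (unsigned rot = 0; rot < 32; rot += 2)
    {
      /* Rotate left by ROT to undo a right rotation by ROT.  */
      uint32_t r = rot ? (v << rot) | (v >> (32 - rot)) : v;
      if ((r & ~0xffu) == 0)
        return 1;
    }
  return 0;
}

/* Toy stand-in for rtx_cost on a CONST_INT (an assumption, not ARM's real
   cost function): 1 if encodable, otherwise the number of even-aligned
   8-bit chunks a greedy split needs.  */
int toy_const_cost(uint32_t v)
{
  if (arm_immediate_p(v))
    return 1;

  int n = 0;
  while (v != 0)
    {
      unsigned p = 0;
      while (((v >> p) & 1u) == 0)   /* lowest remaining set bit */
        p++;
      v &= ~(0xffu << (p & ~1u));    /* drop one even-aligned 8-bit chunk */
      n++;
    }
  return n;
}

/* The proposed rule: let GCSE substitute the constant into the consumer
   insn only if it costs no more than const_int 1.  */
int gcse_should_propagate(uint32_t v)
{
  return toy_const_cost(v) <= toy_const_cost(1);
}
```

Under this model, 0xff is propagated as before, while 0xff00ff00 (cost 2) stays in its own pseudo register, where loop invariant motion can still hoist it.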

After this patch, http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html, such constants are split as early as the expand pass, so the loop invariant code motion pass has no chance to deal with them.


So, the questions are:

1) Is it really necessary to split constants on ARM at expand time? At least, loop invariant code motion can work better if splitting happens later.
2) Is there any reason we shouldn't prevent GCSE from propagating constants that we know will be split?

Attached is a prototype patch that makes GCSE propagate only those constants that won't cause a split, and disables splitting in expand on ARM (under the new -mfix-split option).
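The arm.c hunk can be read as a threshold test. Below is a minimal C model of it; load_into_register_p is a hypothetical name, LIMIT stands for the value of arm_constant_limit (...), and the other conditions of the real test, which the hunk does not show, are left out. When the predicate is true, arm_split_constant emits the whole constant into a register instead of splitting it, and with -mfix-split the patch also avoids the movw/movt pair there.

```c
#include <stdbool.h>

/* Hypothetical model of the modified test in arm_split_constant.
   insns     - instructions needed to synthesize the constant inline
   limit     - stands for arm_constant_limit (...)
   is_set    - true when CODE == SET
   fix_split - true when -mfix-split is given  */
bool load_into_register_p(int insns, int limit, bool is_set, bool fix_split)
{
  /* Mirrors: insns > limit + (code != SET) - arm_fix_split  */
  return insns > limit + (is_set ? 0 : 1) - (fix_split ? 1 : 0);
}
```

Assuming, for illustration only, a limit of 1 and an AND that needs two BICs, the unpatched test gives 2 > 2 (false, so the mask is split), while -mfix-split gives 2 > 1 (true, so the mask goes into a register that loop invariant motion can hoist).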


--
Best regards,
  Dmitry

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2512,13 +2512,13 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
          && (arm_gen_constant (code, mode, NULL_RTX, val, target, source,
                                1, 0)
              > (arm_constant_limit (optimize_function_for_size_p (cfun))
-                + (code != SET))))
+                + (code != SET) - arm_fix_split)))
        {
          if (code == SET)
            {
              /* Currently SET is the only monadic value for CODE, all
                 the rest are diadic.  */
-             if (TARGET_USE_MOVT)
+             if (TARGET_USE_MOVT && !arm_fix_split)
                arm_emit_movpair (target, GEN_INT (val));
              else
                emit_set_insn (target, GEN_INT (val));
@@ -2529,7 +2529,7 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
            {
              rtx temp = subtargets ? gen_reg_rtx (mode) : target;

-             if (TARGET_USE_MOVT)
+             if (TARGET_USE_MOVT && !arm_fix_split)
                arm_emit_movpair (temp, GEN_INT (val));
              else
                emit_set_insn (temp, GEN_INT (val));
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index a39bb3a..0c3952d 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -169,3 +169,8 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+mfix-split
+Target Report Var(arm_fix_split) Init(0)
+Do not use movt on Thumb-2 capable cores, and prefer memory loads
+to split insns
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 9ff0da8..ed15997 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -2571,6 +2572,7 @@ constprop_register (rtx insn, rtx from, rtx to)

   /* Handle normal insns next.  */
   if (NONJUMP_INSN_P (insn)
+      && rtx_cost (to, GET_CODE (to), false)
+         <= rtx_cost (GEN_INT (1), CONST_INT, false)
       && try_replace_reg (from, to, insn))
     return 1;
