On Thu, 17 Apr 2014, Richard Biener wrote:
>
> The patch below increases the number of coalesces we attempt
> to also cover unary and binary operations. This improves
> initial code generation for code like
>
> int foo (int i, int j, int k, int l)
> {
>   int res = i;
>   res += j;
>   res += k;
>   res += l;
>   return res;
> }
>
> from
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 87 [ i ])
>                     (reg/v:SI 88 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 84 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 89 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
> ...
>
> to
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 85 [ i ])
>                     (reg/v:SI 86 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 87 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
>
> re-using the same pseudo for the LHS.
>
> Expansion has special code to improve coalescing of op1 with
> the target, so this is what we try to match here.
>
> Overall there are positive and negative size effects during
> a bootstrap on x86_64, but on balance it seems to be a loss
> - stage3 cc1 text size is 18261647 bytes without the patch
> compared to 18265751 bytes with the patch.
>
> Now the question is, what does this tell us? That not re-using
> the same pseudo as op and target is always better?
>
> Btw, I tried this to find a convincing metric for an intra-BB
> scheduling pass (during out-of-SSA) on GIMPLE (to be able
> to kill that odd scheduling code we now have in reassoc).
> And to have something that TER doesn't immediately undo we
> have to disable TER, which conveniently happens for coalesced
> SSA names. Thus -> schedule for "register pressure", and thus
> reduce SSA name lifetimes - with the goal that out-of-SSA can
> do more coalescing. But it won't even try to coalesce
> anything other than PHI copies (not affected by scheduling)
> or plain SSA name copies (which shouldn't happen anyway due
> to copy propagation).
>
> So - any ideas? Or is the overall negative for cc1 just
> an artifact to ignore and we _should_ coalesce as much
> as possible (even if it doesn't avoid copies - thus the
> "cost" of 0 used in the patch)?
One example where it delivers bad initial expansion on x86_64 is
int foo (int *p)
{
  int res = p[0];
  res += p[1];
  res += p[2];
  res += p[3];
  return res;
}
where i386.c:ix86_fixup_binary_operands tries to be clever
and "improve address combine", generating two instructions
for (plus:SI (reg/v:SI 83 [ res ]) (mem:SI (...))) and thus
triggering this code in expand_binop_directly:

      pat = maybe_gen_insn (icode, 3, ops);
      if (pat)
        {
          /* If PAT is composed of more than one insn, try to add an appropriate
             REG_EQUAL note to it.  If we can't because TEMP conflicts with an
             operand, call expand_binop again, this time without a target.  */
          if (INSN_P (pat) && NEXT_INSN (pat) != NULL_RTX
              && ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab),
                                   ops[1].value, ops[2].value))
            {
              delete_insns_since (last);
              return expand_binop (mode, binoptab, op0, op1, NULL_RTX,
                                   unsignedp, methods);
            }
and thus we end up with

(insn 9 6 10 (set (reg:SI 91)
        (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32])) t.c:4 -1
     (nil))

(insn 10 9 11 (parallel [
            (set (reg:SI 90)
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg:SI 91)))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (expr_list:REG_EQUAL (plus:SI (reg/v:SI 83 [ res ])
            (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                    (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32]))
        (nil)))

(insn 11 10 0 (set (reg/v:SI 83 [ res ])
        (reg:SI 90)) t.c:4 -1
     (nil))
unpatched we avoid the last move (the tiny testcase of course
ends up optimized the same anyway). Not sure if that strong
desire to add a REG_EQUAL note makes up for the losses. At
least it seems at odds with the code preceding it:
      /* If operation is commutative,
         try to make the first operand a register.
         Even better, try to make it the same as the target.
         Also try to make the last operand a constant.  */
      if (commutative_p
          && swap_commutative_operands_with_target (target, xop0, xop1))
        {
          swap = xop1;
          xop1 = xop0;
          xop0 = swap;
        }
"Even better"? It will trigger the REG_EQUAL failure path and
make that swap moot ... (the same happens in expand_unop_directly).
Looks like REG_EQUAL notes are treated as more important than
register coalescing (of course at out-of-SSA time I can't know
whether a multi-insn expansion will happen :/)
Richard.
> Otherwise the patch bootstraps and tests fine on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> 2014-04-17 Richard Biener <[email protected]>
>
> * tree-ssa-coalesce.c (create_outofssa_var_map): Try to
> coalesce SSA name uses with SSA name results in all unary
> and binary operations.
>
> Index: gcc/tree-ssa-coalesce.c
> ===================================================================
> *** gcc/tree-ssa-coalesce.c (revision 209469)
> --- gcc/tree-ssa-coalesce.c (working copy)
> *************** create_outofssa_var_map (coalesce_list_p
> *** 991,1007 ****
>         case GIMPLE_ASSIGN:
>           {
>             tree lhs = gimple_assign_lhs (stmt);
>             tree rhs1 = gimple_assign_rhs1 (stmt);
> !           if (gimple_assign_ssa_name_copy_p (stmt)
>                 && gimple_can_coalesce_p (lhs, rhs1))
>               {
>                 v1 = SSA_NAME_VERSION (lhs);
>                 v2 = SSA_NAME_VERSION (rhs1);
> !               cost = coalesce_cost_bb (bb);
> !               add_coalesce (cl, v1, v2, cost);
>                 bitmap_set_bit (used_in_copy, v1);
>                 bitmap_set_bit (used_in_copy, v2);
>               }
>           }
>           break;
>
> --- 993,1031 ----
>         case GIMPLE_ASSIGN:
>           {
>             tree lhs = gimple_assign_lhs (stmt);
> +           if (TREE_CODE (lhs) != SSA_NAME)
> +             break;
> +
> +           /* Expansion handles target == op1 properly and also
> +              target == op2 for commutative binary ops.  */
>             tree rhs1 = gimple_assign_rhs1 (stmt);
> !           enum tree_code code = gimple_assign_rhs_code (stmt);
> !           enum gimple_rhs_class klass = get_gimple_rhs_class (code);
> !           if (TREE_CODE (rhs1) == SSA_NAME
>                 && gimple_can_coalesce_p (lhs, rhs1))
>               {
>                 v1 = SSA_NAME_VERSION (lhs);
>                 v2 = SSA_NAME_VERSION (rhs1);
> !               add_coalesce (cl, v1, v2,
> !                             klass == GIMPLE_SINGLE_RHS
> !                             ? coalesce_cost_bb (bb) : 0);
>                 bitmap_set_bit (used_in_copy, v1);
>                 bitmap_set_bit (used_in_copy, v2);
>               }
> +           if (klass == GIMPLE_BINARY_RHS
> +               && commutative_tree_code (code))
> +             {
> +               tree rhs2 = gimple_assign_rhs2 (stmt);
> +               if (TREE_CODE (rhs2) == SSA_NAME
> +                   && gimple_can_coalesce_p (lhs, rhs2))
> +                 {
> +                   v1 = SSA_NAME_VERSION (lhs);
> +                   v2 = SSA_NAME_VERSION (rhs2);
> +                   add_coalesce (cl, v1, v2, 0);
> +                   bitmap_set_bit (used_in_copy, v1);
> +                   bitmap_set_bit (used_in_copy, v2);
> +                 }
> +             }
>           }
>           break;
>
--
Richard Biener <[email protected]>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer