On Thu, 17 Apr 2014, Richard Biener wrote:
>
> The patch below increases the number of coalesces we attempt
> to also cover unary and binary operations. This improves
> initial code generation for code like
>
> int foo (int i, int j, int k, int l)
> {
>   int res = i;
>   res += j;
>   res += k;
>   res += l;
>   return res;
> }
>
> from
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 87 [ i ])
>                     (reg/v:SI 88 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 84 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 89 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
> ...
>
> to
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 85 [ i ])
>                     (reg/v:SI 86 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 87 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
>
> re-using the same pseudo for the LHS.
>
> Expansion has special code to improve coalescing of op1 with
> the target, so this is what we try to match here.
>
> Overall there are positive and negative size effects during
> a bootstrap on x86_64, but on balance it seems to be a loss
> - stage3 cc1 text size is 18261647 bytes without the patch
> compared to 18265751 bytes with the patch.
>
> Now the question is, what does this tell us? That not re-using
> the same pseudo as op and target is always better?
>
> Btw, I tried this to find a convincing metric for an intra-BB
> scheduling pass (during out-of-SSA) on GIMPLE (to be able
> to kill that odd scheduling code we now have in reassoc).
> And to have something that TER doesn't immediately undo we
> have to disable TER, which conveniently happens for coalesced
> SSA names. Thus -> schedule for "register pressure", and thus
> reduce SSA name lifetimes - with the goal that out-of-SSA can
> do more coalescing. But it won't even try to coalesce
> anything other than PHI copies (not affected by scheduling)
> or plain SSA name copies (which shouldn't happen anyway due
> to copy propagation).
>
> So - any ideas? Or is the overall negative for cc1 just
> an artifact to ignore and we _should_ coalesce as much
> as possible (even if it doesn't avoid copies - thus the
> "cost" of 0 used in the patch)?
One example where it delivers bad initial expansion on x86_64 is
int foo (int *p)
{
  int res = p[0];
  res += p[1];
  res += p[2];
  res += p[3];
  return res;
}
where i386.c:ix86_fixup_binary_operands tries to be clever
and "improve address combine", generating two instructions
for (plus:SI (reg/v:SI 83 [ res ]) (mem:SI (...))) and thus
triggering this code in expand_binop_directly:

      pat = maybe_gen_insn (icode, 3, ops);
      if (pat)
        {
          /* If PAT is composed of more than one insn, try to add an appropriate
             REG_EQUAL note to it.  If we can't because TEMP conflicts with an
             operand, call expand_binop again, this time without a target.  */
          if (INSN_P (pat) && NEXT_INSN (pat) != NULL_RTX
              && ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab),
                                   ops[1].value, ops[2].value))
            {
              delete_insns_since (last);
              return expand_binop (mode, binoptab, op0, op1, NULL_RTX,
                                   unsignedp, methods);
            }
and thus we end up with

(insn 9 6 10 (set (reg:SI 91)
        (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32])) t.c:4 -1
     (nil))

(insn 10 9 11 (parallel [
            (set (reg:SI 90)
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg:SI 91)))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (expr_list:REG_EQUAL (plus:SI (reg/v:SI 83 [ res ])
            (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                    (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32]))
        (nil)))

(insn 11 10 0 (set (reg/v:SI 83 [ res ])
        (reg:SI 90)) t.c:4 -1
     (nil))
unpatched we avoid the last move (the tiny testcase of course
ends up optimized the same anyway). Not sure if that strong
desire to add a REG_EQUAL note makes up for the losses. At
least it seems at odds with the code preceding it:
      /* If operation is commutative,
         try to make the first operand a register.
         Even better, try to make it the same as the target.
         Also try to make the last operand a constant.  */
      if (commutative_p
          && swap_commutative_operands_with_target (target, xop0, xop1))
        {
          swap = xop1;
          xop1 = xop0;
          xop0 = swap;
        }
"Even better"? It will trigger the REG_EQUAL failure path and
make that swap moot ... (the same happens in expand_unop_directly).
Looks like REG_EQUAL notes are treated as more important than
register coalescing (of course at out-of-SSA time I can't know
whether a multi-insn expansion will happen :/)
Richard.
> Otherwise the patch bootstraps and tests fine on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> 2014-04-17 Richard Biener <[email protected]>
>
> * tree-ssa-coalesce.c (create_outofssa_var_map): Try to
> coalesce SSA name uses with SSA name results in all unary
> and binary operations.
>
> Index: gcc/tree-ssa-coalesce.c
> ===================================================================
> *** gcc/tree-ssa-coalesce.c (revision 209469)
> --- gcc/tree-ssa-coalesce.c (working copy)
> *************** create_outofssa_var_map (coalesce_list_p
> *** 991,1007 ****
>         case GIMPLE_ASSIGN:
>           {
>             tree lhs = gimple_assign_lhs (stmt);
>             tree rhs1 = gimple_assign_rhs1 (stmt);
> !           if (gimple_assign_ssa_name_copy_p (stmt)
>                 && gimple_can_coalesce_p (lhs, rhs1))
>               {
>                 v1 = SSA_NAME_VERSION (lhs);
>                 v2 = SSA_NAME_VERSION (rhs1);
> !               cost = coalesce_cost_bb (bb);
> !               add_coalesce (cl, v1, v2, cost);
>                 bitmap_set_bit (used_in_copy, v1);
>                 bitmap_set_bit (used_in_copy, v2);
>               }
>           }
>           break;
>
> --- 993,1031 ----
>         case GIMPLE_ASSIGN:
>           {
>             tree lhs = gimple_assign_lhs (stmt);
> +           if (TREE_CODE (lhs) != SSA_NAME)
> +             break;
> +
> +           /* Expansion handles target == op1 properly and also
> +              target == op2 for commutative binary ops.  */
>             tree rhs1 = gimple_assign_rhs1 (stmt);
> !           enum tree_code code = gimple_assign_rhs_code (stmt);
> !           enum gimple_rhs_class klass = get_gimple_rhs_class (code);
> !           if (TREE_CODE (rhs1) == SSA_NAME
>                 && gimple_can_coalesce_p (lhs, rhs1))
>               {
>                 v1 = SSA_NAME_VERSION (lhs);
>                 v2 = SSA_NAME_VERSION (rhs1);
> !               add_coalesce (cl, v1, v2,
> !                             klass == GIMPLE_SINGLE_RHS
> !                             ? coalesce_cost_bb (bb) : 0);
>                 bitmap_set_bit (used_in_copy, v1);
>                 bitmap_set_bit (used_in_copy, v2);
>               }
> +           if (klass == GIMPLE_BINARY_RHS
> +               && commutative_tree_code (code))
> +             {
> +               tree rhs2 = gimple_assign_rhs2 (stmt);
> +               if (TREE_CODE (rhs2) == SSA_NAME
> +                   && gimple_can_coalesce_p (lhs, rhs2))
> +                 {
> +                   v1 = SSA_NAME_VERSION (lhs);
> +                   v2 = SSA_NAME_VERSION (rhs2);
> +                   add_coalesce (cl, v1, v2, 0);
> +                   bitmap_set_bit (used_in_copy, v1);
> +                   bitmap_set_bit (used_in_copy, v2);
> +                 }
> +             }
>           }
>           break;
>
--
Richard Biener <[email protected]>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer