Richard Henderson wrote:
> On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> > There is one particular inefficiency I have noticed. This code:
> >
> > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
> > abort ();
> >
> > from atomic-compare-exchange-3.c gets compiled into:
> >
> > l %r3,0(%r2)
> > larl %r1,v
> > cs %r3,%r4,0(%r1)
> > ipm %r1
> > sra %r1,28
> > st %r3,0(%r2)
> > ltr %r1,%r1
> > jne .L3
> >
> > which is extremely inefficient; it converts the condition code into
> > an integer using the slow ipm, sra sequence, just so that it can
> > convert the integer back into a condition code via ltr and branch
> > on it ...
>
> This was caused (or perhaps abetted by) the representation of EQ
> as NE ^ 1. With the subsequent truncation and zero-extend, I
> think combine reached its insn limit of 3 before seeing everything
> it needed to see.
Yes, combine isn't able to handle everything in one go, but it finds
an intermediate insn. With the old __sync_compare_and_swap, it is
then able to optimize everything away in a second step; the only
reason this doesn't work any more is the intervening store.
The following patch changes the builtin expander to pass a MEM oldval
as-is to the back-end expander, so that the back-end can move the
store to before the CC operation. With that patch I'm also seeing
all the IPMs disappear.
[ Well, all except for this one:
> This happens because CSE notices the cbranch vs 0, and sets r116
> to zero along the path to
>
> 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
> __ATOMIC_RELEASE,
> __ATOMIC_ACQUIRE))
>
> at which point CSE decides that it would be cheaper to "re-use"
> the zero already in r116 instead of load another constant 0 here.
> After that, combine is ham-strung because r116 is not dead.
As you point out, this does indeed fix this problem as well:
> A short-term possibility is to have the CAS insns accept general_operand,
> so that the 0 gets merged.
]
What do you think about this solution? It has the advantage that
we still get the same xor code if we actually do need the ipm ...
Bye,
Ulrich
diff -ur gcc/builtins.c gcc.new/builtins.c
--- gcc/builtins.c 2012-08-07 16:04:45.054348099 +0200
+++ gcc.new/builtins.c 2012-08-07 15:44:01.304349225 +0200
@@ -5376,6 +5376,7 @@
expect = expand_normal (CALL_EXPR_ARG (exp, 1));
expect = convert_memory_address (Pmode, expect);
+ expect = gen_rtx_MEM (mode, expect);
desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode);
weak = CALL_EXPR_ARG (exp, 3);
@@ -5383,14 +5384,15 @@
if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0)
is_weak = true;
- oldval = copy_to_reg (gen_rtx_MEM (mode, expect));
-
+ oldval = expect;
if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target),
&oldval, mem, oldval, desired,
is_weak, success, failure))
return NULL_RTX;
- emit_move_insn (gen_rtx_MEM (mode, expect), oldval);
+ if (oldval != expect)
+ emit_move_insn (expect, oldval);
+
return target;
}
diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md
--- gcc/config/s390/s390.md 2012-08-07 16:04:54.204348621 +0200
+++ gcc.new/config/s390/s390.md 2012-08-07 16:00:21.934348628 +0200
@@ -8870,7 +8870,7 @@
(define_expand "atomic_compare_and_swap<mode>"
[(match_operand:SI 0 "register_operand") ;; bool success output
- (match_operand:DGPR 1 "register_operand") ;; oldval output
+ (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
(match_operand:DGPR 2 "memory_operand") ;; memory
(match_operand:DGPR 3 "register_operand") ;; expected intput
(match_operand:DGPR 4 "register_operand") ;; newval intput
@@ -8879,9 +8879,17 @@
(match_operand:SI 7 "const_int_operand")] ;; failure model
""
{
- rtx cc, cmp;
+ rtx cc, cmp, output = operands[1];
+
+ if (!register_operand (output, <MODE>mode))
+ output = gen_reg_rtx (<MODE>mode);
+
emit_insn (gen_atomic_compare_and_swap<mode>_internal
- (operands[1], operands[2], operands[3], operands[4]));
+ (output, operands[2], operands[3], operands[4]));
+
+ if (output != operands[1])
+ emit_move_insn (operands[1], output);
+
cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
[email protected]