Hi,
This patch is part of a patch series to add support for atomic operations on
ARMv8-M Baseline targets in GCC. This specific patch refactors the expander and
splitter for atomics so that the logic works with ARMv8-M Baseline, which has
the limitations of Thumb-1 in terms of CC flag setting and uses different
conditional compare insn patterns.
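For readers less familiar with the operation being refactored, here is a
minimal sketch of the source-level construct that ends up going through
arm_expand_compare_and_swap. The helper name try_update is hypothetical and
not part of the patch; __atomic_compare_exchange_n is the existing GCC builtin.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only: atomically replace *p with
   DESIRED if and only if *p still holds EXPECTED.  Returns true on success.  */
static bool
try_update (int *p, int expected, int desired)
{
  /* The fourth argument (weak = false) requests the strong variant, which
     retries internally when the store-exclusive fails spuriously.  */
  return __atomic_compare_exchange_n (p, &expected, desired, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

The boolean returned here corresponds to bval in the expander; the value
loaded from memory corresponds to rval.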
ChangeLog entry is as follows:
*** gcc/ChangeLog ***
2016-09-02 Thomas Preud'homme <thomas.preudho...@arm.com>
* config/arm/arm.c (arm_expand_compare_and_swap): Add new bdst local
variable. Add the new parameter to the insn generator. Set that
parameter to be CC flag for 32-bit targets, bval otherwise. Set the
return value from the negation of that parameter for Thumb-1, keeping
the logic unchanged otherwise except for using bdst as the destination
register of the compare_and_swap insn.
(arm_split_compare_and_swap): Explain in the function comment how the
value is returned. Rename scratch variable to neg_bval. Adapt
initialization of variables holding operands to the new operand
numbers. Use return register to hold result of store exclusive for
Thumb-1, scratch register otherwise. Construct the appropriate cbranch
for Thumb-1 targets, keeping the logic unchanged for 32-bit targets.
Guard Z flag setting to restrict it to 32-bit targets.
Use gen_cbranchsi4 rather than hand-written conditional branch to loop
for strongly ordered compare_and_swap.
* config/arm/predicates.md (cc_register_operand): New predicate.
* config/arm/sync.md (atomic_compare_and_swap<mode>_1): Use a
match_operand with the new predicate to accept either the CC flag or a
destination register for the boolean return value, restricting it to
CC flag only via constraint. Adapt operand numbers accordingly.
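As a rough illustration of the Thumb-1 return convention described above, the
following is a single-threaded C model, not the generated code. The struct
cell type and the load_exclusive/store_exclusive helpers are assumptions made
up for this sketch; they only mimic the status convention of a store exclusive
(0 on success, 1 on failure).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a memory location plus its exclusive monitor.  */
struct cell { uint32_t value; bool exclusive; };

static uint32_t
load_exclusive (struct cell *c)
{
  c->exclusive = true;
  return c->value;
}

/* Returns 0 when the store happened, 1 otherwise.  */
static uint32_t
store_exclusive (struct cell *c, uint32_t v)
{
  if (!c->exclusive)
    return 1;
  c->value = v;
  c->exclusive = false;
  return 0;
}

/* Model of the split sequence for a strong compare and swap on Thumb-1:
   the result is neg_bval, which is zero on success.  */
static uint32_t
cas_neg_bval (struct cell *c, uint32_t oldval, uint32_t newval,
              uint32_t *rval)
{
  uint32_t neg_bval;
  do
    {
      *rval = load_exclusive (c);
      neg_bval = 1;                 /* Assume failure before comparing.  */
      if (*rval != oldval)
        break;                      /* Corresponds to the branch to label2.  */
      neg_bval = store_exclusive (c, newval);
    }
  while (neg_bval != 0);            /* Strong variant: loop back to label1.  */
  return neg_bval;
}

/* Model of the expander's final step: bval is the negation of neg_bval.  */
static bool
cas_bval (struct cell *c, uint32_t oldval, uint32_t newval, uint32_t *rval)
{
  return cas_neg_bval (c, oldval, newval, rval) == 0;
}
```

On 32-bit targets the same information is carried by the Z flag instead of a
general register, which is why the insn pattern accepts either destination.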
Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on all
atomic and synchronization testcases in the testsuite [2]. Patchset was also
bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at
optimization level -O1 and above [1] without any regression in the testsuite and
no code generation difference in libitm and libgomp.
Code generation for ARMv8-M Baseline has been manually examined and compared
against ARMv8-A Thumb-2 for the following configurations without finding any issue:
gcc.dg/atomic-op-2.c at -Os
gcc.dg/atomic-compare-exchange-2.c at -Os
gcc.dg/atomic-compare-exchange-3.c at -O3
Is this ok for trunk?
Best regards,
Thomas
[1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and
undefined ("-O2 -g")
[2] The exact list is:
gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c
gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c
gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c
gcc/testsuite/gcc.dg/atomic-exchange-1.c
gcc/testsuite/gcc.dg/atomic-exchange-2.c
gcc/testsuite/gcc.dg/atomic-exchange-3.c
gcc/testsuite/gcc.dg/atomic-fence.c
gcc/testsuite/gcc.dg/atomic-flag.c
gcc/testsuite/gcc.dg/atomic-generic.c
gcc/testsuite/gcc.dg/atomic-generic-aux.c
gcc/testsuite/gcc.dg/atomic-invalid-2.c
gcc/testsuite/gcc.dg/atomic-load-1.c
gcc/testsuite/gcc.dg/atomic-load-2.c
gcc/testsuite/gcc.dg/atomic-load-3.c
gcc/testsuite/gcc.dg/atomic-lockfree.c
gcc/testsuite/gcc.dg/atomic-lockfree-aux.c
gcc/testsuite/gcc.dg/atomic-noinline.c
gcc/testsuite/gcc.dg/atomic-noinline-aux.c
gcc/testsuite/gcc.dg/atomic-op-1.c
gcc/testsuite/gcc.dg/atomic-op-2.c
gcc/testsuite/gcc.dg/atomic-op-3.c
gcc/testsuite/gcc.dg/atomic-op-6.c
gcc/testsuite/gcc.dg/atomic-store-1.c
gcc/testsuite/gcc.dg/atomic-store-2.c
gcc/testsuite/gcc.dg/atomic-store-3.c
gcc/testsuite/g++.dg/ext/atomic-1.C
gcc/testsuite/g++.dg/ext/atomic-2.C
gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c
gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c
gcc/testsuite/gcc.target/arm/atomic-op-acquire.c
gcc/testsuite/gcc.target/arm/atomic-op-char.c
gcc/testsuite/gcc.target/arm/atomic-op-consume.c
gcc/testsuite/gcc.target/arm/atomic-op-int.c
gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c
gcc/testsuite/gcc.target/arm/atomic-op-release.c
gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c
gcc/testsuite/gcc.target/arm/atomic-op-short.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c
gcc/testsuite/gcc.target/arm/sync-1.c
gcc/testsuite/gcc.target/arm/synchronize.c
gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
libstdc++-v3/testsuite/29_atomics/atomic/60658.cc
libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
libstdc++-v3/testsuite/29_atomics/atomic/64658.cc
libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
libstdc++-v3/testsuite/29_atomics/atomic/70766.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/direct_list.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/single_value.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/user_pod.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/51811.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/56011.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_assignment.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_conversion.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/base_classes.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/compare_exchange_lowering.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/explicit_instantiation/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/56012.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/aggregate.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/standard_layout.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/trivial.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/60940.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/65147.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/constexpr.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/copy_list.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/direct_list.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/single_value.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/bitwise.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/decrement.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/increment.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_assignment.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_conversion.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/standard_layout.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/trivial.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/functions_std_c++0x.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x.cc
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 97b3ef1673a98f406aed023f5715c35ebf69b07e..e8b32471d7b25813d46f8a46281c149c579e3816 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28117,9 +28117,9 @@ emit_unlikely_jump (rtx insn)
void
arm_expand_compare_and_swap (rtx operands[])
{
- rtx bval, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x;
+ rtx bval, bdst, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x;
machine_mode mode;
- rtx (*gen) (rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+ rtx (*gen) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
bval = operands[0];
rval = operands[1];
@@ -28176,43 +28176,54 @@ arm_expand_compare_and_swap (rtx operands[])
gcc_unreachable ();
}
- emit_insn (gen (rval, mem, oldval, newval, is_weak, mod_s, mod_f));
+ bdst = TARGET_THUMB1 ? bval : gen_rtx_REG (CCmode, CC_REGNUM);
+ emit_insn (gen (bdst, rval, mem, oldval, newval, is_weak, mod_s, mod_f));
if (mode == QImode || mode == HImode)
emit_move_insn (operands[1], gen_lowpart (mode, rval));
/* In all cases, we arrange for success to be signaled by Z set.
This arrangement allows for the boolean result to be used directly
- in a subsequent branch, post optimization. */
- x = gen_rtx_REG (CCmode, CC_REGNUM);
- x = gen_rtx_EQ (SImode, x, const0_rtx);
- emit_insn (gen_rtx_SET (bval, x));
+ in a subsequent branch, post optimization. For Thumb-1 targets, the
+ boolean negation of the result is also stored in bval because Thumb-1
+ backend lacks dependency tracking for CC flag due to flag-setting not
+ being represented at RTL level. */
+ if (TARGET_THUMB1)
+ emit_insn (gen_cstoresi_eq0_thumb1 (bval, bdst));
+ else
+ {
+ x = gen_rtx_EQ (SImode, bdst, const0_rtx);
+ emit_insn (gen_rtx_SET (bval, x));
+ }
}
/* Split a compare and swap pattern. It is IMPLEMENTATION DEFINED whether
another memory store between the load-exclusive and store-exclusive can
reset the monitor from Exclusive to Open state. This means we must wait
until after reload to split the pattern, lest we get a register spill in
- the middle of the atomic sequence. */
+ the middle of the atomic sequence. Success of the compare and swap is
+ indicated by the Z flag set for 32bit targets and by neg_bval being zero
+ for Thumb-1 targets (ie. negation of the boolean value returned by
+ atomic_compare_and_swapmode standard pattern in operand 0). */
void
arm_split_compare_and_swap (rtx operands[])
{
- rtx rval, mem, oldval, newval, scratch;
+ rtx rval, mem, oldval, newval, neg_bval;
machine_mode mode;
enum memmodel mod_s, mod_f;
bool is_weak;
rtx_code_label *label1, *label2;
rtx x, cond;
- rval = operands[0];
- mem = operands[1];
- oldval = operands[2];
- newval = operands[3];
- is_weak = (operands[4] != const0_rtx);
- mod_s = memmodel_from_int (INTVAL (operands[5]));
- mod_f = memmodel_from_int (INTVAL (operands[6]));
- scratch = operands[7];
+ rval = operands[1];
+ mem = operands[2];
+ oldval = operands[3];
+ newval = operands[4];
+ is_weak = (operands[5] != const0_rtx);
+ mod_s = memmodel_from_int (INTVAL (operands[6]));
+ mod_f = memmodel_from_int (INTVAL (operands[7]));
+ neg_bval = TARGET_THUMB1 ? operands[0] : operands[8];
mode = GET_MODE (mem);
bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s);
@@ -28244,26 +28255,44 @@ arm_split_compare_and_swap (rtx operands[])
arm_emit_load_exclusive (mode, rval, mem, use_acquire);
- cond = arm_gen_compare_reg (NE, rval, oldval, scratch);
- x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
- x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
- gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
- emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+ /* Z is set to 0 for 32bit targets (resp. neg_bval set to 1) if oldval != rval,
+ as required to communicate with arm_expand_compare_and_swap. */
+ if (TARGET_32BIT)
+ {
+ cond = arm_gen_compare_reg (NE, rval, oldval, neg_bval);
+ x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+ x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+ gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+ emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+ }
+ else
+ {
+ emit_move_insn (neg_bval, const1_rtx);
+ cond = gen_rtx_NE (VOIDmode, rval, oldval);
+ if (thumb1_cmpneg_operand (oldval, SImode))
+ emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
+ label2, cond));
+ else
+ emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+ }
- arm_emit_store_exclusive (mode, scratch, mem, newval, use_release);
+ arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release);
/* Weak or strong, we want EQ to be true for success, so that we
match the flags that we got from the compare above. */
- cond = gen_rtx_REG (CCmode, CC_REGNUM);
- x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
- emit_insn (gen_rtx_SET (cond, x));
+ if (TARGET_32BIT)
+ {
+ cond = gen_rtx_REG (CCmode, CC_REGNUM);
+ x = gen_rtx_COMPARE (CCmode, neg_bval, const0_rtx);
+ emit_insn (gen_rtx_SET (cond, x));
+ }
if (!is_weak)
{
- x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
- x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
- gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
- emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+ /* Z is set to boolean value of !neg_bval, as required to communicate
+ with arm_expand_compare_and_swap. */
+ x = gen_rtx_NE (VOIDmode, neg_bval, const0_rtx);
+ emit_unlikely_jump (gen_cbranchsi4 (x, neg_bval, const0_rtx, label1));
}
if (!is_mm_relaxed (mod_f))
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 4504ecc2810ced15cb70ab93487635c7dafa9972..2bc8ac134caf881fcf993b8e6d8c3786a67589ec 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -396,6 +396,12 @@
|| mode == CC_DGTUmode));
})
+;; Any register, including CC
+(define_predicate "cc_register_operand"
+ (and (match_code "reg")
+ (ior (match_operand 0 "s_register_operand")
+ (match_operand 0 "cc_register"))))
+
(define_special_predicate "arm_extendqisi_mem_op"
(and (match_operand 0 "memory_operand")
(match_test "TARGET_ARM ? arm_legitimate_address_outer_p (mode,
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index af1a6b07ca7ad9d83599b8c59789b6499cb40721..b1e87cdd5d9587d7b301d0dd0072fc41079a04d3 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -190,20 +190,20 @@
})
(define_insn_and_split "atomic_compare_and_swap<mode>_1"
- [(set (reg:CC_Z CC_REGNUM) ;; bool out
+ [(set (match_operand 0 "cc_register_operand" "=&c") ;; bool out
(unspec_volatile:CC_Z [(const_int 0)] VUNSPEC_ATOMIC_CAS))
- (set (match_operand:SI 0 "s_register_operand" "=&r") ;; val out
+ (set (match_operand:SI 1 "s_register_operand" "=&r") ;; val out
(zero_extend:SI
- (match_operand:NARROW 1 "mem_noofs_operand" "+Ua"))) ;; memory
- (set (match_dup 1)
+ (match_operand:NARROW 2 "mem_noofs_operand" "+Ua"))) ;; memory
+ (set (match_dup 2)
(unspec_volatile:NARROW
- [(match_operand:SI 2 "arm_add_operand" "rIL") ;; expected
- (match_operand:NARROW 3 "s_register_operand" "r") ;; desired
- (match_operand:SI 4 "const_int_operand") ;; is_weak
- (match_operand:SI 5 "const_int_operand") ;; mod_s
- (match_operand:SI 6 "const_int_operand")] ;; mod_f
+ [(match_operand:SI 3 "arm_add_operand" "rIL") ;; expected
+ (match_operand:NARROW 4 "s_register_operand" "r") ;; desired
+ (match_operand:SI 5 "const_int_operand") ;; is_weak
+ (match_operand:SI 6 "const_int_operand") ;; mod_s
+ (match_operand:SI 7 "const_int_operand")] ;; mod_f
VUNSPEC_ATOMIC_CAS))
- (clobber (match_scratch:SI 7 "=&r"))]
+ (clobber (match_scratch:SI 8 "=&r"))]
"<sync_predtab>"
"#"
"&& reload_completed"
@@ -219,19 +219,19 @@
[(SI "rIL") (DI "rDi")])
(define_insn_and_split "atomic_compare_and_swap<mode>_1"
- [(set (reg:CC_Z CC_REGNUM) ;; bool out
+ [(set (match_operand 0 "cc_register_operand" "=&c") ;; bool out
(unspec_volatile:CC_Z [(const_int 0)] VUNSPEC_ATOMIC_CAS))
- (set (match_operand:SIDI 0 "s_register_operand" "=&r") ;; val out
- (match_operand:SIDI 1 "mem_noofs_operand" "+Ua")) ;; memory
- (set (match_dup 1)
+ (set (match_operand:SIDI 1 "s_register_operand" "=&r") ;; val out
+ (match_operand:SIDI 2 "mem_noofs_operand" "+Ua")) ;; memory
+ (set (match_dup 2)
(unspec_volatile:SIDI
- [(match_operand:SIDI 2 "<cas_cmp_operand>" "<cas_cmp_str>") ;; expect
- (match_operand:SIDI 3 "s_register_operand" "r") ;; desired
- (match_operand:SI 4 "const_int_operand") ;; is_weak
- (match_operand:SI 5 "const_int_operand") ;; mod_s
- (match_operand:SI 6 "const_int_operand")] ;; mod_f
+ [(match_operand:SIDI 3 "<cas_cmp_operand>" "<cas_cmp_str>") ;; expect
+ (match_operand:SIDI 4 "s_register_operand" "r") ;; desired
+ (match_operand:SI 5 "const_int_operand") ;; is_weak
+ (match_operand:SI 6 "const_int_operand") ;; mod_s
+ (match_operand:SI 7 "const_int_operand")] ;; mod_f
VUNSPEC_ATOMIC_CAS))
- (clobber (match_scratch:SI 7 "=&r"))]
+ (clobber (match_scratch:SI 8 "=&r"))]
"<sync_predtab>"
"#"
"&& reload_completed"