Pinging this back into context so that I don't forget about it...
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00376.html

Thanks,
Kyrill

On 08/03/17 16:35, Kyrill Tkachov wrote:
Hi all,

For the testcase in this patch, where the value of x is zero, we currently generate:

foo:
        mov     w1, 4
.L2:
        ldaxr   w2, [x0]
        cmp     w2, 0
        bne     .L3
        stxr    w3, w1, [x0]
        cbnz    w3, .L2
.L3:
        cset    w0, eq
        ret

We currently cannot merge the cmp and b.ne inside the loop into a cbnz because we need the condition flags set for the return value of the function (i.e. the cset at the end). But if we re-jig the sequence in that case we can generate a tighter loop:

foo:
        mov     w1, 4
.L2:
        ldaxr   w2, [x0]
        cbnz    w2, .L3
        stxr    w3, w1, [x0]
        cbnz    w3, .L2
.L3:
        cmp     w2, 0
        cset    w0, eq
        ret

So we add an explicit compare after the loop, and inside the loop we use the fact that we're comparing against zero to emit a CBNZ. This means we may do the comparison twice (once inside the CBNZ, once at the CMP at the end), but there is now less code inside the loop.

I've seen this sequence appear in glibc locking code, so maybe it's worth adding the extra bit of complexity to the compare-exchange splitter to catch this case.

Bootstrapped and tested on aarch64-none-linux-gnu. In previous iterations of the patch, where I had gotten some logic wrong, it would cause miscompiles of libgomp leading to timeouts in its testsuite, but this version passes everything cleanly.

Ok for GCC 8? (I know it's early, but might as well get it out in case someone wants to try it out)

Thanks,
Kyrill

2017-03-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
    Emit CBNZ inside loop when doing a strong exchange and comparing
    against zero.  Generate the CC flags after the loop.

2017-03-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test.
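
For anyone reading without the patch attachment to hand, here is a minimal sketch of the kind of C source that should lead to the sequences quoted above. It is an assumption reconstructed from the assembly (strong exchange, expected value zero, new value 4, acquire ordering on success), not a copy of the test file from the patch, and the name foo is only taken from the assembly label.

```c
/* Hypothetical reconstruction, not the actual test from the patch.
   Strong compare-exchange of *a: if *a == 0, store 4 and return 1,
   otherwise leave *a alone and return 0.  */
int
foo (int *a)
{
  int x = 0;  /* Expected value: the zero that allows the CBNZ form.  */

  /* Fourth argument 0 selects the strong variant.  Acquire ordering on
     success and relaxed on failure match the LDAXR/STXR pair above.  */
  return __atomic_compare_exchange_n (a, &x, 4, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

Compiling something like this with optimisation for aarch64 without LSE atomics is where I would expect the load-exclusive/store-exclusive loop to show up; the exact register allocation and labels may differ.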