Hi Richard,
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines. I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
This would be a great feature to have since it reduces the overhead of
outlining considerably.
> I think this patch series would be great to have for GCC 10!
Agreed. I've got a couple of general comments:
* The option name -matomic-ool sounds too abbreviated. I think e.g.
-moutline-atomics is more descriptive and more user-friendly.
* Similarly the exported __aa64_have_atomics variable could be named
__aarch64_have_lse_atomics so it's clear that it is about LSE atomics.
+@item -matomic-ool
+@itemx -mno-atomic-ool
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly.
So what is the behaviour when you explicitly select a specific CPU with
-mcpu rather than an architecture revision with -march?
+/* Branch to LABEL if LSE is enabled.
+ The branch should be easily predicted, in that it will, after constructors,
+ always branch the same way. The expectation is that systems that implement
+ ARMv8.1-Atomics are "beefier" than those that omit the extension.
+ By arranging for the fall-through path to use load-store-exclusive insns,
+ we aid the branch predictor of the smallest cpus. */
I'd say that by the time GCC 10 is released and used in distros, systems without
LSE atomics will be practically non-existent. So we should favour LSE atomics
by default and make them the fall-through path.
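To make the suggestion concrete, here is a rough C sketch of the dispatch
with the LSE path as the fall-through. It is only an illustration: the real
helpers are written in assembly, the names have_lse_atomics_sketch,
fetch_add_lse_path, fetch_add_llsc_path and outline_fetch_add_sketch are
made up for this example, and the __atomic builtins merely stand in for the
actual instruction sequences.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the exported "have LSE atomics" flag,
   which would be set once at startup.  */
bool have_lse_atomics_sketch = true;

/* Stand-in for the single-instruction LSE path.  */
static uint32_t fetch_add_lse_path(uint32_t *ptr, uint32_t val)
{
  return __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
}

/* Stand-in for the load/store-exclusive path: retry until the
   update succeeds, then return the previous value.  */
static uint32_t fetch_add_llsc_path(uint32_t *ptr, uint32_t val)
{
  uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n(ptr, &old, old + val, true,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;  /* On failure OLD is refreshed with the current value.  */
  return old;
}

/* Out-of-line helper sketch: the rarely-taken branch goes to the
   exclusive-load/store code, so LSE is the fall-through path.  */
uint32_t outline_fetch_add_sketch(uint32_t *ptr, uint32_t val)
{
  if (__builtin_expect(!have_lse_atomics_sketch, 0))
    return fetch_add_llsc_path(ptr, val);
  return fetch_add_lse_path(ptr, val);
}
```

Both paths return the old value, so callers cannot tell which one ran; only
the layout of the branch changes.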
Cheers,
Wilco