From: Matthew Malcomson <[email protected]>
Hello, this is an RFC for adding an atomic floating point fetch_add builtin
(and variants) to GCC. The atomic fetch_add operation is defined to work
on the base floating point types in the C++20 standard chapter 31.7.3, and
extended to work for all cv-unqualified floating point types in C++23
chapter 33.5.7.4.
Honestly not sure who to Cc, please do point me to someone else if that's
better.
This is nowhere near complete (for one thing even the tests I've added
don't fully pass), but I think I have a complete enough idea that it's
worth checking if this is something that could be agreed on.
As it stands no target except the nvptx backend would natively support
these operations.
Main questions that I'm looking to resolve with this RFC:
1) Would GCC be OK accepting this implementation even though no backend
would be implementing these yet?
- AIUI only the nvptx backend could theoretically implement this.
- Even without a backend implementing it natively, the ability to use
this in code (especially libstdc++) enables other compilers to
generate better code for GPU's using standard C++.
2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
(i.e. a check on the resolved builtin rather than the more user-facing
one) in order to determine whether floating point atomic fetch_add is
available.
- N.b. this builtin is actually the builtin working on the "double"
type, one would have to rely on any compilers implementing that
particular resolved builtin to also implement the other floating point
atomic fetch_add builtins that they would want to support in libstdc++
`atomic<[floating_point_type]>::fetch_add`.
More specific questions about the choice of which builtins to implement and
whether the types are OK:
1) Is it OK to not implement the `__sync_*` versions?
Since these are deprecated and the `__atomic_*` versions are there to
match the C/C++ code atomic operations (which is a large part of the
reason for the new floating point operations).
2) Is it OK to not implement all the `fetch_*` operations?
None of the bitwise operations are specified for C++ and bitwise
operations are (AIUI) rarely used on floating point values.
3) Wanting to be able to farm out to libatomic meant that we need constant names
for the specialised functions.
- This led to the naming convention based on floating point type.
- That naming convention *could* be updated to include the special backend
floating point types if needed. I have not done this mostly because I
thought it would not add much, though I have not looked into this very
closely.
4) Wanting to name the functions based on floating point type rather than size
meant that the mapping from type passed to the overloaded version to
specialised builtin was less direct than for the integral versions.
- Ended up with a hard-coded table in the source to check this.
- Would very much like some better approach, not certain what better approach
I could find.
- Will eventually at least share the hard-coded tables (which isn't
happening yet because this is at RFC level).
5) Are there any other types that I should use?
Similarly are there any types that I'm trying to use that I shouldn't?
I *believe* I've implemented all the types that make sense and are
general builtin types. Could easily have missed some (e.g. left
`_Float128x` alone because AIUI it's larger than 128bits which means we
don't have any other atomic operations on such data), could also easily
be misunderstanding the mention in the C++ standards of "extended" types
(which I believe is the `_Float*` and `bfloat16` types).
6) Anything special about floating point maths that I'm tripping up on?
(Especially around CAS loop where we expand the operation directly,
sometimes convert a PLUS into a MINUS of a negative value ...).
Don't know of anything myself, but also a bit wary of floating point
maths.
N.b. I know that there's a reasonable amount of work left in:
1) Ensuring that all the places which use atomic types are updated
(e.g. asan),
2) Ensuring that all frontends can use these to the level that they could
use the integral atomics.
3) Ensuring the major backends can still compile libatomic (I had to do
something in AArch64 libatomic backend to make it compile, seems like
there might be more to do for others).
Matthew Malcomson (8):
Define new floating point builtin fetch_add functions
Add FP types for atomic builtin overload resolution
Tie the new atomic builtins to the backend
Have libatomic working as first draft
Use new builtins in libstdc++
First attempt at testsuite
Mention floating point atomic fetch_add etc in docs
Add demo implementation of one of the operations
gcc/builtin-types.def | 20 +
gcc/builtins.cc | 153 ++-
gcc/c-family/c-common.cc | 88 +-
gcc/config/aarch64/aarch64.h | 2 +
gcc/config/aarch64/aarch64.opt | 5 +
gcc/config/aarch64/atomics.md | 15 +
gcc/doc/extend.texi | 12 +
gcc/fortran/types.def | 48 +
gcc/optabs.cc | 101 +-
gcc/optabs.def | 6 +-
gcc/optabs.h | 2 +-
gcc/sync-builtins.def | 40 +
gcc/testsuite/gcc.dg/atomic-op-fp.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf128.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf16.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf16b.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf32.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf32x.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf64.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf64x.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpl.c | 204 +++
gcc/testsuite/lib/target-supports.exp | 463 ++++++-
libatomic/Makefile.am | 6 +-
libatomic/Makefile.in | 12 +-
libatomic/acinclude.m4 | 49 +
libatomic/auto-config.h.in | 84 +-
libatomic/config/linux/aarch64/host-config.h | 2 +
libatomic/configure | 1153 ++++++++++++++++-
libatomic/configure.ac | 4 +
libatomic/fadd_n.c | 23 +
libatomic/fop_n.c | 5 +-
libatomic/fsub_n.c | 23 +
libatomic/libatomic.map | 44 +
libatomic/libatomic_i.h | 58 +
libatomic/testsuite/Makefile.in | 1 +
.../testsuite/libatomic.c/atomic-op-fp.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf128.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf16.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf16b.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf32.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf32x.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf64.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf64x.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpl.c | 203 +++
libstdc++-v3/include/bits/atomic_base.h | 16 +
47 files changed, 6393 insertions(+), 112 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c
--
2.43.0