Hi Jonathan,

Pinging again for some clarification on what is the ideal way to implement
atomic min/max in libstdc++.

Would you prefer we check for the builtins using concepts (similar to fp
add/sub)? If so, the earlier questions in this thread still apply.

If not, since I do have a patch that implements the builtins in GCC, would you
prefer the builtins be called directly, without a concept check?

Patch is at:
https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700786.html

Please let me know if the general approach for this is OK.

Thanks,
Soumya



On 11 Nov 2025, at 4:31 PM, Soumya AR <[email protected]> wrote:

Hi Jakub and Jonathan,

Pinging on this again to ask for confirmation.

As Matthew mentioned earlier, does it sound OK to have patches in this order:

- Have builtins used in libstdc++ (under SFINAE checks).
- Have builtins defined and emitting correct code for AArch64.
- Add the CAS pattern matching into an IFN.

We'd like to continue with the 2nd option that Jakub suggested — lowering the
generic builtin to an IFN, and converting that to a CAS loop after IPA if
the backend does not support a direct optab.

---

As for changes on the libstdc++ side, since we cannot emit a direct call to the
builtins and need to resolve these appropriately under SFINAE, how do we handle
implemnting the SFINAE check for __atomic_base?

I would like to ideally emit less calls to the compiler builtins and thus have
a single check for the builtin, the way it is currently implemented in
__atomic_impl.

What would be a neater way to tackle this?

- Forward declaration of just the min/max functions from __atomic_impl and
 resolve them using the __atomic_impl implementation for __atomic_base
- Move the SFINAE checks outside __atomic_impl. This messes with the current
 philosophy of __atomic_impl though, as I understand it
- Resolve the integer types using a separate SFINAE check. Would result in code
 duplication though
- Any other suggestion I'm missing?

Lastly, how do we guard these for C++26?

Thanks,
Soumya

On 7 Nov 2025, at 4:13 PM, Matthew Malcomson <[email protected]> wrote:



On 11/4/25 12:43, Jakub Jelinek wrote:
External email: Use caution opening links or attachments
On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote:
We've been meaning to send this email for a while after the Cauldron.
IIUC you discussed this with Ramana there -- my understanding of what he
told me is that your main concern is with the explosion of builtins.
Yeah.
Similarly, what I understand is that if we reduce the number of builtins to
just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch`
and `__atomic_max_fetch` (which are the same builtins currently exposed by
clang) that seems more acceptable.

I wanted to double-check that this understanding of a conversation I'm
hearing second-hand is correct.
Yes.  Even min vs. max could be reduced to minmax with an extra argument
which determines which one it is, but I'm fine with those 4.
But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof
etc., let the compiler figure out the signedness and size (and will it
handle floating point too or not?).

Great, sticking with those four works for us (and matches what clang has 
already implemented).

Yes -- we expect to make these four work for floating point too.

Also I'd like to gather opinion (from you but also anyone who has an
opinion) about the following implementation design decisions:
1) Should we still implement the libatomic functions?  And still with the
unsigned/signed distinction and all sizes?
- I'd expect so, mostly for the `-fno-inline-atomics` flag.
Do you really need that?  Can't you just emit a CAS loop for the non-inline
atomics?  Because the function explosion will be there again on the
libatomic side.  Or at least don't add such symbols on arches which are
never going to benefit from those right now (i.e. all the implementations
would be CAS loops anyway).  Some function explosion can be limited by only
having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not
the {min,max}_fetch ones, because that case can be implemented on the caller
side by just returning the DESIRED argument separately.

We don't need libatomic -- we were suggesting implementing it on the guess that 
others would like it but are happy to drop that work and emit a CAS loop from 
the compiler instead.

2) Earlier you floated the idea of using an internal function to encode the
operation through the rest of the compiler (outside of the frontend).  Does
that approach still seem good to you?
There are 2 options.
Lower the type-generic builtin into a CAS loop and pattern recognize it at
some late time (e.g. the widening_mul pass, certainly after IPA) into an IFN
if the corresponding optab is supported.
Or lower the type-generic builtin into IFN (ifns can have the min vs. max
argument and derive size and sign from the DESIRED argument) and at some
perhaps early (before IPA) point - forwprop? - pattern match a CAS loop into
the IFN too and then ideally shortly after IPA lower the IFN back into a CAS
loop if optab doesn't exist.
The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't
always know what the backend will be.

Ah -- I did not know about this OpenMP/OpenACC case to watch out for -- thanks 
for raising it.  I think that shows my implementation of the CAS loop expansion 
in my floating point fetch add/sub patches to be problematic.  (I had put it in 
the frontend because the decision about how to handle floating point exceptions 
etc seemed a frontend-specific decision).

------
Personally I like sound of lowering to an IFN before expanding it later on the 
principle that if we do start to miss anything with our pattern match we can at 
least ensure we don't miss anything the user explicitly provided as a 
fetch_min/fetch_max.
(I.e. the second option you suggested).

------
For clarity, in order to get some benefit in GCC 15 we have the following goals 
(ordered by increasing amount of code changes we need to write, also somewhat 
corresponding to how much we're aiming to get that in by GCC 15):
- Have builtins used in libstdc++ (under SFINAE checks).
- Have builtins defined and emitting correct code for AArch64.
- Add the CAS pattern matching into an IFN.

Based on the above preferences we'd like to see if you'd be happy with the 
first patchset to not include the CAS pattern matching.
- N.b. This assumes going with the second option you suggested of expanding the 
IFN into a CAS loop if optab doesn't exist rather than pattern matching if 
optab does exist.  Hence if you'd want the first option do mention.

Thanks again,
Matthew



Reply via email to