Hi Jakub and Jonathan,

Pinging on this again to ask for confirmation.

As Matthew mentioned earlier, does it sound OK to have patches in this order: 

> - Have builtins used in libstdc++ (under SFINAE checks).
> - Have builtins defined and emitting correct code for AArch64.
> - Add the CAS pattern matching into an IFN.

We'd like to continue with the 2nd option that Jakub suggested — lowering the
generic builtin to an IFN, and converting that to a CAS loop after IPA if 
the backend does not support a direct optab.

---

As for changes on the libstdc++ side, since we cannot emit a direct call to the
builtins and need to resolve these appropriately under SFINAE, how do we handle
implemnting the SFINAE check for __atomic_base?

I would like to ideally emit less calls to the compiler builtins and thus have
a single check for the builtin, the way it is currently implemented in
__atomic_impl.

What would be a neater way to tackle this? 

- Forward declaration of just the min/max functions from __atomic_impl and
  resolve them using the __atomic_impl implementation for __atomic_base
- Move the SFINAE checks outside __atomic_impl. This messes with the current
  philosophy of __atomic_impl though, as I understand it
- Resolve the integer types using a separate SFINAE check. Would result in code
  duplication though
- Any other suggestion I'm missing?

Lastly, how do we guard these for C++26?

Thanks,
Soumya

> On 7 Nov 2025, at 4:13 PM, Matthew Malcomson <[email protected]> wrote:
> 
> 
> 
> On 11/4/25 12:43, Jakub Jelinek wrote:
>> External email: Use caution opening links or attachments
>> On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote:
>>> We've been meaning to send this email for a while after the Cauldron.
>>> IIUC you discussed this with Ramana there -- my understanding of what he
>>> told me is that your main concern is with the explosion of builtins.
>> Yeah.
>>> Similarly, what I understand is that if we reduce the number of builtins to
>>> just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch`
>>> and `__atomic_max_fetch` (which are the same builtins currently exposed by
>>> clang) that seems more acceptable.
>>> 
>>> I wanted to double-check that this understanding of a conversation I'm
>>> hearing second-hand is correct.
>> Yes.  Even min vs. max could be reduced to minmax with an extra argument
>> which determines which one it is, but I'm fine with those 4.
>> But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof
>> etc., let the compiler figure out the signedness and size (and will it
>> handle floating point too or not?).
> 
> Great, sticking with those four works for us (and matches what clang has 
> already implemented).
> 
> Yes -- we expect to make these four work for floating point too.
> 
>>> Also I'd like to gather opinion (from you but also anyone who has an
>>> opinion) about the following implementation design decisions:
>>> 1) Should we still implement the libatomic functions?  And still with the
>>> unsigned/signed distinction and all sizes?
>>> - I'd expect so, mostly for the `-fno-inline-atomics` flag.
>> Do you really need that?  Can't you just emit a CAS loop for the non-inline
>> atomics?  Because the function explosion will be there again on the
>> libatomic side.  Or at least don't add such symbols on arches which are
>> never going to benefit from those right now (i.e. all the implementations
>> would be CAS loops anyway).  Some function explosion can be limited by only
>> having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not
>> the {min,max}_fetch ones, because that case can be implemented on the caller
>> side by just returning the DESIRED argument separately.
> 
> We don't need libatomic -- we were suggesting implementing it on the guess 
> that others would like it but are happy to drop that work and emit a CAS loop 
> from the compiler instead.
> 
>>> 2) Earlier you floated the idea of using an internal function to encode the
>>> operation through the rest of the compiler (outside of the frontend).  Does
>>> that approach still seem good to you?
>> There are 2 options.
>> Lower the type-generic builtin into a CAS loop and pattern recognize it at
>> some late time (e.g. the widening_mul pass, certainly after IPA) into an IFN
>> if the corresponding optab is supported.
>> Or lower the type-generic builtin into IFN (ifns can have the min vs. max
>> argument and derive size and sign from the DESIRED argument) and at some
>> perhaps early (before IPA) point - forwprop? - pattern match a CAS loop into
>> the IFN too and then ideally shortly after IPA lower the IFN back into a CAS
>> loop if optab doesn't exist.
>> The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't
>> always know what the backend will be.
> 
> Ah -- I did not know about this OpenMP/OpenACC case to watch out for -- 
> thanks for raising it.  I think that shows my implementation of the CAS loop 
> expansion in my floating point fetch add/sub patches to be problematic.  (I 
> had put it in the frontend because the decision about how to handle floating 
> point exceptions etc seemed a frontend-specific decision).
> 
> ------
> Personally I like sound of lowering to an IFN before expanding it later on 
> the principle that if we do start to miss anything with our pattern match we 
> can at least ensure we don't miss anything the user explicitly provided as a 
> fetch_min/fetch_max.
> (I.e. the second option you suggested).
> 
> ------
> For clarity, in order to get some benefit in GCC 15 we have the following 
> goals (ordered by increasing amount of code changes we need to write, also 
> somewhat corresponding to how much we're aiming to get that in by GCC 15):
> - Have builtins used in libstdc++ (under SFINAE checks).
> - Have builtins defined and emitting correct code for AArch64.
> - Add the CAS pattern matching into an IFN.
> 
> Based on the above preferences we'd like to see if you'd be happy with the 
> first patchset to not include the CAS pattern matching.
> - N.b. This assumes going with the second option you suggested of expanding 
> the IFN into a CAS loop if optab doesn't exist rather than pattern matching 
> if optab does exist.  Hence if you'd want the first option do mention.
> 
> Thanks again,
> Matthew


Reply via email to