Hi Jonathan,

Gentle ping.

Thanks,
Soumya

> On 5 Dec 2025, at 2:46 PM, Soumya AR <[email protected]> wrote:
> 
> Hi Jonathan,
> 
> Pinging again for some clarification on what is the ideal way to implement
> atomic min/max in libstdc++.
> 
> Would you prefer we check for the builtins using concepts (similar to fp
> add/sub)? If so, the earlier questions in this thread still apply.
> 
> If not, since I do have a patch that implements the builtins in GCC, would you
> prefer the builtins be called directly, without a concept check?
> 
> Patch is at:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700786.html
> 
> Please let me know if the general approach for this is OK.
> 
> Thanks,
> Soumya
> 
>  
>> On 11 Nov 2025, at 4:31 PM, Soumya AR <[email protected]> wrote:
>> 
>> Hi Jakub and Jonathan,
>> 
>> Pinging on this again to ask for confirmation.
>> 
>> As Matthew mentioned earlier, does it sound OK to have patches in this 
>> order: 
>> 
>>> - Have builtins used in libstdc++ (under SFINAE checks).
>>> - Have builtins defined and emitting correct code for AArch64.
>>> - Add the CAS pattern matching into an IFN.
>> 
>> We'd like to continue with the 2nd option that Jakub suggested — lowering the
>> generic builtin to an IFN, and converting that to a CAS loop after IPA if 
>> the backend does not support a direct optab.
>> 
>> ---
>> 
>> As for changes on the libstdc++ side, since we cannot emit a direct call to 
>> the
>> builtins and need to resolve these appropriately under SFINAE, how do we 
>> handle
>> implemnting the SFINAE check for __atomic_base?
>> 
>> I would like to ideally emit less calls to the compiler builtins and thus 
>> have
>> a single check for the builtin, the way it is currently implemented in
>> __atomic_impl.
>> 
>> What would be a neater way to tackle this? 
>> 
>> - Forward declaration of just the min/max functions from __atomic_impl and
>>  resolve them using the __atomic_impl implementation for __atomic_base
>> - Move the SFINAE checks outside __atomic_impl. This messes with the current
>>  philosophy of __atomic_impl though, as I understand it
>> - Resolve the integer types using a separate SFINAE check. Would result in 
>> code
>>  duplication though
>> - Any other suggestion I'm missing?
>> 
>> Lastly, how do we guard these for C++26?
>> 
>> Thanks,
>> Soumya
>> 
>>> On 7 Nov 2025, at 4:13 PM, Matthew Malcomson <[email protected]> wrote:
>>> 
>>> 
>>> 
>>> On 11/4/25 12:43, Jakub Jelinek wrote:
>>>> External email: Use caution opening links or attachments
>>>> On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote:
>>>>> We've been meaning to send this email for a while after the Cauldron.
>>>>> IIUC you discussed this with Ramana there -- my understanding of what he
>>>>> told me is that your main concern is with the explosion of builtins.
>>>> Yeah.
>>>>> Similarly, what I understand is that if we reduce the number of builtins 
>>>>> to
>>>>> just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch`
>>>>> and `__atomic_max_fetch` (which are the same builtins currently exposed by
>>>>> clang) that seems more acceptable.
>>>>> 
>>>>> I wanted to double-check that this understanding of a conversation I'm
>>>>> hearing second-hand is correct.
>>>> Yes.  Even min vs. max could be reduced to minmax with an extra argument
>>>> which determines which one it is, but I'm fine with those 4.
>>>> But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof
>>>> etc., let the compiler figure out the signedness and size (and will it
>>>> handle floating point too or not?).
>>> 
>>> Great, sticking with those four works for us (and matches what clang has 
>>> already implemented).
>>> 
>>> Yes -- we expect to make these four work for floating point too.
>>> 
>>>>> Also I'd like to gather opinion (from you but also anyone who has an
>>>>> opinion) about the following implementation design decisions:
>>>>> 1) Should we still implement the libatomic functions?  And still with the
>>>>> unsigned/signed distinction and all sizes?
>>>>> - I'd expect so, mostly for the `-fno-inline-atomics` flag.
>>>> Do you really need that?  Can't you just emit a CAS loop for the non-inline
>>>> atomics?  Because the function explosion will be there again on the
>>>> libatomic side.  Or at least don't add such symbols on arches which are
>>>> never going to benefit from those right now (i.e. all the implementations
>>>> would be CAS loops anyway).  Some function explosion can be limited by only
>>>> having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not
>>>> the {min,max}_fetch ones, because that case can be implemented on the 
>>>> caller
>>>> side by just returning the DESIRED argument separately.
>>> 
>>> We don't need libatomic -- we were suggesting implementing it on the guess 
>>> that others would like it but are happy to drop that work and emit a CAS 
>>> loop from the compiler instead.
>>> 
>>>>> 2) Earlier you floated the idea of using an internal function to encode 
>>>>> the
>>>>> operation through the rest of the compiler (outside of the frontend).  
>>>>> Does
>>>>> that approach still seem good to you?
>>>> There are 2 options.
>>>> Lower the type-generic builtin into a CAS loop and pattern recognize it at
>>>> some late time (e.g. the widening_mul pass, certainly after IPA) into an 
>>>> IFN
>>>> if the corresponding optab is supported.
>>>> Or lower the type-generic builtin into IFN (ifns can have the min vs. max
>>>> argument and derive size and sign from the DESIRED argument) and at some
>>>> perhaps early (before IPA) point - forwprop? - pattern match a CAS loop 
>>>> into
>>>> the IFN too and then ideally shortly after IPA lower the IFN back into a 
>>>> CAS
>>>> loop if optab doesn't exist.
>>>> The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't
>>>> always know what the backend will be.
>>> 
>>> Ah -- I did not know about this OpenMP/OpenACC case to watch out for -- 
>>> thanks for raising it.  I think that shows my implementation of the CAS 
>>> loop expansion in my floating point fetch add/sub patches to be 
>>> problematic.  (I had put it in the frontend because the decision about how 
>>> to handle floating point exceptions etc seemed a frontend-specific 
>>> decision).
>>> 
>>> ------
>>> Personally I like sound of lowering to an IFN before expanding it later on 
>>> the principle that if we do start to miss anything with our pattern match 
>>> we can at least ensure we don't miss anything the user explicitly provided 
>>> as a fetch_min/fetch_max.
>>> (I.e. the second option you suggested).
>>> 
>>> ------
>>> For clarity, in order to get some benefit in GCC 15 we have the following 
>>> goals (ordered by increasing amount of code changes we need to write, also 
>>> somewhat corresponding to how much we're aiming to get that in by GCC 15):
>>> - Have builtins used in libstdc++ (under SFINAE checks).
>>> - Have builtins defined and emitting correct code for AArch64.
>>> - Add the CAS pattern matching into an IFN.
>>> 
>>> Based on the above preferences we'd like to see if you'd be happy with the 
>>> first patchset to not include the CAS pattern matching.
>>> - N.b. This assumes going with the second option you suggested of expanding 
>>> the IFN into a CAS loop if optab doesn't exist rather than pattern matching 
>>> if optab does exist.  Hence if you'd want the first option do mention.
>>> 
>>> Thanks again,
>>> Matthew
>> 
>> 
> 

Reply via email to