https://github.com/xroche updated https://github.com/llvm/llvm-project/pull/204815
>From 5c7fc853303e60e1c39d6ea49d5fa6ad445abc16 Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Fri, 19 Jun 2026 13:59:45 +0200
Subject: [PATCH 1/7] [Clang][POC] Atomic operations on _BitInt(N)

_BitInt(N) was rejected by every atomic path: the _Atomic(...) type
specifier, the __c11_atomic_*/__atomic_* builtins, and transitively
std::atomic. Two blanket isBitIntType() checks disabled it, dating to the
type's introduction (the __atomic builtin half is D84049). __int128, the
closest analogue, was allowed at both sites.

Lift both rejections so _BitInt flows through the normal integer path.
load, store, exchange, compare-exchange, and bitwise read-modify-write are
then correct at every width through the existing canonicalizing store and
the libcall fallback.

Arithmetic read-modify-write needs more care. A single atomicrmw on the
padded memory integer of a non-byte-aligned width (e.g. _BitInt(37) in an
i64) carries into the padding bits, leaving a non-canonical value that
breaks a later compare-exchange and gives wrong signed min/max. A wide
arithmetic fetch (e.g. _BitInt(256)) hit an llvm_unreachable in the libcall
path, a compiler crash. Both are fixed by emitting a compare-exchange loop
that computes the new value at width N via llvm::buildAtomicRMWValue and
writes back a canonical representation, reusing the existing
EmitAtomicCompareExchange helper, which selects the inline cmpxchg or the
__atomic_compare_exchange libcall by size. No-padding inline widths (64,
128) keep the direct atomicrmw fast path.

The libc++ bit-int.verify.cpp test only asserted that Clang rejection, so
it is removed as obsolete. It is not replaced with a positive test here:
the libc++ premerge matrix builds tests with a pinned clang that predates
this change. Whether libc++ should expose atomic _BitInt is a separate
design question for the P3666R4 discussion.
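The padding carry described above can be modelled without _BitInt at all.
The following is an illustrative sketch only, not code from this patch: it
assumes a signed 37-bit value stored sign-extended in a 64-bit atomic, and
the names canonicalize37, naive_fetch_add37 and loop_fetch_add37 are
hypothetical. The naive version performs a single 64-bit fetch-add; the loop
version computes at 37 bits and writes back the canonical form with a weak
compare-exchange.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: a signed 37-bit value kept sign-extended in 64 bits. */
enum { WIDTH = 37 };

/* Keep the low WIDTH bits of x and sign-extend them to 64 bits.
 * Assumes two's complement conversion from uint64_t to int64_t. */
static int64_t canonicalize37(uint64_t x) {
  const uint64_t mask = (UINT64_C(1) << WIDTH) - 1;
  const uint64_t sign = UINT64_C(1) << (WIDTH - 1);
  x &= mask;
  return (int64_t)((x ^ sign) - sign);
}

/* Naive form: one 64-bit fetch-add. A carry out of bit 36 lands in the
 * padding bits, so the stored value may no longer be canonical. */
static int64_t naive_fetch_add37(_Atomic int64_t *p, int64_t v) {
  return atomic_fetch_add(p, v);
}

/* Loop form: compute the sum modulo 2^37, sign-extend it, and publish it
 * with a weak compare-exchange. On failure, old is refreshed with the
 * current contents and the sum is recomputed. Returns the old value. */
static int64_t loop_fetch_add37(_Atomic int64_t *p, int64_t v) {
  int64_t old = atomic_load_explicit(p, memory_order_relaxed);
  int64_t desired;
  do {
    desired = canonicalize37((uint64_t)old + (uint64_t)v);
  } while (!atomic_compare_exchange_weak(p, &old, desired));
  return old;
}
```

Starting from the largest 37-bit value (2^36 - 1) and adding 1, the naive
version leaves 2^36 in memory, which is not the sign-extended form of any
37-bit value, whereas the loop version leaves -2^36.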

Verified against gcc-14: identical size and alignment for all widths, and
cross-compiler compare-exchange interop in both directions, confirming the
padding canonicalization matches.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
---
 clang/docs/LanguageExtensions.rst             |   9 +
 clang/docs/ReleaseNotes.rst                   |   6 +
 clang/lib/CodeGen/CGAtomic.cpp                | 211 ++++++++++++++++++
 clang/lib/Sema/SemaChecking.cpp               |   5 -
 clang/lib/Sema/SemaType.cpp                   |   2 -
 clang/test/CodeGen/atomic-bitint.c            |  90 ++++++++
 clang/test/Sema/builtins.c                    |   4 +-
 clang/test/SemaCXX/ext-int.cpp                |  10 +-
 libcxx/test/libcxx/atomics/bit-int.verify.cpp |  22 --
 9 files changed, 322 insertions(+), 37 deletions(-)
 create mode 100644 clang/test/CodeGen/atomic-bitint.c
 delete mode 100644 libcxx/test/libcxx/atomics/bit-int.verify.cpp

diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index d79d82a175c68..5ff076d3e48ad 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -451,6 +451,15 @@ favor of the standard type.
 Note: the ABI for ``_BitInt(N)`` is still in the process of being stabilized, so
 this type should not yet be used in interfaces that require ABI stability.
 
+``_BitInt(N)`` may be used as an atomic type: ``_Atomic(_BitInt(N))``, the
+``__c11_atomic_*`` and ``__atomic_*`` builtins, and ``std::atomic`` all accept
+it for any width. Widths the target cannot operate on inline are lowered to the
+``__atomic_*`` libcalls. For a width whose representation has padding bits (``N``
+not a multiple of the type's alignment, e.g. ``_BitInt(37)``), arithmetic
+read-modify-write operations are emitted as a compare-exchange loop that computes
+at width ``N``, so the result wraps modulo ``2**N`` and the padding bits stay
+canonical.
+ C keywords supported in all language modes ------------------------------------------ diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 7f056abfbbe24..8692da8830dff 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -265,6 +265,12 @@ Non-comprehensive list of changes in this release - Added support for floating point and pointer values in most ``__atomic_`` builtins. +- Atomic operations on ``_BitInt(N)`` are now supported, including + ``_Atomic(_BitInt(N))``, the ``__c11_atomic_*`` / ``__atomic_*`` builtins, and + ``std::atomic``. Widths the target cannot operate on inline use the + ``__atomic_*`` libcalls; arithmetic read-modify-write on a width with padding + bits is emitted as a compare-exchange loop computing at the value width. + - Added ``__builtin_stdc_rotate_left`` and ``__builtin_stdc_rotate_right`` for bit rotation of unsigned integers including ``_BitInt`` types. Rotation counts are normalized modulo the bit-width and support negative values. diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp index 270965b109943..66c059fd40e26 100644 --- a/clang/lib/CodeGen/CGAtomic.cpp +++ b/clang/lib/CodeGen/CGAtomic.cpp @@ -21,6 +21,7 @@ #include "llvm/ADT/DenseMap.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/Intrinsics.h" +#include "llvm/Transforms/Utils/LowerAtomic.h" using namespace clang; using namespace CodeGen; @@ -558,6 +559,195 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder, return Builder.CreateSelect(Cmp, OldVal, RHS, "newval"); } +/// Classify an atomic op as an arithmetic/bitwise read-modify-write (one that +/// normally lowers to a single `atomicrmw`), mapping it to the matching +/// `AtomicRMWInst::BinOp` and reporting whether the builtin returns the new +/// value (`<op>_fetch`) rather than the old value (`fetch_<op>`). \p IsSigned +/// selects signed vs unsigned min/max. 
Returns false for exchange, load, store, +/// compare-exchange, and any non-RMW op, none of which need the _BitInt loop. +static bool classifyBitIntRMW(AtomicExpr::AtomicOp Op, bool IsSigned, + llvm::AtomicRMWInst::BinOp &BinOp, + bool &ReturnsNew) { + using RMW = llvm::AtomicRMWInst; + switch (Op) { + case AtomicExpr::AO__c11_atomic_fetch_add: + case AtomicExpr::AO__hip_atomic_fetch_add: + case AtomicExpr::AO__opencl_atomic_fetch_add: + case AtomicExpr::AO__atomic_fetch_add: + case AtomicExpr::AO__scoped_atomic_fetch_add: + BinOp = RMW::Add, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_add_fetch: + case AtomicExpr::AO__scoped_atomic_add_fetch: + BinOp = RMW::Add, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_sub: + case AtomicExpr::AO__hip_atomic_fetch_sub: + case AtomicExpr::AO__opencl_atomic_fetch_sub: + case AtomicExpr::AO__atomic_fetch_sub: + case AtomicExpr::AO__scoped_atomic_fetch_sub: + BinOp = RMW::Sub, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_sub_fetch: + case AtomicExpr::AO__scoped_atomic_sub_fetch: + BinOp = RMW::Sub, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_and: + case AtomicExpr::AO__hip_atomic_fetch_and: + case AtomicExpr::AO__opencl_atomic_fetch_and: + case AtomicExpr::AO__atomic_fetch_and: + case AtomicExpr::AO__scoped_atomic_fetch_and: + BinOp = RMW::And, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_and_fetch: + case AtomicExpr::AO__scoped_atomic_and_fetch: + BinOp = RMW::And, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_or: + case AtomicExpr::AO__hip_atomic_fetch_or: + case AtomicExpr::AO__opencl_atomic_fetch_or: + case AtomicExpr::AO__atomic_fetch_or: + case AtomicExpr::AO__scoped_atomic_fetch_or: + BinOp = RMW::Or, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_or_fetch: + case AtomicExpr::AO__scoped_atomic_or_fetch: + BinOp = RMW::Or, ReturnsNew = true; + return true; + case 
AtomicExpr::AO__c11_atomic_fetch_xor: + case AtomicExpr::AO__hip_atomic_fetch_xor: + case AtomicExpr::AO__opencl_atomic_fetch_xor: + case AtomicExpr::AO__atomic_fetch_xor: + case AtomicExpr::AO__scoped_atomic_fetch_xor: + BinOp = RMW::Xor, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_xor_fetch: + case AtomicExpr::AO__scoped_atomic_xor_fetch: + BinOp = RMW::Xor, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_nand: + case AtomicExpr::AO__atomic_fetch_nand: + case AtomicExpr::AO__scoped_atomic_fetch_nand: + BinOp = RMW::Nand, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_nand_fetch: + case AtomicExpr::AO__scoped_atomic_nand_fetch: + BinOp = RMW::Nand, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_min: + case AtomicExpr::AO__hip_atomic_fetch_min: + case AtomicExpr::AO__opencl_atomic_fetch_min: + case AtomicExpr::AO__atomic_fetch_min: + case AtomicExpr::AO__scoped_atomic_fetch_min: + BinOp = IsSigned ? RMW::Min : RMW::UMin, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_min_fetch: + case AtomicExpr::AO__scoped_atomic_min_fetch: + BinOp = IsSigned ? RMW::Min : RMW::UMin, ReturnsNew = true; + return true; + case AtomicExpr::AO__c11_atomic_fetch_max: + case AtomicExpr::AO__hip_atomic_fetch_max: + case AtomicExpr::AO__opencl_atomic_fetch_max: + case AtomicExpr::AO__atomic_fetch_max: + case AtomicExpr::AO__scoped_atomic_fetch_max: + BinOp = IsSigned ? RMW::Max : RMW::UMax, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_max_fetch: + case AtomicExpr::AO__scoped_atomic_max_fetch: + BinOp = IsSigned ? 
RMW::Max : RMW::UMax, ReturnsNew = true; + return true; + case AtomicExpr::AO__atomic_fetch_uinc: + case AtomicExpr::AO__scoped_atomic_fetch_uinc: + BinOp = RMW::UIncWrap, ReturnsNew = false; + return true; + case AtomicExpr::AO__atomic_fetch_udec: + case AtomicExpr::AO__scoped_atomic_fetch_udec: + BinOp = RMW::UDecWrap, ReturnsNew = false; + return true; + default: + return false; + } +} + +/// True for a `_BitInt(N)` whose value width N differs from its in-memory width +/// (e.g. `_BitInt(37)` occupies 64 bits), so the high bits are padding. +static bool hasBitIntPadding(QualType T, const ASTContext &C) { + if (const auto *BIT = T->getAs<BitIntType>()) + return BIT->getNumBits() != C.getTypeSize(T); + return false; +} + +/// Map a constant C ABI memory order to an llvm ordering. A non-constant order +/// is handled conservatively with the strongest ordering. +static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) { + auto *C = dyn_cast<llvm::ConstantInt>(Order); + if (!C || !llvm::isValidAtomicOrderingCABI(C->getZExtValue())) + return llvm::AtomicOrdering::SequentiallyConsistent; + switch ((llvm::AtomicOrderingCABI)C->getZExtValue()) { + case llvm::AtomicOrderingCABI::relaxed: + return llvm::AtomicOrdering::Monotonic; + case llvm::AtomicOrderingCABI::consume: + case llvm::AtomicOrderingCABI::acquire: + return llvm::AtomicOrdering::Acquire; + case llvm::AtomicOrderingCABI::release: + return llvm::AtomicOrdering::Release; + case llvm::AtomicOrderingCABI::acq_rel: + return llvm::AtomicOrdering::AcquireRelease; + case llvm::AtomicOrderingCABI::seq_cst: + return llvm::AtomicOrdering::SequentiallyConsistent; + } + llvm_unreachable("invalid CABI ordering"); +} + +/// Emit a `_BitInt(N)` atomic read-modify-write as a compare-exchange loop. A +/// single `atomicrmw` on the padded memory integer would carry into / compare +/// the padding bits, and no arbitrary-width `__atomic_fetch_*` libcall exists +/// for wide widths. 
The loop computes the new value at width N and writes back +/// a canonical (extended) representation via the existing cmpxchg helper, which +/// also picks the inline-vs-libcall form by size. +static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E, + Address Ptr, Address Val1, + QualType AtomicTy, + llvm::AtomicRMWInst::BinOp BinOp, + bool ReturnsNew, llvm::Value *Order) { + QualType ValTy = E->getValueType(); + llvm::AtomicOrdering AO = atomicOrderOrSeqCst(Order); + llvm::AtomicOrdering Failure = + llvm::AtomicCmpXchgInst::getStrongestFailureOrdering(AO); + + LValue AtomicLVal = CGF.MakeAddrLValue(Ptr, AtomicTy); + AtomicInfo Atomics(CGF, AtomicLVal); + + llvm::Value *RHS = + CGF.EmitLoadOfScalar(CGF.MakeAddrLValue(Val1, ValTy), E->getExprLoc()); + + RValue OldRV = Atomics.EmitAtomicLoad( + AggValueSlot::ignored(), E->getExprLoc(), + /*AsValue=*/true, llvm::AtomicOrdering::Monotonic, E->isVolatile()); + llvm::Value *Init = OldRV.getScalarVal(); + + llvm::BasicBlock *StartBB = CGF.Builder.GetInsertBlock(); + llvm::BasicBlock *LoopBB = CGF.createBasicBlock("atomicrmw.start", CGF.CurFn); + llvm::BasicBlock *EndBB = CGF.createBasicBlock("atomicrmw.end", CGF.CurFn); + CGF.Builder.CreateBr(LoopBB); + CGF.Builder.SetInsertPoint(LoopBB); + + llvm::PHINode *Old = CGF.Builder.CreatePHI(Init->getType(), 2); + Old->addIncoming(Init, StartBB); + + // Compute at the value width via the canonical RMW lowering, so the result + // wraps mod 2^N and never touches the padding bits. + llvm::Value *New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS); + + auto Res = Atomics.EmitAtomicCompareExchange( + RValue::get(Old), RValue::get(New), AO, Failure, /*IsWeak=*/true); + Old->addIncoming(Res.first.getScalarVal(), CGF.Builder.GetInsertBlock()); + CGF.Builder.CreateCondBr(Res.second, EndBB, LoopBB); + + CGF.Builder.SetInsertPoint(EndBB); + return RValue::get(ReturnsNew ? 
New : static_cast<llvm::Value *>(Old)); +} + static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest, Address Ptr, Address Val1, Address Val2, Address ExpectedResult, llvm::Value *IsWeak, @@ -1109,6 +1299,27 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) { LValue AtomicVal = MakeAddrLValue(Ptr, AtomicTy); AtomicInfo Atomics(*this, AtomicVal); + // A `_BitInt(N)` read-modify-write whose value width has padding bits, or + // whose size forces a libcall, cannot use a single atomicrmw: the op would + // carry into / compare the padding bits, and no arbitrary-width + // __atomic_fetch_* libcall exists. Emit a compare-exchange loop instead. + // Bitwise and/or/xor are exact even with padding, so only the wide case needs + // the loop for them. load/store/exchange/compare_exchange keep their paths. + if (MemTy->isBitIntType()) { + llvm::AtomicRMWInst::BinOp BinOp; + bool RMWReturnsNew; + if (classifyBitIntRMW(E->getOp(), MemTy->isSignedIntegerType(), BinOp, + RMWReturnsNew)) { + bool WideOrNonPow2 = (Size & (Size - 1)) != 0 || Size > 16; + bool Bitwise = BinOp == llvm::AtomicRMWInst::And || + BinOp == llvm::AtomicRMWInst::Or || + BinOp == llvm::AtomicRMWInst::Xor; + if (WideOrNonPow2 || (hasBitIntPadding(MemTy, getContext()) && !Bitwise)) + return emitBitIntAtomicRMWLoop(*this, E, Ptr, Val1, AtomicTy, BinOp, + RMWReturnsNew, Order); + } + } + Address OriginalVal1 = Val1; if (ShouldCastToIntPtrTy) { Ptr = Atomics.castToAtomicIntPointer(Ptr); diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp index b8a3f48a32f24..874ce2bf1ce3a 100644 --- a/clang/lib/Sema/SemaChecking.cpp +++ b/clang/lib/Sema/SemaChecking.cpp @@ -5460,11 +5460,6 @@ ExprResult Sema::BuildAtomicExpr(SourceRange CallRange, SourceRange ExprRange, ? 
0 : 1); - if (ValType->isBitIntType()) { - Diag(Ptr->getExprLoc(), diag::err_atomic_builtin_bit_int_prohibit); - return ExprError(); - } - return AE; } diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp index d2bb312feadc1..4a3506c281acf 100644 --- a/clang/lib/Sema/SemaType.cpp +++ b/clang/lib/Sema/SemaType.cpp @@ -10412,8 +10412,6 @@ QualType Sema::BuildAtomicType(QualType T, SourceLocation Loc) { else if (!T.isTriviallyCopyableType(Context) && getLangOpts().CPlusPlus) // Some other non-trivially-copyable type (probably a C++ class) DisallowedKind = 7; - else if (T->isBitIntType()) - DisallowedKind = 8; else if (getLangOpts().C23 && T->isUndeducedAutoType()) // _Atomic auto is prohibited in C23 DisallowedKind = 9; diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c new file mode 100644 index 0000000000000..358b530e8a792 --- /dev/null +++ b/clang/test/CodeGen/atomic-bitint.c @@ -0,0 +1,90 @@ +// RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s +// +// Atomic operations on _BitInt(N). load/store/exchange/compare-exchange and +// bitwise RMW lower directly; arithmetic RMW on a padded width and any RMW on a +// width too wide for an inline atomicrmw lower to a compare-exchange loop that +// computes at the value width. 
+ +typedef _BitInt(37) S37; +typedef unsigned _BitInt(37) U37; +typedef _BitInt(64) S64; +typedef _BitInt(128) S128; +typedef _BitInt(256) S256; + +// CHECK-LABEL: @ld37( +// CHECK: load atomic i64 +S37 ld37(_Atomic(S37) *p) { return __c11_atomic_load(p, __ATOMIC_SEQ_CST); } + +// CHECK-LABEL: @st37( +// CHECK: store atomic i64 +void st37(_Atomic(S37) *p, S37 v) { __c11_atomic_store(p, v, __ATOMIC_SEQ_CST); } + +// CHECK-LABEL: @xchg37( +// CHECK: atomicrmw xchg ptr {{.*}} i64 +S37 xchg37(_Atomic(S37) *p, S37 v) { + return __c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST); +} + +// CHECK-LABEL: @cas37( +// CHECK: cmpxchg ptr {{.*}} i64 +_Bool cas37(_Atomic(S37) *p, S37 *e, S37 d) { + return __c11_atomic_compare_exchange_strong(p, e, d, __ATOMIC_SEQ_CST, + __ATOMIC_SEQ_CST); +} + +// Bitwise RMW on a padded width keeps the direct atomicrmw: it is exact. +// CHECK-LABEL: @and37( +// CHECK: atomicrmw and ptr {{.*}} i64 +// CHECK-NOT: cmpxchg +S37 and37(_Atomic(S37) *p, S37 v) { + return __c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST); +} + +// Arithmetic RMW on a padded width becomes a compare-exchange loop, not a bare +// atomicrmw that would carry into the padding bits. +// CHECK-LABEL: @add37( +// CHECK: atomicrmw.start: +// CHECK: cmpxchg weak ptr {{.*}} i64 +// CHECK-NOT: atomicrmw add +S37 add37(_Atomic(S37) *p, S37 v) { + return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); +} + +// Signed min is computed at the value width, so the sign bit is at bit N-1. +// CHECK-LABEL: @min37( +// CHECK: icmp sle i37 +// CHECK: select i1 +// CHECK: cmpxchg weak ptr {{.*}} i64 +U37 min37(_Atomic(S37) *p, S37 v) { + return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST); +} + +// No padding: direct atomicrmw, no loop. 
+// CHECK-LABEL: @add64( +// CHECK: atomicrmw add ptr {{.*}} i64 +// CHECK-NOT: cmpxchg +S64 add64(_Atomic(S64) *p, S64 v) { + return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); +} + +// CHECK-LABEL: @add128( +// CHECK: atomicrmw add ptr {{.*}} i128 +S128 add128(_Atomic(S128) *p, S128 v) { + return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); +} + +// Wide: no inline atomicrmw and no arbitrary-width __atomic_fetch_add libcall, +// so the loop calls __atomic_compare_exchange. +// CHECK-LABEL: @add256( +// CHECK: call {{.*}}@__atomic_compare_exchange +// CHECK-NOT: cmpxchg +S256 add256(_Atomic(S256) *p, S256 v) { + return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); +} + +// Wide bitwise also needs the loop: the wide path has no inline atomicrmw. +// CHECK-LABEL: @or256( +// CHECK: call {{.*}}@__atomic_compare_exchange +S256 or256(_Atomic(S256) *p, S256 v) { + return __c11_atomic_fetch_or(p, v, __ATOMIC_SEQ_CST); +} diff --git a/clang/test/Sema/builtins.c b/clang/test/Sema/builtins.c index b669ee68cdd95..57e0eefdb772b 100644 --- a/clang/test/Sema/builtins.c +++ b/clang/test/Sema/builtins.c @@ -281,7 +281,7 @@ void test_ei_i42i(_BitInt(42) *ptr, int value) { // expected-warning@+1 {{the semantics of this intrinsic changed with GCC version 4.4 - the newer semantics are provided here}} __sync_nand_and_fetch(ptr, value); // expected-error {{atomic memory operand must have a power-of-two size}} - __atomic_fetch_add(ptr, 1, 0); // expected-error {{argument to atomic builtin of type '_BitInt' is not supported}} + __atomic_fetch_add(ptr, 1, 0); // expect success: the GNU atomic builtins support _BitInt } void test_ei_i64i(_BitInt(64) *ptr, int value) { @@ -289,7 +289,7 @@ void test_ei_i64i(_BitInt(64) *ptr, int value) { // expected-warning@+1 {{the semantics of this intrinsic changed with GCC version 4.4 - the newer semantics are provided here}} __sync_nand_and_fetch(ptr, value); // expect success - __atomic_fetch_add(ptr, 1, 0); // expected-error {{argument to 
atomic builtin of type '_BitInt' is not supported}} + __atomic_fetch_add(ptr, 1, 0); // expect success } void test_ei_ii42(int *ptr, _BitInt(42) value) { diff --git a/clang/test/SemaCXX/ext-int.cpp b/clang/test/SemaCXX/ext-int.cpp index 281ae3d3c1779..f62a07a84200e 100644 --- a/clang/test/SemaCXX/ext-int.cpp +++ b/clang/test/SemaCXX/ext-int.cpp @@ -121,13 +121,11 @@ _Complex _BitInt(3) Cmplx; // expected-error@+1{{'_Complex _BitInt' is invalid}} typedef _Complex _BitInt(3) Cmp; -// Reject cases of _Atomic: -// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(4)'}} -_Atomic _BitInt(4) TooSmallAtomic; -// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(9)'}} +// _Atomic accepts any _BitInt width: small and non-power-of-2 included. +// Sizes the target cannot lower inline use the __atomic_* libcalls. +_Atomic _BitInt(4) SmallAtomic; _Atomic _BitInt(9) NotPow2Atomic; -// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(128)'}} -_Atomic _BitInt(128) JustRightAtomic; +_Atomic _BitInt(128) WideAtomic; // Test result types of Unary/Bitwise/Binary Operations: void Ops() { diff --git a/libcxx/test/libcxx/atomics/bit-int.verify.cpp b/libcxx/test/libcxx/atomics/bit-int.verify.cpp deleted file mode 100644 index 03880a1b6215c..0000000000000 --- a/libcxx/test/libcxx/atomics/bit-int.verify.cpp +++ /dev/null @@ -1,22 +0,0 @@ -//===----------------------------------------------------------------------===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -// <atomic> - -// Make sure that `std::atomic` doesn't work with `_BitInt`. The intent is to -// disable them for now until their behavior can be designed better later. -// See https://reviews.llvm.org/D84049 for details. 
- -// UNSUPPORTED: c++03 - -#include <atomic> - -void f() { - // expected-error@*:*1 {{_Atomic cannot be applied to integer type '_BitInt(32)'}} - std::atomic<_BitInt(32)> x(42); -} >From 92e21eaa61bf0f26c3ba825545845f531d79c4d6 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Tue, 23 Jun 2026 11:53:49 +0200 Subject: [PATCH 2/7] [Clang][POC] Add C23/C2y Sema tests for atomic _BitInt(N) C23 requires the type-generic atomic interfaces to accept _BitInt(N), so _Atomic(_BitInt(N)) is well-formed at every width. Add a Sema acceptance test covering the _Atomic specifier and the __c11_atomic_*/__atomic_* builtins in C23 and C2y modes, and a -std=c2y run of the codegen test. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]> --- clang/test/CodeGen/atomic-bitint.c | 1 + clang/test/Sema/atomic-bitint.c | 40 ++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+) create mode 100644 clang/test/Sema/atomic-bitint.c diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c index 358b530e8a792..9fa259776bf62 100644 --- a/clang/test/CodeGen/atomic-bitint.c +++ b/clang/test/CodeGen/atomic-bitint.c @@ -1,4 +1,5 @@ // RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s +// RUN: %clang_cc1 -std=c2y -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s // // Atomic operations on _BitInt(N). 
load/store/exchange/compare-exchange and // bitwise RMW lower directly; arithmetic RMW on a padded width and any RMW on a diff --git a/clang/test/Sema/atomic-bitint.c b/clang/test/Sema/atomic-bitint.c new file mode 100644 index 0000000000000..1466412bae732 --- /dev/null +++ b/clang/test/Sema/atomic-bitint.c @@ -0,0 +1,40 @@ +// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c23 +// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c2y +// +// C23 requires the type-generic atomic interfaces to accept _BitInt(N) for +// every N, so _Atomic(_BitInt(N)) is well-formed at every width. Widths past +// 128 are x86-only. + +// expected-no-diagnostics + +_Atomic(_BitInt(4)) a4; // small +_Atomic(_BitInt(9)) a9; // non-power-of-two +_Atomic(_BitInt(37)) a37; // padded +_Atomic(_BitInt(64)) a64; +_Atomic(_BitInt(128)) a128; +_Atomic(_BitInt(256)) a256; // wider than any inline atomic + +// The _Atomic qualifier spelling is equally valid. +_Atomic _BitInt(9) q9; + +static_assert(sizeof(_Atomic(_BitInt(37))) == 8); +static_assert(sizeof(_Atomic(_BitInt(128))) == 16); +static_assert(sizeof(_Atomic(_BitInt(256))) == 32); + +void c11_builtins(_Atomic(_BitInt(37)) *p, _BitInt(37) v, _BitInt(37) *e) { + (void)__c11_atomic_load(p, __ATOMIC_SEQ_CST); + __c11_atomic_store(p, v, __ATOMIC_SEQ_CST); + (void)__c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST); + (void)__c11_atomic_compare_exchange_strong(p, e, v, __ATOMIC_SEQ_CST, + __ATOMIC_SEQ_CST); + (void)__c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); + (void)__c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST); + (void)__c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST); +} + +// The GNU __atomic_* builtins take a plain _BitInt pointer. 
+void gnu_builtins(_BitInt(37) *p, _BitInt(37) v) { + (void)__atomic_load_n(p, __ATOMIC_SEQ_CST); + __atomic_store_n(p, v, __ATOMIC_SEQ_CST); + (void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); +} >From c0e011fb61086a5a0cae13c54abe21ab312e85c7 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Tue, 23 Jun 2026 13:51:48 +0200 Subject: [PATCH 3/7] [Clang][POC] Extend atomic _BitInt Sema test: width 4096 + RISC-V Add a _BitInt(4096) acceptance case and a riscv64 RUN line. The atomic code imposes no width cap of its own, so the only limit is the target's getMaxBitIntWidth(); x86 and RISC-V allow widths past 128, others cap at 128. Correct the comment that claimed wide widths were x86-only. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]> --- clang/test/Sema/atomic-bitint.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/clang/test/Sema/atomic-bitint.c b/clang/test/Sema/atomic-bitint.c index 1466412bae732..fbb4c518438fb 100644 --- a/clang/test/Sema/atomic-bitint.c +++ b/clang/test/Sema/atomic-bitint.c @@ -1,18 +1,21 @@ // RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c23 // RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c2y +// RUN: %clang_cc1 %s -fsyntax-only -verify -triple riscv64-unknown-linux-gnu -std=c23 // // C23 requires the type-generic atomic interfaces to accept _BitInt(N) for -// every N, so _Atomic(_BitInt(N)) is well-formed at every width. Widths past -// 128 are x86-only. +// every N, so _Atomic(_BitInt(N)) is well-formed at every width. The atomic +// code imposes no width cap of its own; widths past 128 are available wherever +// the target accepts _BitInt > 128 (x86 and RISC-V today). 
// expected-no-diagnostics -_Atomic(_BitInt(4)) a4; // small -_Atomic(_BitInt(9)) a9; // non-power-of-two -_Atomic(_BitInt(37)) a37; // padded -_Atomic(_BitInt(64)) a64; -_Atomic(_BitInt(128)) a128; -_Atomic(_BitInt(256)) a256; // wider than any inline atomic +_Atomic(_BitInt(4)) a4; // small +_Atomic(_BitInt(9)) a9; // non-power-of-two +_Atomic(_BitInt(37)) a37; // padded +_Atomic(_BitInt(64)) a64; +_Atomic(_BitInt(128)) a128; +_Atomic(_BitInt(256)) a256; // wider than any inline atomic +_Atomic(_BitInt(4096)) a4096; // far past the inline range // The _Atomic qualifier spelling is equally valid. _Atomic _BitInt(9) q9; >From 92e2068b4837befaa51c040c0c882ea7f10cd57f Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Fri, 26 Jun 2026 21:21:00 +0200 Subject: [PATCH 4/7] [Clang][NFC] Address review: atomic _BitInt diagnostic list and cast Drop the now-unreachable _BitInt ("integer") option from err_atomic_specifier_bad_type and renumber the _Atomic-auto/C23 case to fill the gap; the rendered text is unchanged. Use static_cast instead of a C-style cast in atomicOrderOrSeqCst. 
Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]> --- clang/include/clang/Basic/DiagnosticSemaKinds.td | 4 ++-- clang/lib/CodeGen/CGAtomic.cpp | 2 +- clang/lib/Sema/SemaType.cpp | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index cde99dfb16ec5..414357c5a7c73 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -7475,8 +7475,8 @@ def err_func_def_incomplete_result : Error< def err_atomic_specifier_bad_type : Error<"_Atomic cannot be applied to " "%select{incomplete |array |function |reference |atomic |qualified " - "|sizeless ||integer |}0type " - "%1 %select{|||||||which is not trivially copyable||in C23}0">; + "|sizeless ||}0type " + "%1 %select{|||||||which is not trivially copyable|in C23}0">; def warn_atomic_member_access : Warning< "accessing a member of an atomic structure or union is undefined behavior">, InGroup<DiagGroup<"atomic-access">>, DefaultError; diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp index 66c059fd40e26..820849f5974c0 100644 --- a/clang/lib/CodeGen/CGAtomic.cpp +++ b/clang/lib/CodeGen/CGAtomic.cpp @@ -683,7 +683,7 @@ static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) { auto *C = dyn_cast<llvm::ConstantInt>(Order); if (!C || !llvm::isValidAtomicOrderingCABI(C->getZExtValue())) return llvm::AtomicOrdering::SequentiallyConsistent; - switch ((llvm::AtomicOrderingCABI)C->getZExtValue()) { + switch (static_cast<llvm::AtomicOrderingCABI>(C->getZExtValue())) { case llvm::AtomicOrderingCABI::relaxed: return llvm::AtomicOrdering::Monotonic; case llvm::AtomicOrderingCABI::consume: diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp index 4a3506c281acf..f76244d5f2871 100644 --- a/clang/lib/Sema/SemaType.cpp +++ b/clang/lib/Sema/SemaType.cpp @@ -10414,7 +10414,7 @@ 
QualType Sema::BuildAtomicType(QualType T, SourceLocation Loc) { DisallowedKind = 7; else if (getLangOpts().C23 && T->isUndeducedAutoType()) // _Atomic auto is prohibited in C23 - DisallowedKind = 9; + DisallowedKind = 8; if (DisallowedKind != -1) { Diag(Loc, diag::err_atomic_specifier_bad_type) << DisallowedKind << T; >From ece57436ed0217f4c1ad4ca92f10182793ec109f Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Fri, 26 Jun 2026 21:21:02 +0200 Subject: [PATCH 5/7] [Clang][NFC] Regenerate atomic-bitint.c checks with update_cc_test_checks The prior spot-checks did not show the lowering. Regenerate complete check lines so the compare-exchange loop, the width-N arithmetic, and the sext/zext memory canonicalization are all visible. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]> --- clang/test/CodeGen/atomic-bitint.c | 350 ++++++++++++++++++++++++++--- 1 file changed, 321 insertions(+), 29 deletions(-) diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c index 9fa259776bf62..6476c26a0f0dd 100644 --- a/clang/test/CodeGen/atomic-bitint.c +++ b/clang/test/CodeGen/atomic-bitint.c @@ -1,3 +1,4 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6 // RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s // RUN: %clang_cc1 -std=c2y -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s // @@ -12,80 +13,371 @@ typedef _BitInt(64) S64; typedef _BitInt(128) S128; typedef _BitInt(256) S256; -// CHECK-LABEL: @ld37( -// CHECK: load atomic i64 +// CHECK-LABEL: define dso_local i64 @ld37( +// CHECK-SAME: ptr noundef [[P:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// 
CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load atomic i64, ptr [[TMP0]] seq_cst, align 8 +// CHECK-NEXT: store i64 [[TMP1]], ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: store i37 [[LOADEDV]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP3]] to i64 +// CHECK-NEXT: ret i64 [[COERCE_VAL_II]] +// S37 ld37(_Atomic(S37) *p) { return __c11_atomic_load(p, __ATOMIC_SEQ_CST); } -// CHECK-LABEL: @st37( -// CHECK: store atomic i64 +// CHECK-LABEL: define dso_local void @st37( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8 +// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: store atomic i64 [[TMP3]], ptr [[TMP1]] seq_cst, align 8 +// CHECK-NEXT: ret void +// void st37(_Atomic(S37) *p, S37 v) { __c11_atomic_store(p, v, __ATOMIC_SEQ_CST); 
} -// CHECK-LABEL: @xchg37( -// CHECK: atomicrmw xchg ptr {{.*}} i64 +// CHECK-LABEL: define dso_local i64 @xchg37( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8 +// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8 +// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP4:%.*]] = atomicrmw xchg ptr [[TMP1]], i64 [[TMP3]] seq_cst, align 8 +// CHECK-NEXT: store i64 [[TMP4]], ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP5]] to i37 +// CHECK-NEXT: store i37 [[LOADEDV3]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP6:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP6]] to i64 +// CHECK-NEXT: ret i64 [[COERCE_VAL_II]] +// S37 xchg37(_Atomic(S37) *p, S37 v) { return __c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST); } -// CHECK-LABEL: @cas37( -// CHECK: cmpxchg ptr {{.*}} i64 +// CHECK-LABEL: define 
dso_local zeroext i1 @cas37( +// CHECK-SAME: ptr noundef [[P:%.*]], ptr noundef [[E:%.*]], i64 noundef [[D_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[D:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[E_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[D_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1 +// CHECK-NEXT: store i64 [[D_COERCE]], ptr [[D]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[D]], align 8 +// CHECK-NEXT: [[D1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: store ptr [[E]], ptr [[E_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[D1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[D_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[E_ADDR]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[D_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP3]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 8 +// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP5]] seq_cst seq_cst, align 8 +// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0 +// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1 +// CHECK-NEXT: br i1 [[TMP8]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]] +// CHECK: [[CMPXCHG_STORE_EXPECTED]]: +// CHECK-NEXT: store i64 [[TMP7]], ptr [[TMP2]], align 8 +// CHECK-NEXT: br label %[[CMPXCHG_CONTINUE]] +// CHECK: [[CMPXCHG_CONTINUE]]: +// CHECK-NEXT: [[STOREDV3:%.*]] = zext i1 [[TMP8]] to i8 +// 
CHECK-NEXT: store i8 [[STOREDV3]], ptr [[CMPXCHG_BOOL]], align 1 +// CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1 +// CHECK-NEXT: [[LOADEDV4:%.*]] = icmp ne i8 [[TMP9]], 0 +// CHECK-NEXT: ret i1 [[LOADEDV4]] +// _Bool cas37(_Atomic(S37) *p, S37 *e, S37 d) { return __c11_atomic_compare_exchange_strong(p, e, d, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); } // Bitwise RMW on a padded width keeps the direct atomicrmw: it is exact. -// CHECK-LABEL: @and37( -// CHECK: atomicrmw and ptr {{.*}} i64 -// CHECK-NOT: cmpxchg +// CHECK-LABEL: define dso_local i64 @and37( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8 +// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8 +// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP4:%.*]] = atomicrmw and ptr [[TMP1]], i64 [[TMP3]] seq_cst, align 8 +// CHECK-NEXT: store i64 [[TMP4]], ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: 
[[LOADEDV3:%.*]] = trunc i64 [[TMP5]] to i37 +// CHECK-NEXT: store i37 [[LOADEDV3]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP6:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP6]] to i64 +// CHECK-NEXT: ret i64 [[COERCE_VAL_II]] +// S37 and37(_Atomic(S37) *p, S37 v) { return __c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST); } // Arithmetic RMW on a padded width becomes a compare-exchange loop, not a bare // atomicrmw that would carry into the padding bits. -// CHECK-LABEL: @add37( -// CHECK: atomicrmw.start: -// CHECK: cmpxchg weak ptr {{.*}} i64 -// CHECK-NOT: atomicrmw add +// CHECK-LABEL: define dso_local i64 @add37( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*]]: +// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8 +// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8 +// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37 +// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8 +// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 
[[ATOMIC_LOAD]] to i37 +// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]] +// CHECK: [[ATOMICRMW_START]]: +// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ] +// CHECK-NEXT: [[NEW:%.*]] = add i37 [[TMP4]], [[LOADEDV3]] +// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64 +// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64 +// CHECK-NEXT: [[TMP5:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8 +// CHECK-NEXT: [[TMP6:%.*]] = extractvalue { i64, i1 } [[TMP5]], 0 +// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP5]], 1 +// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP6]] to i37 +// CHECK-NEXT: br i1 [[TMP7]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]] +// CHECK: [[ATOMICRMW_END]]: +// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP8:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP8]] to i64 +// CHECK-NEXT: ret i64 [[COERCE_VAL_II]] +// S37 add37(_Atomic(S37) *p, S37 v) { return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); } // Signed min is computed at the value width, so the sign bit is at bit N-1. 
-// CHECK-LABEL: @min37( -// CHECK: icmp sle i37 -// CHECK: select i1 -// CHECK: cmpxchg weak ptr {{.*}} i64 +// CHECK-LABEL: define dso_local i64 @min37( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*]]: +// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8 +// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8 +// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64 +// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37 +// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64 +// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37 +// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8 +// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37 +// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]] +// CHECK: [[ATOMICRMW_START]]: +// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ] +// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[TMP4]], [[LOADEDV3]] +// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[TMP4]], i37 [[LOADEDV3]] +// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64 +// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64 +// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg weak 
ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8 +// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0 +// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1 +// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP7]] to i37 +// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]] +// CHECK: [[ATOMICRMW_END]]: +// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64 +// CHECK-NEXT: ret i64 [[COERCE_VAL_II]] +// U37 min37(_Atomic(S37) *p, S37 v) { return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST); } // No padding: direct atomicrmw, no loop. -// CHECK-LABEL: @add64( -// CHECK: atomicrmw add ptr {{.*}} i64 -// CHECK-NOT: cmpxchg +// CHECK-LABEL: define dso_local i64 @add64( +// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: store i64 [[V]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: store i64 [[TMP1]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = atomicrmw add ptr [[TMP0]], i64 [[TMP2]] seq_cst, align 8 +// CHECK-NEXT: store i64 [[TMP3]], ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: ret i64 [[TMP4]] +// S64 add64(_Atomic(S64) *p, S64 v) { return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); } -// CHECK-LABEL: @add128( -// CHECK: atomicrmw 
add ptr {{.*}} i128 +// CHECK-LABEL: define dso_local i128 @add128( +// CHECK-SAME: ptr noundef [[P:%.*]], i128 noundef [[V:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*:]] +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i128, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i128, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i128, align 16 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: store i128 [[V]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: store i128 [[TMP1]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i128, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = atomicrmw add ptr [[TMP0]], i128 [[TMP2]] seq_cst, align 16 +// CHECK-NEXT: store i128 [[TMP3]], ptr [[ATOMIC_TEMP]], align 16 +// CHECK-NEXT: [[TMP4:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 16 +// CHECK-NEXT: ret i128 [[TMP4]] +// S128 add128(_Atomic(S128) *p, S128 v) { return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); } // Wide: no inline atomicrmw and no arbitrary-width __atomic_fetch_add libcall, // so the loop calls __atomic_compare_exchange. 
-// CHECK-LABEL: @add256( -// CHECK: call {{.*}}@__atomic_compare_exchange -// CHECK-NOT: cmpxchg +// CHECK-LABEL: define dso_local void @add256( +// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*]]: +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[V:%.*]] = load i256, ptr [[TMP0]], align 8 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: store i256 [[V]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0) +// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]] +// CHECK: [[ATOMICRMW_START]]: +// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ] +// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP5]], [[TMP3]] +// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8 +// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8 +// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5) +// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8 +// CHECK-NEXT: 
br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]] +// CHECK: [[ATOMICRMW_END]]: +// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: ret void +// S256 add256(_Atomic(S256) *p, S256 v) { return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); } // Wide bitwise also needs the loop: the wide path has no inline atomicrmw. -// CHECK-LABEL: @or256( -// CHECK: call {{.*}}@__atomic_compare_exchange +// CHECK-LABEL: define dso_local void @or256( +// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: [[ENTRY:.*]]: +// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 +// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i256, align 8 +// CHECK-NEXT: [[V:%.*]] = load i256, ptr [[TMP0]], align 8 +// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8 +// CHECK-NEXT: store i256 [[V]], ptr [[V_ADDR]], align 8 +// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8 +// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8 +// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8 +// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0) +// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]] +// CHECK: [[ATOMICRMW_START]]: +// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ] 
+// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP5]], [[TMP3]] +// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8 +// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8 +// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5) +// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8 +// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]] +// CHECK: [[ATOMICRMW_END]]: +// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8 +// CHECK-NEXT: ret void +// S256 or256(_Atomic(S256) *p, S256 v) { return __c11_atomic_fetch_or(p, v, __ATOMIC_SEQ_CST); } >From 1c87c9994199bd82766b4d6d6f5edf392c225ba2 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 27 Jun 2026 09:23:04 +0200 Subject: [PATCH 6/7] [Clang] Carry the raw representation in the _BitInt atomic RMW loop The compare-exchange loop for padded and wide _BitInt atomics formed its cmpxchg expected by re-canonicalizing the loaded value (sign/zero-extending the truncated old). An object whose padding bits were non-canonical, e.g. written through a union, then never matched that expected, so the cmpxchg failed every iteration and the read-modify-write spun forever. Reuse the existing EmitAtomicUpdate loop, which carries the raw loaded representation as the expected and writes back a canonical desired computed at value width N. The object converges on the first iteration regardless of its padding, and the value it stores is canonical. See P0528. 
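The failure mode described above can be reproduced outside the compiler. The following is a minimal sketch, not the code Clang emits: it assumes a 37-bit value kept in a 64-bit container and uses plain C11 atomics on uint64_t. The names canonical, fetch_add_model, recanonicalize_expected and max_tries are hypothetical and exist only for this illustration; max_tries bounds a loop that would otherwise never terminate.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { VALUE_BITS = 37 }; /* models _BitInt(37) stored in a 64-bit container */

/* Sign-extend the low VALUE_BITS bits: the canonical in-memory form. */
uint64_t canonical(uint64_t raw) {
  uint64_t sign = UINT64_C(1) << (VALUE_BITS - 1);
  uint64_t low = raw & ((sign << 1) - 1);
  return (low ^ sign) - sign;
}

/* Hypothetical model of the fetch-add loop. Returns the number of
   compare-exchange attempts used, or -1 if max_tries was exhausted. */
int fetch_add_model(_Atomic uint64_t *obj, uint64_t rhs,
                    bool recanonicalize_expected, int max_tries) {
  uint64_t raw = atomic_load(obj);
  for (int tries = 1; tries <= max_tries; ++tries) {
    /* The desired value is computed at width N and stored canonically. */
    uint64_t desired = canonical(raw + rhs);
    /* Old behaviour: compare against a re-extended copy of the value.
       New behaviour: compare against the raw bits that were loaded. */
    uint64_t expected = recanonicalize_expected ? canonical(raw) : raw;
    if (atomic_compare_exchange_strong(obj, &expected, desired))
      return tries;
    raw = expected; /* a failed exchange wrote the current contents here */
  }
  return -1;
}
```

With a container holding (1 << 40) | 5, that is, value 5 with one stray padding bit, the re-canonicalizing variant never matches and exhausts its attempt budget, while the raw variant succeeds on the first attempt and leaves the canonical value 6 behind.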
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
---
 clang/lib/CodeGen/CGAtomic.cpp | 48 +++++-------
 clang/test/CodeGen/atomic-bitint.c | 120 ++++++++++++++---------------
 2 files changed, 77 insertions(+), 91 deletions(-)

diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 820849f5974c0..0043c79b398ee 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -702,9 +702,13 @@ static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) {
 /// Emit a `_BitInt(N)` atomic read-modify-write as a compare-exchange loop. A
 /// single `atomicrmw` on the padded memory integer would carry into / compare
 /// the padding bits, and no arbitrary-width `__atomic_fetch_*` libcall exists
-/// for wide widths. The loop computes the new value at width N and writes back
-/// a canonical (extended) representation via the existing cmpxchg helper, which
-/// also picks the inline-vs-libcall form by size.
+/// for wide widths.
+///
+/// The update computes at value width N (so the result wraps mod 2^N and is
+/// independent of padding). EmitAtomicUpdate carries the raw loaded
+/// representation as the cmpxchg expected, so an object with non-canonical
+/// padding (e.g. written through a union) still converges instead of spinning
+/// forever; the desired it writes back is canonical. See P0528.
 static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
                                       Address Ptr, Address Val1,
                                       QualType AtomicTy,
@@ -712,8 +716,6 @@ static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
                                       bool ReturnsNew, llvm::Value *Order) {
   QualType ValTy = E->getValueType();
   llvm::AtomicOrdering AO = atomicOrderOrSeqCst(Order);
-  llvm::AtomicOrdering Failure =
-      llvm::AtomicCmpXchgInst::getStrongestFailureOrdering(AO);

   LValue AtomicLVal = CGF.MakeAddrLValue(Ptr, AtomicTy);
   AtomicInfo Atomics(CGF, AtomicLVal);
@@ -721,31 +723,17 @@ static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,

   llvm::Value *RHS =
       CGF.EmitLoadOfScalar(CGF.MakeAddrLValue(Val1, ValTy), E->getExprLoc());
-  RValue OldRV = Atomics.EmitAtomicLoad(
-      AggValueSlot::ignored(), E->getExprLoc(),
-      /*AsValue=*/true, llvm::AtomicOrdering::Monotonic, E->isVolatile());
-  llvm::Value *Init = OldRV.getScalarVal();
-
-  llvm::BasicBlock *StartBB = CGF.Builder.GetInsertBlock();
-  llvm::BasicBlock *LoopBB = CGF.createBasicBlock("atomicrmw.start", CGF.CurFn);
-  llvm::BasicBlock *EndBB = CGF.createBasicBlock("atomicrmw.end", CGF.CurFn);
-  CGF.Builder.CreateBr(LoopBB);
-  CGF.Builder.SetInsertPoint(LoopBB);
-
-  llvm::PHINode *Old = CGF.Builder.CreatePHI(Init->getType(), 2);
-  Old->addIncoming(Init, StartBB);
-
-  // Compute at the value width via the canonical RMW lowering, so the result
-  // wraps mod 2^N and never touches the padding bits.
-  llvm::Value *New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS);
-
-  auto Res = Atomics.EmitAtomicCompareExchange(
-      RValue::get(Old), RValue::get(New), AO, Failure, /*IsWeak=*/true);
-  Old->addIncoming(Res.first.getScalarVal(), CGF.Builder.GetInsertBlock());
-  CGF.Builder.CreateCondBr(Res.second, EndBB, LoopBB);
-
-  CGF.Builder.SetInsertPoint(EndBB);
-  return RValue::get(ReturnsNew ? New : static_cast<llvm::Value *>(Old));
+  llvm::Value *Old = nullptr, *New = nullptr;
+  Atomics.EmitAtomicUpdate(
+      AO,
+      [&](RValue OldRV) {
+        Old = OldRV.getScalarVal();
+        New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS);
+        return RValue::get(New);
+      },
+      E->isVolatile());
+
+  return RValue::get(ReturnsNew ? New : Old);
 }

 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
index 6476c26a0f0dd..bc1e165fd90e3 100644
--- a/clang/test/CodeGen/atomic-bitint.c
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -178,6 +178,7 @@ S37 and37(_Atomic(S37) *p, S37 v) {
 // CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
 // CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
 // CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
 // CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
 // CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
 // CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
@@ -191,23 +192,23 @@ S37 and37(_Atomic(S37) *p, S37 v) {
 // CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
-// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
-// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = add i37 [[TMP4]], [[LOADEDV3]]
-// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
-// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
-// CHECK-NEXT: [[TMP5:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
-// CHECK-NEXT:
[[TMP6:%.*]] = extractvalue { i64, i1 } [[TMP5]], 0 -// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP5]], 1 -// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP6]] to i37 -// CHECK-NEXT: br i1 [[TMP7]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]] -// CHECK: [[ATOMICRMW_END]]: -// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8 -// CHECK-NEXT: [[TMP8:%.*]] = load i37, ptr [[RETVAL]], align 8 -// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP8]] to i64 +// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8 +// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]] +// CHECK: [[ATOMIC_CONT]]: +// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP7:%.*]], %[[ATOMIC_CONT]] ] +// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37 +// CHECK-NEXT: [[NEW:%.*]] = add i37 [[LOADEDV4]], [[LOADEDV3]] +// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[NEW]] to i64 +// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8 +// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8 +// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP5]] seq_cst seq_cst, align 8 +// CHECK-NEXT: [[TMP7]] = extractvalue { i64, i1 } [[TMP6]], 0 +// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1 +// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]] +// CHECK: [[ATOMIC_EXIT]]: +// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8 +// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64 // CHECK-NEXT: ret i64 [[COERCE_VAL_II]] // S37 add37(_Atomic(S37) *p, S37 v) { @@ -223,6 +224,7 @@ S37 add37(_Atomic(S37) *p, S37 v) { // CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8 // CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8 // CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8 +// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8 // 
CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
 // CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
 // CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
@@ -236,24 +238,24 @@ S37 add37(_Atomic(S37) *p, S37 v) {
 // CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
-// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
-// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[TMP4]], [[LOADEDV3]]
-// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[TMP4]], i37 [[LOADEDV3]]
-// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
-// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
-// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
-// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0
-// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
-// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP7]] to i37
-// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
-// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP8:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[LOADEDV4]], i37 [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP8]] = extractvalue { i64, i1 } [[TMP7]], 0
+// CHECK-NEXT: [[TMP9:%.*]] = extractvalue { i64, i1 } [[TMP7]], 1
+// CHECK-NEXT: br i1 [[TMP9]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP10:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP10]] to i64
 // CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
 //
 U37 min37(_Atomic(S37) *p, S37 v) {
@@ -309,7 +311,7 @@ S128 add128(_Atomic(S128) *p, S128 v) {
 // so the loop calls __atomic_compare_exchange.
 // CHECK-LABEL: define dso_local void @add256(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
-// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[ENTRY:.*:]]
 // CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
 // CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
 // CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
@@ -323,21 +325,19 @@ S128 add128(_Atomic(S128) *p, S128 v) {
 // CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
 // CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
-// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 5)
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
 // CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP5]], [[TMP3]]
-// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP4]], [[TMP3]]
 // CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
-// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
-// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
-// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: call void @__atomic_store(i64 noundef 32, ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5)
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], ptr noundef [[ATOMIC_TEMP1]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i256 [[TMP4]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
 // CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
 // CHECK-NEXT: ret void
 //
 S256 add256(_Atomic(S256) *p, S256 v) {
@@ -347,7 +347,7 @@ S256 add256(_Atomic(S256) *p, S256 v) {
 // Wide bitwise also needs the loop: the wide path has no inline atomicrmw.
 // CHECK-LABEL: define dso_local void @or256(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
-// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[ENTRY:.*:]]
 // CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
 // CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
 // CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
@@ -361,21 +361,19 @@ S256 add256(_Atomic(S256) *p, S256 v) {
 // CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
 // CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
 // CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
-// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 5)
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
 // CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP5]], [[TMP3]]
-// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP4]], [[TMP3]]
 // CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
-// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
-// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
-// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: call void @__atomic_store(i64 noundef 32, ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5)
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], ptr noundef [[ATOMIC_TEMP1]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i256 [[TMP4]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
 // CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
 // CHECK-NEXT: ret void
 //
 S256 or256(_Atomic(S256) *p, S256 v) {

>From e79f0f4e3fbec4c77fcae7f42f2d9dc5f3e85c5a Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 27 Jun 2026 09:34:49 +0200
Subject: [PATCH 7/7] [compiler-rt] Add runtime test for atomic _BitInt(N)

Single-threaded execution test for _Atomic(_BitInt(N)): per-op value
correctness on a padded inline
width and on wide libcall widths, plus dirty-padding convergence. An
object with non-canonical padding (written through a union) must not
spin forever in the read-modify-write compare-exchange loop. The
IR-shape checks in clang/test/CodeGen/atomic-bitint.c cannot witness
non-termination.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
---
 .../test/builtins/Unit/atomic_bitint_test.c | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 compiler-rt/test/builtins/Unit/atomic_bitint_test.c

diff --git a/compiler-rt/test/builtins/Unit/atomic_bitint_test.c b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
new file mode 100644
index 0000000000000..33a745348a6f0
--- /dev/null
+++ b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
@@ -0,0 +1,91 @@
+// RUN: %clang_builtins -std=c23 %s %librt -o %t && %run %t
+// REQUIRES: librt_has_atomic
+//===-- atomic_bitint_test.c - Test atomic ops on _BitInt -----------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Runtime checks for atomic read-modify-write on _BitInt(N). A padded width
+// (37) exercises the inline compare-exchange loop; a wide width (256) exercises
+// the __atomic_compare_exchange libcall loop. Each op is cross-checked against
+// the same operation done non-atomically, and the dirty-padding cases confirm
+// the loop converges (a re-canonicalized expected would spin forever).
+//
+//===----------------------------------------------------------------------===//
+
+#include <assert.h>
+#include <stdio.h>
+
+typedef signed _BitInt(37) S37;
+typedef unsigned _BitInt(37) U37;
+typedef signed _BitInt(256) S256; // no padding (exactly 32 bytes)
+typedef signed _BitInt(200) S200; // padded: 200 value bits in 32-byte storage
+
+// Each macro runs the atomic op and asserts the returned old value and the
+// resulting object both match the non-atomic computation at width N.
+#define CHECK_FETCH(T, init, op, rhs, expr) \
+  do { \
+    _Atomic(T) a = (init); \
+    T old = __c11_atomic_fetch_##op(&a, (rhs), __ATOMIC_SEQ_CST); \
+    assert(old == (T)(init)); \
+    assert((T)a == (T)(expr)); \
+  } while (0)
+
+static void test_ops(void) {
+  CHECK_FETCH(S37, 100, add, 5, 105);
+  CHECK_FETCH(S37, 100, sub, 40, 60);
+  CHECK_FETCH(S37, -3, add, 1, -2);
+  CHECK_FETCH(U37, 7, add, 9, 16);
+  CHECK_FETCH(S37, 0x15, and, 0x13, 0x11);
+  CHECK_FETCH(S37, 0x10, or, 5, 0x15);
+  CHECK_FETCH(S37, 0x1F, xor, 0x15, 0x0A);
+  CHECK_FETCH(S37, -5, min, -7, -7); // signed: -7 < -5
+  CHECK_FETCH(U37, 5, min, (U37)-1, 5); // unsigned: 5 < 2^37-1
+  CHECK_FETCH(S37, 3, max, 9, 9);
+  CHECK_FETCH(S37, 0x15, nand, 0x13, (S37) ~(0x15 & 0x13));
+  // Wide widths: the libcall loop (no padding, and padded).
+  CHECK_FETCH(S256, 100, add, 5, 105);
+  CHECK_FETCH(S256, 1, or, 0xFE, 0xFF);
+  CHECK_FETCH(S200, 100, add, 5, 105);
+}
+
+// Seed non-canonical padding through a union, then RMW. A loop that carried a
+// re-canonicalized expected would never match memory and hang here.
+static void test_dirty_padding(void) {
+  union {
+    _Atomic(S37) a;
+    unsigned long b;
+  } s;
+  s.b = ((unsigned long)1 << 40) | 5u; // value bits 5, padding bit 40 set
+  S37 old = __c11_atomic_fetch_add(&s.a, 1, __ATOMIC_SEQ_CST);
+  assert(old == 5 && (S37)s.a == 6);
+
+  union {
+    _Atomic(U37) a;
+    unsigned long b;
+  } u;
+  u.b = ((unsigned long)3 << 50) | 7u;
+  U37 uold = __c11_atomic_fetch_add(&u.a, 1, __ATOMIC_SEQ_CST);
+  assert(uold == 7 && (U37)u.a == 8);
+
+  // Wide padded width (libcall loop): _BitInt(200) has 56 padding bits in its
+  // 32-byte storage. Set the overlay at value level (endian-independent): low
+  // 200 bits = 5, a padding bit (240) dirtied.
+  union {
+    _Atomic(S200) a;
+    unsigned _BitInt(256) full;
+  } w;
+  w.full = (unsigned _BitInt(256))5 | ((unsigned _BitInt(256))0xAA << 240);
+  S200 wold = __c11_atomic_fetch_add(&w.a, 1, __ATOMIC_SEQ_CST);
+  assert(wold == 5 && (S200)w.a == 6);
+}
+
+int main(void) {
+  test_ops();
+  test_dirty_padding();
+  printf("PASS\n");
+  return 0;
+}
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
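
For readers following the dirty-padding argument, here is a minimal, portable sketch of the loop shape the test exercises. It is not the code Clang emits. It assumes a signed 37-bit value stored in a plain 64-bit atomic container, and the helper names canon37 and fetch_add37 are hypothetical, chosen only for this illustration. The key point is that the expected value passed to the compare-exchange is the raw container exactly as loaded (padding included), while the desired value is written back in canonical sign-extended form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH 37 /* value bits, mirroring signed _BitInt(37) (assumption) */

/* Hypothetical helper: keep the low WIDTH bits of raw and sign-extend them
 * to 64 bits, i.e. produce the canonical representation. */
static int64_t canon37(uint64_t raw) {
  const uint64_t mask = (UINT64_C(1) << WIDTH) - 1;
  const uint64_t sign = UINT64_C(1) << (WIDTH - 1);
  uint64_t v = raw & mask;
  return (int64_t)((v ^ sign) - sign);
}

/* Hypothetical fetch-add at width 37 over 64-bit storage. Returns the old
 * value. The raw container is used as the compare-exchange expected value,
 * so an object with dirty padding still matches memory on the first try. */
static int64_t fetch_add37(_Atomic uint64_t *p, int64_t rhs) {
  assert(p != NULL);
  uint64_t expected = atomic_load(p); /* raw bits, padding included */
  for (;;) {
    int64_t old = canon37(expected);
    /* Compute at width 37 (wraps modulo 2^37), then canonicalize. */
    uint64_t desired = (uint64_t)canon37((uint64_t)old + (uint64_t)rhs);
    /* On failure, expected is refreshed with the current raw contents. */
    if (atomic_compare_exchange_strong(p, &expected, desired))
      return old;
  }
}
```

Seeding the container with ((uint64_t)1 << 40) | 5 and calling fetch_add37(&obj, 1) returns 5 and leaves 6 in the container. Had the loop compared against a re-canonicalized expected (5 rather than the raw bits), the compare-exchange could never match memory, which is the hang the runtime test guards against.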
