llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-amdgpu

Author: Yaxun (Sam) Liu (yxsamliu)

<details>
<summary>Changes</summary>

Clang recently introduced `-f[no-]atomic-*` options and `[[clang::atomic]]` attributes to give better control over atomic codegen. The default assumed no remote memory and no fine-grained memory, which enables more efficient instructions. This was correct for most AMD GPUs, but it caused regressions on AMDGPU gfx11 and gfx12 for FP atomics on fine-grained memory, where earlier Clang used conservative behavior. Those changes altered metadata and instruction selection, breaking code that depended on the old defaults.

For example, the following Compiler Explorer link shows that `__atomic_fetch_add` was emitted as a CAS loop in ROCm 6.4 but as `global_atomic_add_f32` in LLVM trunk for gfx1200:

https://godbolt.org/z/35Y3sdsn5

We need to restore a safe default to avoid regressions while keeping a clean path to the new model.

This patch adds `-f[no-]atomic-backward-compatible`. It is enabled by default to recover the old behavior and avoid regressions. Disabling it adopts the new, consistent atomic codegen design with explicit assumptions about memory types.

To make this robust, atomic attributes and atomic options are made tri-state (unset/true/false). This accurately conveys user intent and attribute semantics, so Clang can reconstruct prior behavior per GPU architecture and atomic data type while still honoring explicit requests. (Note: the unset state exists only as the initial state and cannot be specified explicitly by users.)

The tri-state change is minimal and lightweight in the driver, LangOptions, and CodeGen plumbing. The driver prefers the last `-f[no-]atomic-*` argument. CodeGen computes per-operation defaults for FP atomics on gfx11/gfx12 when compatibility mode is enabled, applies explicit options and attributes on top of them, and emits matching AMDGPU metadata.
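The layering described above (per-operation default, then command-line option, then attribute) can be sketched as follows. This is a minimal standalone illustration, not the patch's code: `TriStateAtomicOptions`, `ResolvedAtomicOptions`, and `resolve` are hypothetical names, and the command-line layer is modeled here with `std::optional<bool>` fields instead of the `*Specified` booleans that the patch adds to `LangOptions`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the tri-state options: each field stays unset
// until a command-line flag or a [[clang::atomic]] attribute sets it.
struct TriStateAtomicOptions {
  std::optional<bool> remote_memory;
  std::optional<bool> fine_grained_memory;
  std::optional<bool> ignore_denormal_mode;
};

// Fully resolved values for a single atomic operation.
struct ResolvedAtomicOptions {
  bool remote_memory;
  bool fine_grained_memory;
  bool ignore_denormal_mode;
};

// Resolve one operation. `compat_fp` is assumed to be true only for an FP
// atomic on gfx11/gfx12 with backward-compatible mode enabled.
// Precedence, lowest to highest: default, command line, attribute.
inline ResolvedAtomicOptions resolve(bool compat_fp,
                                     const TriStateAtomicOptions &cmdline,
                                     const TriStateAtomicOptions &attr) {
  // Conservative defaults under compatibility; optimistic otherwise.
  ResolvedAtomicOptions r{compat_fp, compat_fp, false};

  r.remote_memory =
      attr.remote_memory.value_or(cmdline.remote_memory.value_or(r.remote_memory));
  r.fine_grained_memory = attr.fine_grained_memory.value_or(
      cmdline.fine_grained_memory.value_or(r.fine_grained_memory));
  r.ignore_denormal_mode = attr.ignore_denormal_mode.value_or(
      cmdline.ignore_denormal_mode.value_or(r.ignore_denormal_mode));
  return r;
}
```

Under this sketch, an FP atomic in compatibility mode with nothing specified resolves to `remote_memory` and `fine_grained_memory` both true, which corresponds to emitting neither `amdgpu.no.remote.memory` nor `amdgpu.no.fine.grained.memory`. An explicit flag or attribute still wins over that default.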
---

Patch is 22.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157672.diff

12 Files Affected:

- (modified) clang/docs/LanguageExtensions.rst (+16)
- (modified) clang/include/clang/Basic/LangOptions.h (+38-11)
- (modified) clang/include/clang/Driver/Options.td (+30-17)
- (modified) clang/lib/Basic/Targets/AMDGPU.cpp (-2)
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+4-4)
- (modified) clang/lib/CodeGen/CodeGenModule.h (+5-5)
- (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+39-5)
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+9-6)
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+7)
- (modified) clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu (+1-1)
- (added) clang/test/CodeGenCUDA/atomic-options-backward-compat.hip (+71)
- (modified) clang/test/Driver/atomic-options.hip (+15-2)

``````````diff
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index ad190eace5b05..1da026b4e8e0d 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -5901,6 +5901,22 @@ Each option has a corresponding flag:
 ``-fatomic-fine-grained-memory`` / ``-fno-atomic-fine-grained-memory``, and
 ``-fatomic-ignore-denormal-mode`` / ``-fno-atomic-ignore-denormal-mode``.
 
+To address correctness regressions on some targets, such as AMDGPU, a backward
+compatibility mode is available via the ``-f[no-]atomic-backward-compatible``
+flag. When this flag is enabled (default), the compiler keeps the older,
+more conservative behavior for floating-point atomic operations on gfx11 and
+gfx12 processors. In this mode, the compiler does not emit special metadata
+unless you explicitly request it with the ``[[clang::atomic]]`` attribute or
+specify ``-f[no-]atomic-*`` options. This preserves existing behavior and avoids
+source code changes.
+
+On AMDGPU, when backward compatibility mode is disabled, the compiler assumes
+that atomic operations are not used on fine-grained or remote memory. This
+corresponds to ``no_fine_grained_memory`` and ``no_remote_memory``. These
+assumptions allow the compiler to generate more efficient native instructions.
+However, they can cause correctness issues if atomic operations are used on
+fine-grained or remote memory.
+
 Code using the ``[[clang::atomic]]`` attribute can then selectively override
 the command-line defaults on a per-block basis. For instance:
 
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index a8943df5b39aa..6236fa847169c 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -563,8 +563,12 @@ class LangOptions : public LangOptionsBase {
   /// Atomic code-generation options.
   /// These flags are set directly from the command-line options.
   bool AtomicRemoteMemory = false;
+  bool AtomicRemoteMemorySpecified = false;
   bool AtomicFineGrainedMemory = false;
+  bool AtomicFineGrainedMemorySpecified = false;
   bool AtomicIgnoreDenormalMode = false;
+  bool AtomicIgnoreDenormalModeSpecified = false;
+  bool AtomicBackwardCompatible = true;
 
   LangOptions();
 
@@ -1047,20 +1051,36 @@ enum class AtomicOptionKind {
 };
 
 struct AtomicOptions {
-  // Bitfields for each option.
-  unsigned remote_memory : 1;
-  unsigned fine_grained_memory : 1;
-  unsigned ignore_denormal_mode : 1;
+  // Tri-state for each option: unset / true / false.
+  std::optional<bool> remote_memory;
+  std::optional<bool> fine_grained_memory;
+  std::optional<bool> ignore_denormal_mode;
 
-  AtomicOptions()
-      : remote_memory(0), fine_grained_memory(0), ignore_denormal_mode(0) {}
+  // Default constructs with everything unset.
+  AtomicOptions() = default;
 
+  // Construct with all values explicitly set from LangOptions (used when you
+  // want a fully-specified options set, e.g., to compute effective values).
   AtomicOptions(const LangOptions &LO)
       : remote_memory(LO.AtomicRemoteMemory),
         fine_grained_memory(LO.AtomicFineGrainedMemory),
         ignore_denormal_mode(LO.AtomicIgnoreDenormalMode) {}
 
-  bool getOption(AtomicOptionKind Kind) const {
+  // Whether an option was explicitly set (true/false) vs. unset.
+  bool hasOption(AtomicOptionKind Kind) const {
+    switch (Kind) {
+    case AtomicOptionKind::RemoteMemory:
+      return remote_memory.has_value();
+    case AtomicOptionKind::FineGrainedMemory:
+      return fine_grained_memory.has_value();
+    case AtomicOptionKind::IgnoreDenormalMode:
+      return ignore_denormal_mode.has_value();
+    }
+    llvm_unreachable("Invalid AtomicOptionKind");
+  }
+
+  // Get the tri-state value (std::optional<bool>) of an option.
+  std::optional<bool> getTri(AtomicOptionKind Kind) const {
     switch (Kind) {
     case AtomicOptionKind::RemoteMemory:
       return remote_memory;
@@ -1072,7 +1092,8 @@ struct AtomicOptions {
     llvm_unreachable("Invalid AtomicOptionKind");
   }
 
-  void setOption(AtomicOptionKind Kind, bool Value) {
+  // Set or clear (unset) an option explicitly.
+  void setOption(AtomicOptionKind Kind, std::optional<bool> Value) {
     switch (Kind) {
     case AtomicOptionKind::RemoteMemory:
       remote_memory = Value;
@@ -1088,9 +1109,15 @@ struct AtomicOptions {
   }
 
   LLVM_DUMP_METHOD void dump() const {
-    llvm::errs() << "\n remote_memory: " << remote_memory
-                 << "\n fine_grained_memory: " << fine_grained_memory
-                 << "\n ignore_denormal_mode: " << ignore_denormal_mode << "\n";
+    auto TriStr = [](const std::optional<bool> &v) -> const char * {
+      if (!v.has_value())
+        return "Unset";
+      return *v ? "1" : "0";
+    };
+    llvm::errs() << "\n remote_memory: " << TriStr(remote_memory)
+                 << "\n fine_grained_memory: " << TriStr(fine_grained_memory)
+                 << "\n ignore_denormal_mode: " << TriStr(ignore_denormal_mode)
+                 << "\n";
   }
 };
 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 718808d583e8c..4835cb606c5be 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2344,23 +2344,36 @@ def fsymbol_partition_EQ : Joined<["-"], "fsymbol-partition=">, Group<f_Group>,
   Visibility<[ClangOption, CC1Option]>,
   MarshallingInfoString<CodeGenOpts<"SymbolPartition">>;
 
-defm atomic_remote_memory : BoolFOption<"atomic-remote-memory",
-  LangOpts<"AtomicRemoteMemory">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "May have">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Assume no">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations on remote memory">>;
-
-defm atomic_fine_grained_memory : BoolFOption<"atomic-fine-grained-memory",
-  LangOpts<"AtomicFineGrainedMemory">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "May have">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Assume no">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations on fine-grained memory">>;
-
-defm atomic_ignore_denormal_mode : BoolFOption<"atomic-ignore-denormal-mode",
-  LangOpts<"AtomicIgnoreDenormalMode">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "Allow">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Disallow">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations to ignore denormal mode">>;
+defm atomic_remote_memory
+    : BoolFOption<
+          "atomic-remote-memory", LangOpts<"AtomicRemoteMemory">, DefaultFalse,
+          PosFlag<SetTrue, [], [ClangOption, CC1Option], "May have">,
+          NegFlag<SetFalse, [], [ClangOption, CC1Option], "Assume no">,
+          BothFlags<[], [ClangOption], " atomic operations on remote memory">>;
+
+defm atomic_fine_grained_memory
+    : BoolFOption<"atomic-fine-grained-memory",
+                  LangOpts<"AtomicFineGrainedMemory">, DefaultFalse,
+                  PosFlag<SetTrue, [], [ClangOption, CC1Option], "May have">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Assume no">,
+                  BothFlags<[], [ClangOption],
+                            " atomic operations on fine-grained memory">>;
+
+defm atomic_ignore_denormal_mode
+    : BoolFOption<"atomic-ignore-denormal-mode",
+                  LangOpts<"AtomicIgnoreDenormalMode">, DefaultFalse,
+                  PosFlag<SetTrue, [], [ClangOption, CC1Option], "Allow">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Disallow">,
+                  BothFlags<[], [ClangOption],
+                            " atomic operations to ignore denormal mode">>;
+
+defm atomic_backward_compatible
+    : BoolFOption<"atomic-backward-compatible",
+                  LangOpts<"AtomicBackwardCompatible">, DefaultTrue,
+                  PosFlag<SetTrue, [], [ClangOption], "Enable">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Disable">,
+                  BothFlags<[], [ClangOption],
+                            " backward compatibility for atomic operations">>;
 
 defm memory_profile : OptInCC1FFlag<"memory-profile", "Enable", "Disable", " heap memory profiling">;
 def fmemory_profile_EQ : Joined<["-"], "fmemory-profile=">,
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 87de9e6865e71..fcbc101231aa2 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -281,8 +281,6 @@ void AMDGPUTargetInfo::adjust(DiagnosticsEngine &Diags, LangOptions &Opts,
   // to OpenCL can be removed from the following line.
   setAddressSpaceMap((Opts.OpenCL && !Opts.OpenCLGenericAddressSpace) ||
                      !isAMDGCN(getTriple()));
-
-  AtomicOpts = AtomicOptions(Opts);
 }
 
 llvm::SmallVector<Builtin::InfosShard>
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 123cb4f51f828..b5b99c35246d0 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -818,7 +818,7 @@ class CodeGenFunction : public CodeGenTypeCache {
 
   class CGAtomicOptionsRAII {
   public:
-    CGAtomicOptionsRAII(CodeGenModule &CGM_, AtomicOptions AO)
+    CGAtomicOptionsRAII(CodeGenModule &CGM_, std::optional<AtomicOptions> AO)
         : CGM(CGM_), SavedAtomicOpts(CGM.getAtomicOpts()) {
       CGM.setAtomicOpts(AO);
     }
@@ -826,7 +826,7 @@ class CodeGenFunction : public CodeGenTypeCache {
         : CGM(CGM_), SavedAtomicOpts(CGM.getAtomicOpts()) {
       if (!AA)
         return;
-      AtomicOptions AO = SavedAtomicOpts;
+      AtomicOptions AO = SavedAtomicOpts.value_or(AtomicOptions());
       for (auto Option : AA->atomicOptions()) {
         switch (Option) {
         case AtomicAttr::remote_memory:
@@ -849,7 +849,7 @@ class CodeGenFunction : public CodeGenTypeCache {
           break;
         }
       }
-      CGM.setAtomicOpts(AO);
+      CGM.setAtomicOpts(std::make_optional(AO));
     }
 
     CGAtomicOptionsRAII(const CGAtomicOptionsRAII &) = delete;
@@ -858,7 +858,7 @@ class CodeGenFunction : public CodeGenTypeCache {
 
   private:
     CodeGenModule &CGM;
-    AtomicOptions SavedAtomicOpts;
+    std::optional<AtomicOptions> SavedAtomicOpts;
   };
 
 public:
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index f62350fd8d378..9e37a92287590 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -677,7 +677,7 @@ class CodeGenModule : public CodeGenTypeCache {
   std::optional<PointerAuthQualifier>
   computeVTPointerAuthentication(const CXXRecordDecl *ThisClass);
 
-  AtomicOptions AtomicOpts;
+  std::optional<AtomicOptions> AtomicOpts;
 
   // A set of functions which should be hot-patched; see
   // -fms-hotpatch-functions-file (and -list). This will nearly always be empty.
@@ -699,11 +699,11 @@ class CodeGenModule : public CodeGenTypeCache {
   /// Finalize LLVM code generation.
   void Release();
 
-  /// Get the current Atomic options.
-  AtomicOptions getAtomicOpts() { return AtomicOpts; }
+  /// Get the current atomic options.
+  std::optional<AtomicOptions> getAtomicOpts() { return AtomicOpts; }
 
-  /// Set the current Atomic options.
-  void setAtomicOpts(AtomicOptions AO) { AtomicOpts = AO; }
+  /// Set the current atomic options.
+  void setAtomicOpts(std::optional<AtomicOptions> AO) { AtomicOpts = AO; }
 
   /// Return true if we should emit location information for expressions.
   bool getExpressionLocationsEnabled() const;
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0fcbf7e458a34..bcf3d878f8b9a 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -10,6 +10,7 @@
 #include "TargetInfo.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
+#include "llvm/TargetParser/TargetParser.h"
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -544,14 +545,47 @@ void AMDGPUTargetCodeGenInfo::setTargetAtomicMetadata(
   if (!RMW)
     return;
 
-  AtomicOptions AO = CGF.CGM.getAtomicOpts();
   llvm::MDNode *Empty = llvm::MDNode::get(CGF.getLLVMContext(), {});
-  if (!AO.getOption(clang::AtomicOptionKind::FineGrainedMemory))
+  const bool IsFP = llvm::AtomicRMWInst::isFPOperation(RMW->getOperation());
+  llvm::AMDGPU::IsaVersion ISAVer =
+      llvm::AMDGPU::getIsaVersion(CGF.getTarget().getTargetOpts().CPU);
+  const bool Compat = CGF.CGM.getLangOpts().AtomicBackwardCompatible &&
+                      (ISAVer.Major >= 11 && ISAVer.Major <= 12);
+  // Establish local defaults based on compat mode and FP-ness.
+  // - compat + FP: default to remote_memory=true, fine_grained_memory=true
+  // - otherwise: default to remote_memory=false, fine_grained_memory=false
+  // - ignore_denormal_mode defaults to false
+  bool Remote = (Compat && IsFP);
+  bool FineGrained = (Compat && IsFP);
+  bool IgnoreDenorm = false;
+
+  // Override with -cc1/driver-specified LangOptions if explicitly specified.
+  const clang::LangOptions &LO = CGF.CGM.getLangOpts();
+  if (LO.AtomicRemoteMemorySpecified)
+    Remote = LO.AtomicRemoteMemory;
+  if (LO.AtomicFineGrainedMemorySpecified)
+    FineGrained = LO.AtomicFineGrainedMemory;
+  if (LO.AtomicIgnoreDenormalModeSpecified)
+    IgnoreDenorm = LO.AtomicIgnoreDenormalMode;
+
+  // Override with statement attribute values if present (CGM-scoped).
+  // We assume AtomicOptions holds tri-state (std::optional<bool>) fields:
+  // remote_memory, fine_grained_memory, ignore_denormal_mode.
+  if (auto MaybeAO = CGF.CGM.getAtomicOpts()) {
+    if (MaybeAO->remote_memory.has_value())
+      Remote = *MaybeAO->remote_memory;
+    if (MaybeAO->fine_grained_memory.has_value())
+      FineGrained = *MaybeAO->fine_grained_memory;
+    if (MaybeAO->ignore_denormal_mode.has_value())
+      IgnoreDenorm = *MaybeAO->ignore_denormal_mode;
+  }
+
+  // Emit metadata according to final values.
+  if (!FineGrained)
     RMW->setMetadata("amdgpu.no.fine.grained.memory", Empty);
-  if (!AO.getOption(clang::AtomicOptionKind::RemoteMemory))
+  if (!Remote)
     RMW->setMetadata("amdgpu.no.remote.memory", Empty);
-  if (AO.getOption(clang::AtomicOptionKind::IgnoreDenormalMode) &&
-      RMW->getOperation() == llvm::AtomicRMWInst::FAdd &&
+  if (IgnoreDenorm && RMW->getOperation() == llvm::AtomicRMWInst::FAdd &&
       RMW->getType()->isFloatTy())
     RMW->setMetadata("amdgpu.ignore.denormal.mode", Empty);
 }
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 21e45c6b56bbb..7520adfdc7844 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5865,12 +5865,15 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
 
   RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs, JA);
 
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_remote_memory,
-                    options::OPT_fno_atomic_remote_memory);
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_fine_grained_memory,
-                    options::OPT_fno_atomic_fine_grained_memory);
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_ignore_denormal_mode,
-                    options::OPT_fno_atomic_ignore_denormal_mode);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_remote_memory,
+                  options::OPT_fno_atomic_remote_memory);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_fine_grained_memory,
+                  options::OPT_fno_atomic_fine_grained_memory);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_ignore_denormal_mode,
+                  options::OPT_fno_atomic_ignore_denormal_mode);
+
+  Args.addOptOutFlag(CmdArgs, options::OPT_fatomic_backward_compatible,
+                     options::OPT_fno_atomic_backward_compatible);
 
   if (Arg *A = Args.getLastArg(options::OPT_fextend_args_EQ)) {
     const llvm::Triple::ArchType Arch = TC.getArch();
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 8411d00cc7812..2739c045eca50 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -4633,6 +4633,13 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
     }
   }
 
+  Opts.AtomicRemoteMemorySpecified =
+      Args.hasArg(OPT_fatomic_remote_memory, OPT_fno_atomic_remote_memory);
+  Opts.AtomicFineGrainedMemorySpecified = Args.hasArg(
+      OPT_fatomic_fine_grained_memory, OPT_fno_atomic_fine_grained_memory);
+  Opts.AtomicIgnoreDenormalModeSpecified = Args.hasArg(
+      OPT_fatomic_ignore_denormal_mode, OPT_fno_atomic_ignore_denormal_mode);
+
   return Diags.getNumErrors() == NumErrorsBefore;
 }
 
diff --git a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
index 22c40e6d38ea2..62360c7c9a4db 100644
--- a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
@@ -7,7 +7,7 @@
 // RUN:   -fnative-half-arguments-and-returns -munsafe-fp-atomics | FileCheck -check-prefixes=FUN,CHECK,UNSAFEIR %s
 
 // RUN: %clang_cc1 -x hip %s -O3 -S -o - -triple=amdgcn-amd-amdhsa \
-// RUN:   -fcuda-is-device -target-cpu gfx1100 -fnative-half-type \
+// RUN:   -fcuda-is-device -target-cpu gfx1100 -fnative-half-type -fno-atomic-backward-compatible \
 // RUN:   -fnative-half-arguments-and-returns | FileCheck -check-prefixes=FUN,SAFE %s
 
 // RUN: %clang_cc1 -x hip %s -O3 -S -o - -triple=amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip b/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip
new file mode 100644
index 0000000000000..017d7052a3472
--- /dev/null
+++ b/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip
@@ -0,0 +1,71 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1100 -emit-llvm -o - \
+// RUN:   -fno-atomic-backward-compatible %s | FileCheck %s --check-prefix=CHECK-NEW
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1100 -emit-llvm -o - %s \
+// RUN:   | FileCheck %s --check-prefix=CHECK-OLD
+
+// CHECK-NEW-LABEL: @_Z16test_atomic_faddPf(
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}
+// CHECK-OLD-LABEL: @_Z16test_atomic_faddPf(
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4
+void test_atomic_fadd(float* ptr) {
+  __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+}
+
+// CHECK-NEW-LABEL: @_Z13test_overridePf(
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+// CHECK-OLD-LABEL: @_Z13test_overridePf(
+// CHECK-OLD-NOT: !amdgpu.no.fine.grained.memory
+// CHECK-OLD-NOT: !amdgpu.no.remote.memory
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+void test_override(float* ptr) {
+  [[clang::atomic(ignore_denormal_mode)]] {
+    __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+  }
+}
+
+// CHECK-NEW-LABEL: @_Z20test_nested_overridePf(
+// First atomic in the outer scope should have ignore_denormal plus the two new metadata.
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+// For the inner scope with no_ignore_denormal_mode, ensure ignore.denormal is absent but the two new metadata are present.
+// CHECK-NEW: atomicrmw fsub ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}
+// CHECK-NEW-NOT: !amdgpu.ignore.denormal.mode
+// CHECK-OLD-LABEL: @_Z20test_nested_overridePf(
+// CHECK-OLD-NOT: !amdgpu.no.fine.grained.memory
+// CHECK-OLD-NOT: !amdgpu.no.remote.memory
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+void test_nested_override(float* ptr) {
+  [[clang::atomic(ignore_denormal_mode)]] {
+    __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+
+    // CHECK-OLD-NOT: !amdgpu.no.fine.grained....
[truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/157672
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits