llvmbot wrote:

<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-amdgpu

Author: Yaxun (Sam) Liu (yxsamliu)

<details>
<summary>Changes</summary>

Clang recently introduced `-f[no-]atomic-*` options and `[[clang::atomic]]` 
attributes to give better control over atomic codegen. The default assumed no 
remote memory and no fine-grained memory to enable more efficient instructions. 
This was correct for most AMD GPUs, but it caused regressions on AMDGPU gfx11 
and gfx12 for FP atomics on fine-grained memory, where earlier Clang used 
conservative behavior. Those changes altered metadata and instruction 
selection, breaking code that depended on the old defaults. For example, the 
following Compiler Explorer link shows that `__atomic_fetch_add` was emitted as 
a CAS loop in ROCm 6.4 but as `global_atomic_add_f32` in LLVM trunk for gfx1200:

https://godbolt.org/z/35Y3sdsn5

We need to restore a safe default to avoid regressions while keeping a clean 
path to the new model.

This patch adds `-f[no-]atomic-backward-compatible`. It is enabled by default 
to recover the old behavior and avoid regressions. Disabling it adopts the new 
and consistent atomic codegen design with explicit assumptions about memory 
types.

To make this robust, atomic attributes and atomic options are made tri-state 
(unset/true/false). This accurately conveys user intent and attribute 
semantics, so Clang can reconstruct prior behavior per GPU architecture and 
atomic data type, and still honor explicit requests. (Note: the unset state is 
only the initial state; users cannot specify it explicitly.)
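
The tri-state idea can be sketched with `std::optional<bool>`. This is only an illustration under assumed names, not the patch's code: `TriState`, `describe`, and `resolve` are hypothetical helpers introduced here.

```cpp
#include <optional>
#include <string>

// Hypothetical alias: std::nullopt means "unset"; otherwise an explicit
// true or false.
using TriState = std::optional<bool>;

// Render a tri-state value, keeping "unset" distinct from "false".
std::string describe(const TriState &V) {
  if (!V.has_value())
    return "unset";
  return *V ? "true" : "false";
}

// An explicit value wins; an unset value falls back to a computed default.
bool resolve(const TriState &V, bool Default) { return V.value_or(Default); }
```

With a plain `bool`, an explicit "false" and "never specified" are indistinguishable. The optional keeps them apart, so a default can be applied only when nothing was requested.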

The tri-state change is minimal and lightweight in the driver, LangOptions, and 
CodeGen plumbing. The driver prefers the last `-fatomic-*` argument. CodeGen 
computes per-operation defaults (conservative for FP atomics on gfx11/gfx12 in 
compatibility mode), then applies explicit options and attributes, and emits 
matching AMDGPU metadata.
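
The precedence described above (computed default, then command-line option, then attribute) can be sketched as a standalone function. This is a simplified illustration, not the code in `clang/lib/CodeGen/Targets/AMDGPU.cpp`: `AtomicOverrides`, `EffectiveAtomicOptions`, and `resolveAtomicOptions` are hypothetical names, and the ISA major version is passed in directly rather than derived from the target CPU.

```cpp
#include <optional>

// Hypothetical container for one layer of explicit settings. A field stays
// unset unless that layer (command line or attribute) specified it.
struct AtomicOverrides {
  std::optional<bool> RemoteMemory;
  std::optional<bool> FineGrainedMemory;
  std::optional<bool> IgnoreDenormalMode;
};

// Fully resolved values for a single atomic operation.
struct EffectiveAtomicOptions {
  bool RemoteMemory;
  bool FineGrainedMemory;
  bool IgnoreDenormalMode;
};

// IsaMajor is assumed to be 11 for gfx11 and 12 for gfx12.
EffectiveAtomicOptions resolveAtomicOptions(bool BackwardCompatible,
                                            unsigned IsaMajor, bool IsFPOp,
                                            const AtomicOverrides &CommandLine,
                                            const AtomicOverrides &Attribute) {
  // Compatibility mode only changes defaults on gfx11/gfx12.
  const bool Compat = BackwardCompatible && IsaMajor >= 11 && IsaMajor <= 12;

  // Conservative defaults for FP atomics under compatibility; otherwise
  // assume no remote and no fine-grained memory.
  EffectiveAtomicOptions E{Compat && IsFPOp, Compat && IsFPOp, false};

  // Apply one layer of explicit settings on top of the current values.
  auto Apply = [&E](const AtomicOverrides &O) {
    if (O.RemoteMemory.has_value())
      E.RemoteMemory = *O.RemoteMemory;
    if (O.FineGrainedMemory.has_value())
      E.FineGrainedMemory = *O.FineGrainedMemory;
    if (O.IgnoreDenormalMode.has_value())
      E.IgnoreDenormalMode = *O.IgnoreDenormalMode;
  };
  Apply(CommandLine); // explicit options override the defaults
  Apply(Attribute);   // attributes override explicit options
  return E;
}
```

In the patch, the resolved values then decide which metadata is attached: `amdgpu.no.fine.grained.memory` and `amdgpu.no.remote.memory` are set when the corresponding value is false.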

---

Patch is 22.85 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/157672.diff


12 Files Affected:

- (modified) clang/docs/LanguageExtensions.rst (+16) 
- (modified) clang/include/clang/Basic/LangOptions.h (+38-11) 
- (modified) clang/include/clang/Driver/Options.td (+30-17) 
- (modified) clang/lib/Basic/Targets/AMDGPU.cpp (-2) 
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+4-4) 
- (modified) clang/lib/CodeGen/CodeGenModule.h (+5-5) 
- (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+39-5) 
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+9-6) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+7) 
- (modified) clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu (+1-1) 
- (added) clang/test/CodeGenCUDA/atomic-options-backward-compat.hip (+71) 
- (modified) clang/test/Driver/atomic-options.hip (+15-2) 


``````````diff
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index ad190eace5b05..1da026b4e8e0d 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -5901,6 +5901,22 @@ Each option has a corresponding flag:
 ``-fatomic-fine-grained-memory`` / ``-fno-atomic-fine-grained-memory``,
 and ``-fatomic-ignore-denormal-mode`` / ``-fno-atomic-ignore-denormal-mode``.
 
+To address correctness regressions on some targets, such as AMDGPU, a backward
+compatibility mode is available via the ``-f[no-]atomic-backward-compatible``
+flag. When this flag is enabled (default), the compiler keeps the older,
+more conservative behavior for floating-point atomic operations on gfx11 and
+gfx12 processors. In this mode, the compiler does not emit special metadata
+unless you explicitly request it with the ``[[clang::atomic]]`` attribute or
+specify ``-f[no-]atomic-*`` options. This preserves existing behavior and avoids
+source code changes.
+
+On AMDGPU, when backward compatibility mode is disabled, the compiler assumes
+that atomic operations are not used on fine-grained or remote memory. This
+corresponds to ``no_fine_grained_memory`` and ``no_remote_memory``. These
+assumptions allow the compiler to generate more efficient native instructions.
+However, they can cause correctness issues if atomic operations are used on
+fine-grained or remote memory.
+
 Code using the ``[[clang::atomic]]`` attribute can then selectively override
 the command-line defaults on a per-block basis. For instance:
 
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index a8943df5b39aa..6236fa847169c 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -563,8 +563,12 @@ class LangOptions : public LangOptionsBase {
   /// Atomic code-generation options.
   /// These flags are set directly from the command-line options.
   bool AtomicRemoteMemory = false;
+  bool AtomicRemoteMemorySpecified = false;
   bool AtomicFineGrainedMemory = false;
+  bool AtomicFineGrainedMemorySpecified = false;
   bool AtomicIgnoreDenormalMode = false;
+  bool AtomicIgnoreDenormalModeSpecified = false;
+  bool AtomicBackwardCompatible = true;
 
   LangOptions();
 
@@ -1047,20 +1051,36 @@ enum class AtomicOptionKind {
 };
 
 struct AtomicOptions {
-  // Bitfields for each option.
-  unsigned remote_memory : 1;
-  unsigned fine_grained_memory : 1;
-  unsigned ignore_denormal_mode : 1;
+  // Tri-state for each option: unset / true / false.
+  std::optional<bool> remote_memory;
+  std::optional<bool> fine_grained_memory;
+  std::optional<bool> ignore_denormal_mode;
 
-  AtomicOptions()
-      : remote_memory(0), fine_grained_memory(0), ignore_denormal_mode(0) {}
+  // Default constructs with everything unset.
+  AtomicOptions() = default;
 
+  // Construct with all values explicitly set from LangOptions (used when you
+  // want a fully-specified options set, e.g., to compute effective values).
   AtomicOptions(const LangOptions &LO)
       : remote_memory(LO.AtomicRemoteMemory),
         fine_grained_memory(LO.AtomicFineGrainedMemory),
         ignore_denormal_mode(LO.AtomicIgnoreDenormalMode) {}
 
-  bool getOption(AtomicOptionKind Kind) const {
+  // Whether an option was explicitly set (true/false) vs. unset.
+  bool hasOption(AtomicOptionKind Kind) const {
+    switch (Kind) {
+    case AtomicOptionKind::RemoteMemory:
+      return remote_memory.has_value();
+    case AtomicOptionKind::FineGrainedMemory:
+      return fine_grained_memory.has_value();
+    case AtomicOptionKind::IgnoreDenormalMode:
+      return ignore_denormal_mode.has_value();
+    }
+    llvm_unreachable("Invalid AtomicOptionKind");
+  }
+
+  // Get the tri-state value (std::optional<bool>) of an option.
+  std::optional<bool> getTri(AtomicOptionKind Kind) const {
     switch (Kind) {
     case AtomicOptionKind::RemoteMemory:
       return remote_memory;
@@ -1072,7 +1092,8 @@ struct AtomicOptions {
     llvm_unreachable("Invalid AtomicOptionKind");
   }
 
-  void setOption(AtomicOptionKind Kind, bool Value) {
+  // Set or clear (unset) an option explicitly.
+  void setOption(AtomicOptionKind Kind, std::optional<bool> Value) {
     switch (Kind) {
     case AtomicOptionKind::RemoteMemory:
       remote_memory = Value;
@@ -1088,9 +1109,15 @@ struct AtomicOptions {
   }
 
   LLVM_DUMP_METHOD void dump() const {
-    llvm::errs() << "\n remote_memory: " << remote_memory
-                 << "\n fine_grained_memory: " << fine_grained_memory
-                 << "\n ignore_denormal_mode: " << ignore_denormal_mode << "\n";
+    auto TriStr = [](const std::optional<bool> &v) -> const char * {
+      if (!v.has_value())
+        return "Unset";
+      return *v ? "1" : "0";
+    };
+    llvm::errs() << "\n remote_memory: " << TriStr(remote_memory)
+                 << "\n fine_grained_memory: " << TriStr(fine_grained_memory)
+                 << "\n ignore_denormal_mode: " << TriStr(ignore_denormal_mode)
+                 << "\n";
   }
 };
 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 718808d583e8c..4835cb606c5be 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2344,23 +2344,36 @@ def fsymbol_partition_EQ : Joined<["-"], "fsymbol-partition=">, Group<f_Group>,
   Visibility<[ClangOption, CC1Option]>,
   MarshallingInfoString<CodeGenOpts<"SymbolPartition">>;
 
-defm atomic_remote_memory : BoolFOption<"atomic-remote-memory",
-  LangOpts<"AtomicRemoteMemory">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "May have">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Assume no">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations on remote memory">>;
-
-defm atomic_fine_grained_memory : BoolFOption<"atomic-fine-grained-memory",
-  LangOpts<"AtomicFineGrainedMemory">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "May have">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Assume no">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations on fine-grained memory">>;
-
-defm atomic_ignore_denormal_mode : BoolFOption<"atomic-ignore-denormal-mode",
-  LangOpts<"AtomicIgnoreDenormalMode">, DefaultFalse,
-  PosFlag<SetTrue, [], [ClangOption, CC1Option, FlangOption, FC1Option], "Allow">,
-  NegFlag<SetFalse, [], [ClangOption, FlangOption], "Disallow">,
-  BothFlags<[], [ClangOption, FlangOption], " atomic operations to ignore denormal mode">>;
+defm atomic_remote_memory
+    : BoolFOption<
+          "atomic-remote-memory", LangOpts<"AtomicRemoteMemory">, DefaultFalse,
+          PosFlag<SetTrue, [], [ClangOption, CC1Option], "May have">,
+          NegFlag<SetFalse, [], [ClangOption, CC1Option], "Assume no">,
+          BothFlags<[], [ClangOption], " atomic operations on remote memory">>;
+
+defm atomic_fine_grained_memory
+    : BoolFOption<"atomic-fine-grained-memory",
+                  LangOpts<"AtomicFineGrainedMemory">, DefaultFalse,
+                  PosFlag<SetTrue, [], [ClangOption, CC1Option], "May have">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Assume no">,
+                  BothFlags<[], [ClangOption],
+                            " atomic operations on fine-grained memory">>;
+
+defm atomic_ignore_denormal_mode
+    : BoolFOption<"atomic-ignore-denormal-mode",
+                  LangOpts<"AtomicIgnoreDenormalMode">, DefaultFalse,
+                  PosFlag<SetTrue, [], [ClangOption, CC1Option], "Allow">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Disallow">,
+                  BothFlags<[], [ClangOption],
+                            " atomic operations to ignore denormal mode">>;
+
+defm atomic_backward_compatible
+    : BoolFOption<"atomic-backward-compatible",
+                  LangOpts<"AtomicBackwardCompatible">, DefaultTrue,
+                  PosFlag<SetTrue, [], [ClangOption], "Enable">,
+                  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Disable">,
+                  BothFlags<[], [ClangOption],
+                            " backward compatibility for atomic operations">>;
 
 defm memory_profile : OptInCC1FFlag<"memory-profile", "Enable", "Disable", " heap memory profiling">;
 def fmemory_profile_EQ : Joined<["-"], "fmemory-profile=">,
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 87de9e6865e71..fcbc101231aa2 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -281,8 +281,6 @@ void AMDGPUTargetInfo::adjust(DiagnosticsEngine &Diags, LangOptions &Opts,
   // to OpenCL can be removed from the following line.
   setAddressSpaceMap((Opts.OpenCL && !Opts.OpenCLGenericAddressSpace) ||
                      !isAMDGCN(getTriple()));
-
-  AtomicOpts = AtomicOptions(Opts);
 }
 
 llvm::SmallVector<Builtin::InfosShard>
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 123cb4f51f828..b5b99c35246d0 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -818,7 +818,7 @@ class CodeGenFunction : public CodeGenTypeCache {
 
   class CGAtomicOptionsRAII {
   public:
-    CGAtomicOptionsRAII(CodeGenModule &CGM_, AtomicOptions AO)
+    CGAtomicOptionsRAII(CodeGenModule &CGM_, std::optional<AtomicOptions> AO)
         : CGM(CGM_), SavedAtomicOpts(CGM.getAtomicOpts()) {
       CGM.setAtomicOpts(AO);
     }
@@ -826,7 +826,7 @@ class CodeGenFunction : public CodeGenTypeCache {
         : CGM(CGM_), SavedAtomicOpts(CGM.getAtomicOpts()) {
       if (!AA)
         return;
-      AtomicOptions AO = SavedAtomicOpts;
+      AtomicOptions AO = SavedAtomicOpts.value_or(AtomicOptions());
       for (auto Option : AA->atomicOptions()) {
         switch (Option) {
         case AtomicAttr::remote_memory:
@@ -849,7 +849,7 @@ class CodeGenFunction : public CodeGenTypeCache {
           break;
         }
       }
-      CGM.setAtomicOpts(AO);
+      CGM.setAtomicOpts(std::make_optional(AO));
     }
 
     CGAtomicOptionsRAII(const CGAtomicOptionsRAII &) = delete;
@@ -858,7 +858,7 @@ class CodeGenFunction : public CodeGenTypeCache {
 
   private:
     CodeGenModule &CGM;
-    AtomicOptions SavedAtomicOpts;
+    std::optional<AtomicOptions> SavedAtomicOpts;
   };
 
 public:
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index f62350fd8d378..9e37a92287590 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -677,7 +677,7 @@ class CodeGenModule : public CodeGenTypeCache {
   std::optional<PointerAuthQualifier>
   computeVTPointerAuthentication(const CXXRecordDecl *ThisClass);
 
-  AtomicOptions AtomicOpts;
+  std::optional<AtomicOptions> AtomicOpts;
 
   // A set of functions which should be hot-patched; see
   // -fms-hotpatch-functions-file (and -list). This will nearly always be empty.
@@ -699,11 +699,11 @@ class CodeGenModule : public CodeGenTypeCache {
   /// Finalize LLVM code generation.
   void Release();
 
-  /// Get the current Atomic options.
-  AtomicOptions getAtomicOpts() { return AtomicOpts; }
+  /// Get the current atomic options.
+  std::optional<AtomicOptions> getAtomicOpts() { return AtomicOpts; }
 
-  /// Set the current Atomic options.
-  void setAtomicOpts(AtomicOptions AO) { AtomicOpts = AO; }
+  /// Set the current atomic options.
+  void setAtomicOpts(std::optional<AtomicOptions> AO) { AtomicOpts = AO; }
 
   /// Return true if we should emit location information for expressions.
   bool getExpressionLocationsEnabled() const;
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0fcbf7e458a34..bcf3d878f8b9a 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -10,6 +10,7 @@
 #include "TargetInfo.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
+#include "llvm/TargetParser/TargetParser.h"
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -544,14 +545,47 @@ void AMDGPUTargetCodeGenInfo::setTargetAtomicMetadata(
   if (!RMW)
     return;
 
-  AtomicOptions AO = CGF.CGM.getAtomicOpts();
   llvm::MDNode *Empty = llvm::MDNode::get(CGF.getLLVMContext(), {});
-  if (!AO.getOption(clang::AtomicOptionKind::FineGrainedMemory))
+  const bool IsFP = llvm::AtomicRMWInst::isFPOperation(RMW->getOperation());
+  llvm::AMDGPU::IsaVersion ISAVer =
+      llvm::AMDGPU::getIsaVersion(CGF.getTarget().getTargetOpts().CPU);
+  const bool Compat = CGF.CGM.getLangOpts().AtomicBackwardCompatible &&
+                      (ISAVer.Major >= 11 && ISAVer.Major <= 12);
+  // Establish local defaults based on compat mode and FP-ness.
+  //    - compat + FP: default to remote_memory=true, fine_grained_memory=true
+  //    - otherwise:   default to remote_memory=false, fine_grained_memory=false
+  //    - ignore_denormal_mode defaults to false
+  bool Remote = (Compat && IsFP);
+  bool FineGrained = (Compat && IsFP);
+  bool IgnoreDenorm = false;
+
+  // Override with -cc1/driver-specified LangOptions if explicitly specified.
+  const clang::LangOptions &LO = CGF.CGM.getLangOpts();
+  if (LO.AtomicRemoteMemorySpecified)
+    Remote = LO.AtomicRemoteMemory;
+  if (LO.AtomicFineGrainedMemorySpecified)
+    FineGrained = LO.AtomicFineGrainedMemory;
+  if (LO.AtomicIgnoreDenormalModeSpecified)
+    IgnoreDenorm = LO.AtomicIgnoreDenormalMode;
+
+  // Override with statement attribute values if present (CGM-scoped).
+  //    We assume AtomicOptions holds tri-state (std::optional<bool>) fields:
+  //    remote_memory, fine_grained_memory, ignore_denormal_mode.
+  if (auto MaybeAO = CGF.CGM.getAtomicOpts()) {
+    if (MaybeAO->remote_memory.has_value())
+      Remote = *MaybeAO->remote_memory;
+    if (MaybeAO->fine_grained_memory.has_value())
+      FineGrained = *MaybeAO->fine_grained_memory;
+    if (MaybeAO->ignore_denormal_mode.has_value())
+      IgnoreDenorm = *MaybeAO->ignore_denormal_mode;
+  }
+
+  // Emit metadata according to final values.
+  if (!FineGrained)
     RMW->setMetadata("amdgpu.no.fine.grained.memory", Empty);
-  if (!AO.getOption(clang::AtomicOptionKind::RemoteMemory))
+  if (!Remote)
     RMW->setMetadata("amdgpu.no.remote.memory", Empty);
-  if (AO.getOption(clang::AtomicOptionKind::IgnoreDenormalMode) &&
-      RMW->getOperation() == llvm::AtomicRMWInst::FAdd &&
+  if (IgnoreDenorm && RMW->getOperation() == llvm::AtomicRMWInst::FAdd &&
       RMW->getType()->isFloatTy())
     RMW->setMetadata("amdgpu.ignore.denormal.mode", Empty);
 }
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 21e45c6b56bbb..7520adfdc7844 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5865,12 +5865,15 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
 
   RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs, JA);
 
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_remote_memory,
-                    options::OPT_fno_atomic_remote_memory);
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_fine_grained_memory,
-                    options::OPT_fno_atomic_fine_grained_memory);
-  Args.addOptInFlag(CmdArgs, options::OPT_fatomic_ignore_denormal_mode,
-                    options::OPT_fno_atomic_ignore_denormal_mode);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_remote_memory,
+                  options::OPT_fno_atomic_remote_memory);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_fine_grained_memory,
+                  options::OPT_fno_atomic_fine_grained_memory);
+  Args.addLastArg(CmdArgs, options::OPT_fatomic_ignore_denormal_mode,
+                  options::OPT_fno_atomic_ignore_denormal_mode);
+
+  Args.addOptOutFlag(CmdArgs, options::OPT_fatomic_backward_compatible,
+                     options::OPT_fno_atomic_backward_compatible);
 
   if (Arg *A = Args.getLastArg(options::OPT_fextend_args_EQ)) {
     const llvm::Triple::ArchType Arch = TC.getArch();
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 8411d00cc7812..2739c045eca50 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -4633,6 +4633,13 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
     }
   }
 
+  Opts.AtomicRemoteMemorySpecified =
+      Args.hasArg(OPT_fatomic_remote_memory, OPT_fno_atomic_remote_memory);
+  Opts.AtomicFineGrainedMemorySpecified = Args.hasArg(
+      OPT_fatomic_fine_grained_memory, OPT_fno_atomic_fine_grained_memory);
+  Opts.AtomicIgnoreDenormalModeSpecified = Args.hasArg(
+      OPT_fatomic_ignore_denormal_mode, OPT_fno_atomic_ignore_denormal_mode);
+
   return Diags.getNumErrors() == NumErrorsBefore;
 }
 
diff --git a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
index 22c40e6d38ea2..62360c7c9a4db 100644
--- a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
@@ -7,7 +7,7 @@
 // RUN:   -fnative-half-arguments-and-returns -munsafe-fp-atomics | FileCheck -check-prefixes=FUN,CHECK,UNSAFEIR %s
 
 // RUN: %clang_cc1 -x hip %s -O3 -S -o - -triple=amdgcn-amd-amdhsa \
-// RUN:   -fcuda-is-device -target-cpu gfx1100 -fnative-half-type \
+// RUN:   -fcuda-is-device -target-cpu gfx1100 -fnative-half-type -fno-atomic-backward-compatible \
 // RUN:   -fnative-half-arguments-and-returns | FileCheck -check-prefixes=FUN,SAFE %s
 
 // RUN: %clang_cc1 -x hip %s -O3 -S -o - -triple=amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip b/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip
new file mode 100644
index 0000000000000..017d7052a3472
--- /dev/null
+++ b/clang/test/CodeGenCUDA/atomic-options-backward-compat.hip
@@ -0,0 +1,71 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1100 -emit-llvm -o - \
+// RUN:   -fno-atomic-backward-compatible %s | FileCheck %s --check-prefix=CHECK-NEW
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1100 -emit-llvm -o - %s \
+// RUN:   | FileCheck %s --check-prefix=CHECK-OLD
+
+// CHECK-NEW-LABEL: @_Z16test_atomic_faddPf(
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}
+// CHECK-OLD-LABEL: @_Z16test_atomic_faddPf(
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4
+void test_atomic_fadd(float* ptr) {
+  __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+}
+
+// CHECK-NEW-LABEL: @_Z13test_overridePf(
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+// CHECK-OLD-LABEL: @_Z13test_overridePf(
+// CHECK-OLD-NOT: !amdgpu.no.fine.grained.memory
+// CHECK-OLD-NOT: !amdgpu.no.remote.memory
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+void test_override(float* ptr) {
+  [[clang::atomic(ignore_denormal_mode)]] {
+    __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+  }
+}
+
+// CHECK-NEW-LABEL: @_Z20test_nested_overridePf(
+// First atomic in the outer scope should have ignore_denormal plus the two new metadata.
+// CHECK-NEW: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+// For the inner scope with no_ignore_denormal_mode, ensure ignore.denormal is absent but the two new metadata are present.
+// CHECK-NEW: atomicrmw fsub ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.no.remote.memory !{{[0-9]+}}
+// CHECK-NEW-NOT: !amdgpu.ignore.denormal.mode
+// CHECK-OLD-LABEL: @_Z20test_nested_overridePf(
+// CHECK-OLD-NOT: !amdgpu.no.fine.grained.memory
+// CHECK-OLD-NOT: !amdgpu.no.remote.memory
+// CHECK-OLD: atomicrmw fadd ptr %{{.*}}, float %{{.*}} monotonic, align 4, !amdgpu.ignore.denormal.mode !{{[0-9]+}}
+void test_nested_override(float* ptr) {
+  [[clang::atomic(ignore_denormal_mode)]] {
+    __atomic_fetch_add(ptr, 1.0f, __ATOMIC_RELAXED);
+
+    // CHECK-OLD-NOT: !amdgpu.no.fine.grained....
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/157672