================
@@ -537,7 +537,11 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
     break;
   }
 
-  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
+  // OpenCL assumes by default that atomic scopes are per-address space for
+  // non-sequentially consistent operations.
+  if (Scope >= SyncScope::OpenCLWorkGroup &&
----------------
t-tye wrote:

The OpenCL language defines atomics to have this "strange" behavior. This 
function appears to be the part of Clang that is responsible for generating the 
correct LLVM IR for a given target. The AMDGPU target supports multiple 
sync-scope values so that the different language semantics can be represented 
efficiently. The one-as sync scopes allow unnecessary waitcnts to be eliminated, 
which has been a performance issue for some of our library code.
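
For readers unfamiliar with the naming, here is a rough sketch of how the sync-scope name is composed under the pre-patch condition shown on the removed line of the hunk. The `Scope` and `Ordering` enums and the `syncScopeName` helper are hypothetical stand-ins for illustration, not Clang's actual types or code. The sketch only assumes the AMDGPU sync-scope spellings ("workgroup", "agent-one-as", and so on).

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are not Clang's types.
enum class Scope { SingleThread, Wavefront, WorkGroup, Agent, System };
enum class Ordering { Relaxed, Acquire, Release, AcquireRelease, SeqCst };

// Compose an AMDGPU-style sync-scope name. Mirrors the pre-patch rule from
// the hunk: every ordering other than sequentially consistent gets the
// "one-as" (single address space) variant of the scope.
std::string syncScopeName(Scope S, Ordering O) {
  std::string Name;
  switch (S) {
  case Scope::SingleThread: Name = "singlethread"; break;
  case Scope::Wavefront:    Name = "wavefront";    break;
  case Scope::WorkGroup:    Name = "workgroup";    break;
  case Scope::Agent:        Name = "agent";        break;
  case Scope::System:       break; // system scope is the empty name
  }
  if (O != Ordering::SeqCst) {
    if (!Name.empty())
      Name += "-";
    Name += "one-as";
  }
  return Name;
}
```

Under this simplified model, an acquire at work-group scope maps to "workgroup-one-as", while a sequentially consistent operation at the same scope maps to plain "workgroup". The patch under review narrows the condition that selects the one-as variant, which is why the question of preserved optimizations arises.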

Does this change preserve the optimizations that are possible for OpenCL source 
code?

@Pierre-vh should review these changes as I believe he is the compute code 
maintainer for the AMD GPU memory model.

https://github.com/llvm/llvm-project/pull/120095