================
@@ -537,7 +537,11 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const
LangOptions &LangOpts,
break;
}
- if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
+ // OpenCL assumes by default that atomic scopes are per-address space for
+ // non-sequentially consistent operations.
+ if (Scope >= SyncScope::OpenCLWorkGroup &&
----------------
t-tye wrote:
The OpenCL language defines atomics to have this "strange" behavior. This
function appears to be part of Clang, which is responsible for generating the
correct LLVM IR for a given target. The AMD GPU supports multiple sync-scope
values so that the different language semantics can be represented efficiently.
The one-as scopes allow unnecessary waitcnts to be eliminated, which has been a
performance issue for some of our library code.
Does this change preserve the optimizations that are possible for OpenCL source
code?
@Pierre-vh should review these changes, as I believe he is the compute code
maintainer for the AMD GPU memory model.
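For context, here is a minimal standalone sketch of the behavior in question,
based on the condition removed in the hunk above: every ordering other than
sequentially consistent gets the "one-as" form of the sync scope. The `Scope`
and `Ordering` enums and the `syncScopeName` helper are hypothetical
simplifications for illustration, not the actual Clang code, and the
scope-to-name mapping is an assumption based on the AMDGPU sync-scope names.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the Clang/LLVM enums.
enum class Scope { WorkItem, SubGroup, WorkGroup, Device, AllSVMDevices };
enum class Ordering { Relaxed, Acquire, Release, AcquireRelease,
                      SequentiallyConsistent };

// Returns the AMDGPU sync-scope name for an OpenCL atomic (sketch only).
std::string syncScopeName(Scope S, Ordering O) {
  std::string Name;
  switch (S) {
  case Scope::WorkItem:      Name = "singlethread"; break;
  case Scope::SubGroup:      Name = "wavefront";    break;
  case Scope::WorkGroup:     Name = "workgroup";    break;
  case Scope::Device:        Name = "agent";        break;
  case Scope::AllSVMDevices: Name = "";             break; // system scope
  }
  // Non-seq_cst atomics only order accesses to a single address space,
  // so they use the "-one-as" variant, which needs fewer waitcnts.
  if (O != Ordering::SequentiallyConsistent) {
    if (!Name.empty())
      Name += "-";
    Name += "one-as";
  }
  return Name;
}
```

Under these assumptions, a work-group acquire maps to "workgroup-one-as",
while a sequentially consistent work-group atomic maps to plain "workgroup".
The added `Scope >= SyncScope::OpenCLWorkGroup` check narrows which scopes
take the one-as path, which is what the question above is about.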
https://github.com/llvm/llvm-project/pull/120095