https://github.com/AlexVlx created 
https://github.com/llvm/llvm-project/pull/146813

This (mostly) removes one of the largest remaining limitations of `hipstdpar`
based algorithm acceleration, by adding support for global variable usage in
offloaded algorithms. It is meant to compose with a run time component that will
live in the support library, and fires iff a special variable
(`__hipstdpar_symbol_indirection_table`) is provided by the latter. In short,
things work as follows:

- We replace uses of some global `G` with an indirect access via an implicitly
created anonymous global `F`, which is of pointer type and is expected to hold
the program-wide address of `G`;
- We append `F`, alongside `G`'s name, to a table structure;
- At run time, the support library uses the table to look up the program-wide
address of a contained symbol based on its name, and then stores the address
via the paired pointer.
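
As a rough illustration of the run time half of this scheme, here is a minimal host-side sketch. It is not the support library's actual code: `SymbolEntry`, `IndirectionTable`, `Resolver` and `patchIndirections` are hypothetical names, the resolver stands in for whatever name lookup the runtime really performs, and the layout only loosely mirrors the count / entry-pointer shape that the pass validates (the third, type-carrying member of the table is omitted).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of one table entry: the name of G, paired with the
// address of the pointer-typed global F that offloaded code reads through.
struct SymbolEntry {
  const char *Name;
  void **Indirection;
};

// Hypothetical mirror of the table itself: an entry count plus a pointer to
// the entries. The real table's third member is left out of this sketch.
struct IndirectionTable {
  std::size_t Count;
  SymbolEntry *Symbols;
};

// Assumed stand-in for the runtime's lookup of a program-wide address by
// name; it returns nullptr when the name cannot be resolved.
using Resolver = void *(*)(const char *);

// Walk the table and store each resolved address via the paired pointer.
// Returns the number of entries that were patched.
inline std::size_t patchIndirections(const IndirectionTable &T,
                                     Resolver Resolve) {
  assert((T.Count == 0 || T.Symbols != nullptr) &&
         "a non-empty table needs entries");
  std::size_t Patched = 0;
  for (std::size_t I = 0; I != T.Count; ++I) {
    const SymbolEntry &E = T.Symbols[I];
    if (void *Addr = Resolve(E.Name)) {
      *E.Indirection = Addr; // F now holds the program-wide address of G
      ++Patched;
    }
  }
  return Patched;
}
```

Once an entry has been patched, an access that originally named `G` corresponds to a load of `F` followed by a dereference of the loaded pointer. Entries whose name cannot be resolved are left untouched in this sketch; how the real runtime treats that case is not specified here.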

This doesn't handle internal linkage symbols (`static foo` or `namespace { foo
}`) if they are not unique, i.e. if there's a name clash that is resolved by
the linker, as the resolution would not be visible. Also, initially we will
only support "true" globals in RDC mode. Things would be much simpler if we had
direct access to the accelerator loader, but since the expectation is to
compose at the HIP RT level we have to jump through additional hoops.

From d98e3785a144ada9881cdbe24c86f273850eca20 Mon Sep 17 00:00:00 2001
From: Alex Voicu <alexandru.vo...@amd.com>
Date: Thu, 3 Jul 2025 02:02:04 +0100
Subject: [PATCH 1/2] Add support for true globals.

---
 llvm/lib/Transforms/HipStdPar/HipStdPar.cpp   | 220 +++++++++++++++++-
 ...al-var-indirection-wrong-table-member-0.ll |  15 ++
 ...al-var-indirection-wrong-table-member-1.ll |  15 ++
 ...al-var-indirection-wrong-table-member-2.ll |  15 ++
 ...ar-indirection-wrong-table-member-count.ll |  14 ++
 ...global-var-indirection-wrong-table-type.ll |  13 ++
 .../HipStdPar/global-var-indirection.ll       |  87 +++++++
 llvm/test/Transforms/HipStdPar/global-var.ll  |   4 +-
 8 files changed, 371 insertions(+), 12 deletions(-)
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-0.ll
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-1.ll
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-2.ll
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-count.ll
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-type.ll
 create mode 100644 llvm/test/Transforms/HipStdPar/global-var-indirection.ll

diff --git a/llvm/lib/Transforms/HipStdPar/HipStdPar.cpp b/llvm/lib/Transforms/HipStdPar/HipStdPar.cpp
index 5a87cf8c83d79..87fbcd40be431 100644
--- a/llvm/lib/Transforms/HipStdPar/HipStdPar.cpp
+++ b/llvm/lib/Transforms/HipStdPar/HipStdPar.cpp
@@ -48,6 +48,7 @@
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
 
@@ -114,24 +115,223 @@ static inline void clearModule(Module &M) { // TODO: simplify.
     eraseFromModule(*M.ifuncs().begin());
 }
 
+static inline SmallVector<std::reference_wrapper<Use>> collectIndirectableUses(
+  GlobalVariable *G) {
+  // We are interested only in use chains that end in an Instruction.
+  SmallVector<std::reference_wrapper<Use>> Uses;
+
+  SmallVector<std::reference_wrapper<Use>> Tmp(G->use_begin(), G->use_end());
+  while (!Tmp.empty()) {
+    Use &U = Tmp.back();
+    Tmp.pop_back();
+    if (isa<Instruction>(U.getUser()))
+      Uses.emplace_back(U);
+    else
+      transform(U.getUser()->uses(), std::back_inserter(Tmp), [](auto &&U) {
+        return std::ref(U);
+      });
+  }
+
+  return Uses;
+}
+
+static inline GlobalVariable *getGlobalForName(GlobalVariable *G) {
+  // Create an anonymous global which stores the variable's name, which will be
+  // used by the HIPSTDPAR runtime to look up the program-wide symbol.
+  LLVMContext &Ctx = G->getContext();
+  auto *CDS = ConstantDataArray::getString(Ctx, G->getName());
+
+  GlobalVariable *N = G->getParent()->getOrInsertGlobal("", CDS->getType());
+  N->setInitializer(CDS);
+  N->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
+  N->setConstant(true);
+
+  return N;
+}
+
+static inline GlobalVariable *getIndirectionGlobal(Module *M) {
+  // Create an anonymous global which stores a pointer to a pointer, which will
+  // be externally initialised by the HIPSTDPAR runtime with the address of the
+  // program-wide symbol.
+  Type *PtrTy =
+      PointerType::get(M->getContext(),
+                       M->getDataLayout().getDefaultGlobalsAddressSpace());
+  GlobalVariable *NewG = M->getOrInsertGlobal("", PtrTy);
+
+  NewG->setInitializer(PoisonValue::get(NewG->getValueType()));
+  NewG->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
+  NewG->setConstant(true);
+  NewG->setExternallyInitialized(true);
+
+  return NewG;
+}
+
+static inline Constant *appendIndirectedGlobal(
+    const GlobalVariable *IndirectionTable,
+    SmallVector<Constant *> &SymbolIndirections,
+    GlobalVariable *ToIndirect) {
+  Module *M = ToIndirect->getParent();
+
+  auto *InitTy = cast<StructType>(IndirectionTable->getValueType());
+  auto *SymbolListTy = cast<StructType>(InitTy->getStructElementType(2));
+  Type *NameTy = SymbolListTy->getElementType(0);
+  Type *IndirectTy = SymbolListTy->getElementType(1);
+
+  Constant *NameG = getGlobalForName(ToIndirect);
+  Constant *IndirectG = getIndirectionGlobal(M);
+  Constant *Entry = ConstantStruct::get(
+      SymbolListTy, {ConstantExpr::getAddrSpaceCast(NameG, NameTy),
+                     ConstantExpr::getAddrSpaceCast(IndirectG, IndirectTy)});
+  SymbolIndirections.push_back(Entry);
+
+  return IndirectG;
+}
+
+static void fillIndirectionTable(GlobalVariable *IndirectionTable,
+                                 SmallVector<Constant *> Indirections) {
+  Module *M = IndirectionTable->getParent();
+  size_t SymCnt = Indirections.size();
+
+  auto *InitTy = cast<StructType>(IndirectionTable->getValueType());
+  Type *SymbolListTy = InitTy->getStructElementType(1);
+  auto *SymbolTy = cast<StructType>(InitTy->getStructElementType(2));
+
+  Constant *Count = ConstantInt::get(InitTy->getStructElementType(0), SymCnt);
+  M->removeGlobalVariable(IndirectionTable);
+  GlobalVariable *Symbols =
+      M->getOrInsertGlobal("", ArrayType::get(SymbolTy, SymCnt));
+  Symbols->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
+  Symbols->setInitializer(ConstantArray::get(ArrayType::get(SymbolTy, SymCnt),
+                                             {Indirections}));
+  Symbols->setConstant(true);
+
+  Constant *ASCSymbols = ConstantExpr::getAddrSpaceCast(Symbols, SymbolListTy);
+  Constant *Init = ConstantStruct::get(
+      InitTy, {Count, ASCSymbols, PoisonValue::get(SymbolTy)});
+  M->insertGlobalVariable(IndirectionTable);
+  IndirectionTable->setInitializer(Init);
+}
+
+static void replaceWithIndirectUse(const Use &U, const GlobalVariable *G,
+                                   Constant *IndirectedG) {
+  auto *I = cast<Instruction>(U.getUser());
+
+  IRBuilder<> Builder(I);
+  Value *Op = I->getOperand(U.getOperandNo());
+
+  // We walk back up the use chain, which could be an arbitrarily long sequence
+  // of constexpr AS casts, ptr-to-int and GEP instructions, until we reach the
+  // indirected global.
+  while (auto *CE = dyn_cast<ConstantExpr>(Op)) {
+    assert((CE->getOpcode() == Instruction::GetElementPtr ||
+            CE->getOpcode() == Instruction::AddrSpaceCast ||
+            CE->getOpcode() == Instruction::PtrToInt) &&
+            "Only GEP, ASCAST or PTRTOINT constant uses supported!");
+
+    Instruction *NewI = Builder.Insert(CE->getAsInstruction());
+    I->replaceUsesOfWith(Op, NewI);
+    I = NewI;
+    Op = I->getOperand(0);
+    Builder.SetInsertPoint(I);
+  }
+
+  assert(Op == G && "Must reach indirected global!");
+
+  Builder.GetInsertPoint()->setOperand(0, Builder.CreateLoad(G->getType(),
+                                                             IndirectedG));
+}
+
+static inline bool isValidIndirectionTable(GlobalVariable *IndirectionTable) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  Type *Ty = IndirectionTable->getValueType();
+  bool Valid = false;
+
+  if (!isa<StructType>(Ty)) {
+    OS << "The Indirection Table must be a struct type; ";
+    Ty->print(OS);
+    OS << " is incorrect.\n";
+  } else if (cast<StructType>(Ty)->getNumElements() != 3u) {
+    OS << "The Indirection Table must have 3 elements; "
+      << cast<StructType>(Ty)->getNumElements() << " is incorrect.\n";
+  } else if (!isa<IntegerType>(cast<StructType>(Ty)->getStructElementType(0))) {
+    OS << "The first element in the Indirection Table must be an integer; ";
+    cast<StructType>(Ty)->getStructElementType(0)->print(OS);
+    OS << " is incorrect.\n";
+  } else if (!isa<PointerType>(cast<StructType>(Ty)->getStructElementType(1))) {
+    OS << "The second element in the Indirection Table must be a pointer; ";
+    cast<StructType>(Ty)->getStructElementType(1)->print(OS);
+    OS << " is incorrect.\n";
+  } else if (!isa<StructType>(cast<StructType>(Ty)->getStructElementType(2))) {
+    OS << "The third element in the Indirection Table must be a struct type; ";
+    cast<StructType>(Ty)->getStructElementType(2)->print(OS);
+    OS << " is incorrect.\n";
+  } else {
+    Valid = true;
+  }
+
+  if (!Valid)
+    IndirectionTable->getContext().diagnose(DiagnosticInfoGeneric(W, DS_Error));
+
+  return Valid;
+}
+
+static void indirectGlobals(GlobalVariable *IndirectionTable,
+                            SmallVector<GlobalVariable *> ToIndirect) {
+  // We replace globals with an indirected access via a pointer that will get
+  // set by the HIPSTDPAR runtime, using their accessible, program-wide unique
+  // address as set by the host linker-loader.
+  SmallVector<Constant *> SymbolIndirections;
+  for (auto &&G : ToIndirect) {
+    SmallVector<std::reference_wrapper<Use>> Uses = collectIndirectableUses(G);
+
+    if (Uses.empty())
+      continue;
+
+    Constant *IndirectedGlobal = appendIndirectedGlobal(IndirectionTable,
+                                                        SymbolIndirections, G);
+
+    for_each(Uses,
+             [=](auto &&U) { replaceWithIndirectUse(U, G, IndirectedGlobal); });
+
+    eraseFromModule(*G);
+  }
+
+  if (SymbolIndirections.empty())
+    return;
+
+  fillIndirectionTable(IndirectionTable, std::move(SymbolIndirections));
+}
+
 static inline void maybeHandleGlobals(Module &M) {
   unsigned GlobAS = M.getDataLayout().getDefaultGlobalsAddressSpace();
-  for (auto &&G : M.globals()) { // TODO: should we handle these in the FE?
+
+  SmallVector<GlobalVariable *> ToIndirect;
+  for (auto &&G : M.globals()) {
     if (!checkIfSupported(G))
       return clearModule(M);
-
-    if (G.isThreadLocal())
-      continue;
-    if (G.isConstant())
-      continue;
     if (G.getAddressSpace() != GlobAS)
       continue;
-    if (G.getLinkage() != GlobalVariable::ExternalLinkage)
+    if (G.isConstant() && G.hasInitializer() && G.hasAtLeastLocalUnnamedAddr())
       continue;
 
-    G.setLinkage(GlobalVariable::ExternalWeakLinkage);
-    G.setInitializer(nullptr);
-    G.setExternallyInitialized(true);
+    ToIndirect.push_back(&G);
+  }
+
+  if (ToIndirect.empty())
+    return;
+
+  if (auto *IT = M.getNamedGlobal("__hipstdpar_symbol_indirection_table")) {
+    if (!isValidIndirectionTable(IT))
+      return clearModule(M);
+    return indirectGlobals(IT, std::move(ToIndirect));
+  } else {
+    for (auto &&G : ToIndirect) {
+      // We will internalise these, so we provide a poison initialiser.
+      if (!G->hasInitializer())
+        G->setInitializer(PoisonValue::get(G->getValueType()));
+    }
   }
 }
 
diff --git 
a/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-0.ll 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-0.ll
new file mode 100644
index 0000000000000..258bcfba55eca
--- /dev/null
+++ 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-0.ll
@@ -0,0 +1,15 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s 2>&1 | FileCheck %s
+
+; CHECK: error: The first element in the Indirection Table must be an integer; 
%struct.anon.1 = type { ptr, ptr } is incorrect.
+%struct.anon.1 = type { ptr, ptr }
+%class.anon = type { %struct.anon.1, ptr, %struct.anon.1 }
+@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant %class.anon zeroinitializer, align 8
+
+define amdgpu_kernel void @store(ptr %p) {
+entry:
+  store ptr %p, ptr addrspace(1) @a, align 8
+  ret void
+}
diff --git 
a/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-1.ll 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-1.ll
new file mode 100644
index 0000000000000..331f4bf92e928
--- /dev/null
+++ 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-1.ll
@@ -0,0 +1,15 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s 2>&1 | FileCheck %s
+
+; CHECK: error: The second element in the Indirection Table must be a pointer; 
%struct.anon.1 = type { ptr, ptr } is incorrect.
+%struct.anon.1 = type { ptr, ptr }
+%class.anon = type { i64, %struct.anon.1, %struct.anon.1 }
+@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant %class.anon zeroinitializer, align 8
+
+define amdgpu_kernel void @store(ptr %p) {
+entry:
+  store ptr %p, ptr addrspace(1) @a, align 8
+  ret void
+}
diff --git 
a/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-2.ll 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-2.ll
new file mode 100644
index 0000000000000..6bdedcbe65340
--- /dev/null
+++ 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-2.ll
@@ -0,0 +1,15 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s 2>&1 | FileCheck %s
+
+; CHECK: error: The third element in the Indirection Table must be a struct 
type; i64 is incorrect.
+%struct.anon.1 = type { ptr, ptr }
+%class.anon = type { i64, ptr, i64 }
+@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant %class.anon zeroinitializer, align 8
+
+define amdgpu_kernel void @store(ptr %p) {
+entry:
+  store ptr %p, ptr addrspace(1) @a, align 8
+  ret void
+}
diff --git 
a/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-count.ll
 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-count.ll
new file mode 100644
index 0000000000000..cf0efa0953a74
--- /dev/null
+++ 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-member-count.ll
@@ -0,0 +1,14 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s 2>&1 | FileCheck %s
+
+; CHECK: error: The Indirection Table must have 3 elements; 2 is incorrect.
+%class.anon = type { i64, ptr }
+@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant %class.anon zeroinitializer, align 8
+
+define amdgpu_kernel void @store(ptr %p) {
+entry:
+  store ptr %p, ptr addrspace(1) @a, align 8
+  ret void
+}
diff --git 
a/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-type.ll 
b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-type.ll
new file mode 100644
index 0000000000000..f32e378038f1a
--- /dev/null
+++ b/llvm/test/Transforms/HipStdPar/global-var-indirection-wrong-table-type.ll
@@ -0,0 +1,13 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s 2>&1 | FileCheck %s
+
+; CHECK: error: The Indirection Table must be a struct type; ptr is incorrect.
+@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant ptr zeroinitializer, align 8
+
+define amdgpu_kernel void @store(ptr %p) {
+entry:
+  store ptr %p, ptr addrspace(1) @a, align 8
+  ret void
+}
diff --git a/llvm/test/Transforms/HipStdPar/global-var-indirection.ll 
b/llvm/test/Transforms/HipStdPar/global-var-indirection.ll
new file mode 100644
index 0000000000000..da4bb0b549947
--- /dev/null
+++ b/llvm/test/Transforms/HipStdPar/global-var-indirection.ll
@@ -0,0 +1,87 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals all --version 5
+; REQUIRES: amdgpu-registered-target
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
+; RUN: %s | FileCheck %s
+
+%class.anon = type { i64, ptr, %struct.anon.1 }
+%struct.anon.1 = type { ptr, ptr }
+%struct.A = type { i32, i32, i32, i32, i32, double, [205 x double], [2000 x 
i32], [52000 x i32], [156000 x double], [14823 x double] }
+
+@do_not_indirect = protected addrspace(4) externally_initialized constant [4 x 
double] [double 1.000000e+00, double 1.000000e+00, double 2.000000e+00, double 
6.000000e+00], align 16
+@a = external hidden local_unnamed_addr addrspace(1) global %struct.A, align 8
+@b = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
+@c = internal addrspace(1) global { i32 } zeroinitializer, align 4
+@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) 
externally_initialized constant %class.anon zeroinitializer, align 8
+
+declare i64 @fn(i64 %x, i32 %y, i64 %z, i64 %w)
+
+;.
+; CHECK: @do_not_indirect = protected addrspace(4) externally_initialized 
constant [4 x double] [double 1.000000e+00, double 1.000000e+00, double 
2.000000e+00, double 6.000000e+00], align 16
+; CHECK: @[[GLOB0:[0-9]+]] = private addrspace(1) constant [2 x i8] c"a\00"
+; CHECK: @[[GLOB1:[0-9]+]] = private addrspace(1) externally_initialized 
constant ptr addrspace(1) poison
+; CHECK: @[[GLOB2:[0-9]+]] = private addrspace(1) constant [2 x i8] c"b\00"
+; CHECK: @[[GLOB3:[0-9]+]] = private addrspace(1) externally_initialized 
constant ptr addrspace(1) poison
+; CHECK: @[[GLOB4:[0-9]+]] = private addrspace(1) constant [2 x i8] c"c\00"
+; CHECK: @[[GLOB5:[0-9]+]] = private addrspace(1) externally_initialized 
constant ptr addrspace(1) poison
+; CHECK: @[[GLOB6:[0-9]+]] = private addrspace(1) constant [3 x 
%struct.anon.1] [%struct.anon.1 { ptr addrspacecast (ptr addrspace(1) 
@[[GLOB0]] to ptr), ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr) }, 
%struct.anon.1 { ptr addrspacecast (ptr addrspace(1) @[[GLOB2]] to ptr), ptr 
addrspacecast (ptr addrspace(1) @[[GLOB3]] to ptr) }, %struct.anon.1 { ptr 
addrspacecast (ptr addrspace(1) @[[GLOB4]] to ptr), ptr addrspacecast (ptr 
addrspace(1) @[[GLOB5]] to ptr) }]
+; CHECK: @__hipstdpar_symbol_indirection_table = weak_odr protected 
addrspace(4) externally_initialized constant %class.anon { i64 3, ptr 
addrspacecast (ptr addrspace(1) @[[GLOB6]] to ptr), %struct.anon.1 poison }, 
align 8
+;.
+define double @gep(i64 %idx) {
+; CHECK-LABEL: define double @gep(
+; CHECK-SAME: i64 [[IDX:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load ptr addrspace(1), ptr addrspace(1) 
@[[GLOB1]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr 
addrspace(1) [[TMP0]], i64 217672
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [156000 x double], 
ptr addrspace(1) [[TMP1]], i64 0, i64 [[IDX]]
+; CHECK-NEXT:    [[R:%.*]] = load double, ptr addrspace(1) [[ARRAYIDX]], align 
8
+; CHECK-NEXT:    ret double [[R]]
+;
+entry:
+  %arrayidx = getelementptr inbounds [156000 x double], ptr addrspace(1) 
getelementptr inbounds nuw (i8, ptr addrspace(1) @a, i64 217672), i64 0, i64 
%idx
+  %r = load double, ptr addrspace(1) %arrayidx, align 8
+  ret double %r
+}
+
+define void @store(ptr %p) {
+; CHECK-LABEL: define void @store(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load ptr addrspace(1), ptr addrspace(1) 
@[[GLOB3]], align 8
+; CHECK-NEXT:    store ptr addrspace(1) [[TMP0]], ptr addrspace(1) poison, 
align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  store ptr %p, ptr addrspace(1) @b, align 8
+  ret void
+}
+
+define i64 @chain(i64 %x, i32 %y, i64 %z) {
+; CHECK-LABEL: define i64 @chain(
+; CHECK-SAME: i64 [[X:%.*]], i32 [[Y:%.*]], i64 [[Z:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load ptr addrspace(1), ptr addrspace(1) 
@[[GLOB5]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(1) [[TMP0]] to ptr
+; CHECK-NEXT:    [[TMP2:%.*]] = ptrtoint ptr [[TMP1]] to i64
+; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @fn(i64 [[X]], i32 [[Y]], i64 
[[TMP2]], i64 [[Z]])
+; CHECK-NEXT:    ret i64 [[TMP3]]
+;
+entry:
+  %0 = call i64 @fn(i64 %x, i32 %y, i64 ptrtoint (ptr addrspacecast (ptr 
addrspace(1) @c to ptr) to i64), i64 %z)
+  ret i64 %0
+}
+
+define amdgpu_kernel void @ensure_reachable(ptr %p, i64 %idx, i64 %x, i32 %y, 
i64 %z) {
+; CHECK-LABEL: define amdgpu_kernel void @ensure_reachable(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[IDX:%.*]], i64 [[X:%.*]], i32 [[Y:%.*]], 
i64 [[Z:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    call void @store(ptr [[P]])
+; CHECK-NEXT:    [[TMP0:%.*]] = call double @gep(i64 [[IDX]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @chain(i64 [[X]], i32 [[Y]], i64 
[[Z]])
+; CHECK-NEXT:    ret void
+;
+entry:
+  call void @store(ptr %p)
+  %0 = call double @gep(i64 %idx)
+  %1 = call i64 @chain(i64 %x, i32 %y, i64 %z)
+  ret void
+}
diff --git a/llvm/test/Transforms/HipStdPar/global-var.ll 
b/llvm/test/Transforms/HipStdPar/global-var.ll
index 860c30e4a464d..3a22a7b864f03 100644
--- a/llvm/test/Transforms/HipStdPar/global-var.ll
+++ b/llvm/test/Transforms/HipStdPar/global-var.ll
@@ -2,8 +2,8 @@
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=hipstdpar-select-accelerator-code \
 ; RUN: %s | FileCheck %s
 
-; CHECK: @var = extern_weak addrspace(1) externally_initialized global i32, 
align 4
-@var = addrspace(1) global i32 0, align 4
+; CHECK: @var = addrspace(1) global i32 poison, align 4
+@var = external addrspace(1) global i32, align 4
 
 define amdgpu_kernel void @kernel() {
 entry:

From 09fdfcc8a8e601995655275a34de5cabbd41fe37 Mon Sep 17 00:00:00 2001
From: Alex Voicu <alexandru.vo...@amd.com>
Date: Thu, 3 Jul 2025 02:11:51 +0100
Subject: [PATCH 2/2] Update docs.

---
 clang/docs/HIPSupport.rst | 56 ++++++++-------------------------------
 1 file changed, 11 insertions(+), 45 deletions(-)

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 406e1c8e5a2fe..bffb8f2348490 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -537,7 +537,7 @@ We define two modes in which runtime execution can occur:
    directly accessible to the accelerator and it follows the C++ memory model;
 2. **Interposition Mode** - this is a fallback mode for cases where transparent
    on-demand paging is unavailable (e.g. in the Windows OS), which means that
-   memory must be allocated via an accelerator aware mechanism, and system
+   memory must be allocated via an accelerator aware mechanism, and some system
    allocated memory is inaccessible for the accelerator.
 
 The following restrictions imposed on user code apply to both modes:
@@ -545,27 +545,8 @@ The following restrictions imposed on user code apply to 
both modes:
 1. Pointers to function, and all associated features, such as e.g. dynamic
    polymorphism, cannot be used (directly or transitively) by the user provided
    callable passed to an algorithm invocation;
-2. Global / namespace scope / ``static`` / ``thread`` storage duration 
variables
-   cannot be used (directly or transitively) in name by the user provided
-   callable;
-
-   - When executing in **HMM Mode** they can be used in address e.g.:
-
-     .. code-block:: C++
-
-        namespace { int foo = 42; }
-
-        bool never(const std::vector<int>& v) {
-          return std::any_of(std::execution::par_unseq, std::cbegin(v), 
std::cend(v), [](auto&& x) {
-            return x == foo;
-          });
-        }
-
-        bool only_in_hmm_mode(const std::vector<int>& v) {
-          return std::any_of(std::execution::par_unseq, std::cbegin(v), 
std::cend(v),
-                             [p = &foo](auto&& x) { return x == *p; });
-        }
-
+2. ``static`` / ``thread`` storage duration variables cannot be used (directly
+   or transitively) in name by the user provided callable;
 3. Only algorithms that are invoked with the ``parallel_unsequenced_policy`` 
are
    candidates for offload;
 4. Only algorithms that are invoked with iterator arguments that model
@@ -585,15 +566,6 @@ additional restrictions:
 1. All code that is expected to interoperate has to be recompiled with the
    ``--hipstdpar-interpose-alloc`` flag i.e. it is not safe to compose 
libraries
    that have been independently compiled;
-2. automatic storage duration (i.e. stack allocated) variables cannot be used
-   (directly or transitively) by the user provided callable e.g.
-
-   .. code-block:: c++
-
-      bool never(const std::vector<int>& v, int n) {
-        return std::any_of(std::execution::par_unseq, std::cbegin(v), 
std::cend(v),
-                           [p = &n](auto&& x) { return x == *p; });
-      }
 
 Current Support
 ===============
@@ -626,17 +598,12 @@ Linux operating system. Support is synthesised in the 
following table:
 
 The minimum Linux kernel version for running in HMM mode is 6.4.
 
-The forwarding header can be obtained from
-`its GitHub repository <https://github.com/ROCm/roc-stdpar>`_.
-It will be packaged with a future `ROCm 
<https://rocm.docs.amd.com/en/latest/>`_
-release. Because accelerated algorithms are provided via
-`rocThrust <https://rocm.docs.amd.com/projects/rocThrust/en/latest/>`_, a
-transitive dependency on
-`rocPrim <https://rocm.docs.amd.com/projects/rocPRIM/en/latest/>`_ exists. Both
-can be obtained either by installing their associated components of the
-`ROCm <https://rocm.docs.amd.com/en/latest/>`_ stack, or from their respective
-repositories. The list algorithms that can be offloaded is available
-`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_.
+The forwarding header is packaged by
+`ROCm <https://rocm.docs.amd.com/en/latest/>`_, and is obtainable by installing
+the `hipstdpar` package. The list of algorithms that can be offloaded is available
+`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_. More
+details are available
+`here <https://rocm.blogs.amd.com/software-tools-optimization/hipstdpar/README.html>`_.
 
 HIP Specific Elements
 ---------------------
@@ -690,9 +657,8 @@ HIP Specific Elements
 Open Questions / Future Developments
 ====================================
 
-1. The restriction on the use of global / namespace scope / ``static`` /
-   ``thread`` storage duration variables in offloaded algorithms will be lifted
-   in the future, when running in **HMM Mode**;
+1. The restriction on the use of ``static`` / ``thread`` storage duration
+   variables in offloaded algorithms might be lifted;
 2. The restriction on the use of dynamic memory allocation in offloaded
    algorithms will be lifted in the future.
 3. The restriction on the use of pointers to function, and associated features
