[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/95395


[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

Looks fairly straightforward with those prerequisites.

https://github.com/llvm/llvm-project/pull/95395


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-18 Thread Shilei Tian via llvm-branch-commits


@@ -1699,7 +1709,7 @@ multiclass SIBufferAtomicPat_Common RtnModes = ["ret", "noret"]> {
-  let SubtargetPredicate = HasUnrestrictedSOffset in {
+  let OtherPredicates = [HasUnrestrictedSOffset] in {

shiltian wrote:

A side question: what is the difference between `OtherPredicates` and 
`SubtargetPredicate`? It looks like you swapped a couple of them here.

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-18 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Legalize atomicrmw fmin/fmax (PR #97048)

2024-06-28 Thread Shilei Tian via llvm-branch-commits


@@ -1670,10 +1670,22 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   if (ST.hasAtomicFlatPkAdd16Insts())
 Atomic.legalFor({{V2F16, FlatPtr}, {V2BF16, FlatPtr}});
 
-  // FIXME: Handle flat, global and buffer cases.
-  getActionDefinitionsBuilder({G_ATOMICRMW_FMIN, G_ATOMICRMW_FMAX})
+
+  // Most of the legalization work here is done by AtomicExpand. We could
+  // probably use a simpler legality rule that just assumes anything is OK.
+  auto &AtomicFMinFMax =
+getActionDefinitionsBuilder({G_ATOMICRMW_FMIN, G_ATOMICRMW_FMAX})
 .legalFor({{F32, LocalPtr}, {F64, LocalPtr}});
 
+  if (ST.hasAtomicFMinFMaxF32GlobalInsts())
+AtomicFMinFMax.legalFor({{F32, GlobalPtr},{F32, BufferFatPtr}});

shiltian wrote:

so those targets that support global ptrs also support buffer fat ptrs?

https://github.com/llvm/llvm-project/pull/97048


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Legalize atomicrmw fmin/fmax (PR #97048)

2024-06-28 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/97048


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits


@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}

shiltian wrote:

why was this changed?

https://github.com/llvm/llvm-project/pull/96875


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

LGTM with one question regarding the memory order

https://github.com/llvm/llvm-project/pull/96875


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits


@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}

shiltian wrote:

it passes BuildBot tests

https://github.com/llvm/llvm-project/pull/96875


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/100953

These callbacks can be invoked in multiple places when building an optimization
pipeline, both at compile time and at link time, but there is currently no
indicator of which pipeline is being built.

This patch adds an extra argument that indicates the (Thin)LTO stage, so that a
callback can check it if needed. No test is expected for this change; its
benefit will be demonstrated in
https://github.com/llvm/llvm-project/pull/66488.
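
For illustration (not from the posted patch): a minimal sketch of how a callback
registered at this extension point could use the new argument. `MyModulePass` is
a hypothetical placeholder pass; the three-argument callback signature and the
`ThinOrFullLTOPhase` values are the ones this patch introduces.

```cpp
// Minimal sketch, assuming a hypothetical MyModulePass. Including
// llvm/Passes/PassBuilder.h makes PassBuilder, ModulePassManager,
// OptimizationLevel and ThinOrFullLTOPhase visible.
#include "llvm/Passes/PassBuilder.h"

using namespace llvm;

static void registerMyCallback(PassBuilder &PB) {
  PB.registerOptimizerLastEPCallback(
      [](ModulePassManager &MPM, OptimizationLevel Level,
         ThinOrFullLTOPhase Phase) {
        // Skip the (Thin)LTO pre-link pipelines; only schedule the pass in a
        // plain compile or in the post-link pipelines.
        if (Phase == ThinOrFullLTOPhase::ThinLTOPreLink ||
            Phase == ThinOrFullLTOPhase::FullLTOPreLink)
          return;
        MPM.addPass(MyModulePass());
      });
}
```

A callback that does not care about the stage can leave the new parameter
unnamed and ignore it, which is all the in-tree callbacks updated by this patch
do.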

>From 9c949107271a303b7961cb2a1bea3157008323d6 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 ++-
 llvm/include/llvm/Passes/PassBuilder.h| 10 +++---
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 12 +++-
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 ++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 +--
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 474a19531ff5d..4c2763404ff05 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -497,7 +497,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-  const st

[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#100953** 👈
* **#100952**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking:
https://stacking.dev/


https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [llvm] [Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/100954

None

>From 26e3c81b1488d32620f840d741966648e6d6c884 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 19:24:31 -0400
Subject: [PATCH] [Attributor][AMDGPU] Improve the handling of indirect calls

---
 llvm/include/llvm/Transforms/IPO/Attributor.h  |  9 +
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp| 18 ++
 llvm/lib/Transforms/IPO/Attributor.cpp |  2 +-
 .../Transforms/IPO/AttributorAttributes.cpp|  3 ++-
 .../AMDGPU/amdgpu-attributor-no-agpr.ll| 16 +++-
 5 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h 
b/llvm/include/llvm/Transforms/IPO/Attributor.h
index 34557238ecb23..596ee39c35a37 100644
--- a/llvm/include/llvm/Transforms/IPO/Attributor.h
+++ b/llvm/include/llvm/Transforms/IPO/Attributor.h
@@ -1448,7 +1448,7 @@ struct AttributorConfig {
   /// Callback function to determine if an indirect call targets should be made
   /// direct call targets (with an if-cascade).
   std::function
+ Function &AssummedCallee, bool IsSingleton)>
   IndirectCalleeSpecializationCallback = nullptr;
 
   /// Helper to update an underlying call graph and to delete functions.
@@ -1718,10 +1718,11 @@ struct Attributor {
   /// Return true if we should specialize the call site \b CB for the potential
   /// callee \p Fn.
   bool shouldSpecializeCallSiteForCallee(const AbstractAttribute &AA,
- CallBase &CB, Function &Callee) {
+ CallBase &CB, Function &Callee,
+ bool IsSingleton) {
 return Configuration.IndirectCalleeSpecializationCallback
-   ? Configuration.IndirectCalleeSpecializationCallback(*this, AA,
-CB, Callee)
+   ? Configuration.IndirectCalleeSpecializationCallback(
+ *this, AA, CB, Callee, IsSingleton)
: true;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index ab98da31b050f..b8ab11a7b420b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -14,6 +14,7 @@
 #include "GCNSubtarget.h"
 #include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/Analysis/CycleAnalysis.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/IR/IntrinsicsR600.h"
@@ -1041,11 +1042,28 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
&AAPointerInfo::ID, &AAPotentialConstantValues::ID,
&AAUnderlyingObjects::ID, &AAIndirectCallInfo::ID});
 
+  /// Helper to decide if we should specialize the indirect \p CB for \p 
Callee.
+  /// \p IsSingleton indicates whether the \p Callee is the only assumed 
callee.
+  auto IndirectCalleeSpecializationCallback =
+  [&](Attributor &A, const AbstractAttribute &AA, CallBase &CB,
+  Function &Callee, bool IsSingleton) {
+if (AMDGPU::isEntryFunctionCC(Callee.getCallingConv()))
+  return false;
+// Singleton functions should be specialized.
+if (IsSingleton)
+  return true;
+// Otherwise specialize uniform values.
+const auto &TTI = TM.getTargetTransformInfo(*CB.getCaller());
+return TTI.isAlwaysUniform(CB.getCalledOperand());
+  };
+
   AttributorConfig AC(CGUpdater);
   AC.IsClosedWorldModule = HasWholeProgramVisibility;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
+  AC.IndirectCalleeSpecializationCallback =
+  IndirectCalleeSpecializationCallback;
   AC.IPOAmendableCB = [](const Function &F) {
 return F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
   };
diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp 
b/llvm/lib/Transforms/IPO/Attributor.cpp
index 910c0aeacc42e..879a26bcf328d 100644
--- a/llvm/lib/Transforms/IPO/Attributor.cpp
+++ b/llvm/lib/Transforms/IPO/Attributor.cpp
@@ -3836,7 +3836,7 @@ static bool runAttributorOnFunctions(InformationCache 
&InfoCache,
   if (MaxSpecializationPerCB.getNumOccurrences()) {
 AC.IndirectCalleeSpecializationCallback =
 [&](Attributor &, const AbstractAttribute &AA, CallBase &CB,
-Function &Callee) {
+Function &Callee, bool IsSingleton) {
   if (MaxSpecializationPerCB == 0)
 return false;
   auto &Set = IndirectCalleeTrackingMap[&CB];
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 2816a85743faa..3f02ea1cbd6cb 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12347,7 +12347,8 @@ struct AAIndi

[llvm-branch-commits] [llvm] [Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#100954** 👈
* **#100953**
* **#100952**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking:
https://stacking.dev/


https://github.com/llvm/llvm-project/pull/100954


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-28 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [llvm] [Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-29 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> The apparent change here is to simply reverse the effect of #100952 on the 
> lit test. Would be good to have a test which shows what the improvement is.

Yes, this patch is still WIP (draft).

> Also, I think #100952 merely enables AAIndirectCallInfo, and feels like an 
> integral part of this change itself. I would lean towards squashing it into 
> this change.

#100953 is based on #100952 because I'd like to demonstrate how the change of 
function signature will work.

https://github.com/llvm/llvm-project/pull/100954


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-29 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> This seems fine to me in general. The patch stack seems to be messed up 
> though, or at least this seems to contain some unrelated AMDGPU changes.

It has some AMD changes because I'd like to demonstrate how the changes will be 
used.

> The other thing I wonder about is whether this argument should be added to 
> other callbacks as well for consistency.

Yeah, I was wondering that as well. I'm happy to do the changes but not sure if 
that's necessary.

https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-29 Thread Shilei Tian via llvm-branch-commits


@@ -2159,7 +2161,7 @@ ModulePassManager 
PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
   CoroPM.addPass(GlobalDCEPass());
   MPM.addPass(CoroConditionalWrapper(std::move(CoroPM)));
 
-  invokeOptimizerLastEPCallbacks(MPM, Level);
+  invokeOptimizerLastEPCallbacks(MPM, Level, ThinOrFullLTOPhase::None);

shiltian wrote:

Good catch. I didn't notice the `bool LTOPreLink` argument.
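
For context, a minimal sketch of the overlooked case, assuming the caller's
existing `bool LTOPreLink` flag is still in scope and that pre-link on this path
means full LTO (an assumption for illustration, not the actual fix):

```cpp
// Sketch only: derive the phase from the existing pre-link flag instead of
// hard-coding ThinOrFullLTOPhase::None.
ThinOrFullLTOPhase Phase = LTOPreLink ? ThinOrFullLTOPhase::FullLTOPreLink
                                      : ThinOrFullLTOPhase::None;
invokeOptimizerLastEPCallbacks(MPM, Level, Phase);
```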

https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-29 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From ed46483b388d1a8803b93116beda75108a3bf478 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 ++-
 llvm/include/llvm/Passes/PassBuilder.h| 10 +++---
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 14 +-
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 ++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 +--
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 49 insertions(+), 29 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 474a19531ff5d..4c2763404ff05 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -497,7 +497,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-  const std::function &C) {
+  const std::function &C) {
 OptimizerLastEPCallbacks.push_back(C);
   }
 
@@ -630,7 +631,8 @@ class PassBuilder {
   void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
OptimizationLevel Level);
   void invokeOptimizerLastEPCallbacks(ModulePassManager &MPM,
-  OptimizationLevel Level);
+  OptimizationLevel Level,

[llvm-branch-commits] [llvm] [Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-29 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100954

>From 0e498ef8a9204d4766a5e3bf60e7363d80f9836b Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 19:24:31 -0400
Subject: [PATCH] [Attributor][AMDGPU] Improve the handling of indirect calls

---
 llvm/include/llvm/Transforms/IPO/Attributor.h  |  9 +
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp| 18 ++
 llvm/lib/Transforms/IPO/Attributor.cpp |  2 +-
 .../Transforms/IPO/AttributorAttributes.cpp|  3 ++-
 .../AMDGPU/amdgpu-attributor-no-agpr.ll| 16 +++-
 5 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h 
b/llvm/include/llvm/Transforms/IPO/Attributor.h
index 34557238ecb23..596ee39c35a37 100644
--- a/llvm/include/llvm/Transforms/IPO/Attributor.h
+++ b/llvm/include/llvm/Transforms/IPO/Attributor.h
@@ -1448,7 +1448,7 @@ struct AttributorConfig {
   /// Callback function to determine if an indirect call targets should be made
   /// direct call targets (with an if-cascade).
   std::function
+ Function &AssummedCallee, bool IsSingleton)>
   IndirectCalleeSpecializationCallback = nullptr;
 
   /// Helper to update an underlying call graph and to delete functions.
@@ -1718,10 +1718,11 @@ struct Attributor {
   /// Return true if we should specialize the call site \b CB for the potential
   /// callee \p Fn.
   bool shouldSpecializeCallSiteForCallee(const AbstractAttribute &AA,
- CallBase &CB, Function &Callee) {
+ CallBase &CB, Function &Callee,
+ bool IsSingleton) {
 return Configuration.IndirectCalleeSpecializationCallback
-   ? Configuration.IndirectCalleeSpecializationCallback(*this, AA,
-CB, Callee)
+   ? Configuration.IndirectCalleeSpecializationCallback(
+ *this, AA, CB, Callee, IsSingleton)
: true;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index ab98da31b050f..b8ab11a7b420b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -14,6 +14,7 @@
 #include "GCNSubtarget.h"
 #include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/Analysis/CycleAnalysis.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/IR/IntrinsicsR600.h"
@@ -1041,11 +1042,28 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
&AAPointerInfo::ID, &AAPotentialConstantValues::ID,
&AAUnderlyingObjects::ID, &AAIndirectCallInfo::ID});
 
+  /// Helper to decide if we should specialize the indirect \p CB for \p 
Callee.
+  /// \p IsSingleton indicates whether the \p Callee is the only assumed 
callee.
+  auto IndirectCalleeSpecializationCallback =
+  [&](Attributor &A, const AbstractAttribute &AA, CallBase &CB,
+  Function &Callee, bool IsSingleton) {
+if (AMDGPU::isEntryFunctionCC(Callee.getCallingConv()))
+  return false;
+// Singleton functions should be specialized.
+if (IsSingleton)
+  return true;
+// Otherwise specialize uniform values.
+const auto &TTI = TM.getTargetTransformInfo(*CB.getCaller());
+return TTI.isAlwaysUniform(CB.getCalledOperand());
+  };
+
   AttributorConfig AC(CGUpdater);
   AC.IsClosedWorldModule = HasWholeProgramVisibility;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
+  AC.IndirectCalleeSpecializationCallback =
+  IndirectCalleeSpecializationCallback;
   AC.IPOAmendableCB = [](const Function &F) {
 return F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
   };
diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp 
b/llvm/lib/Transforms/IPO/Attributor.cpp
index 910c0aeacc42e..879a26bcf328d 100644
--- a/llvm/lib/Transforms/IPO/Attributor.cpp
+++ b/llvm/lib/Transforms/IPO/Attributor.cpp
@@ -3836,7 +3836,7 @@ static bool runAttributorOnFunctions(InformationCache 
&InfoCache,
   if (MaxSpecializationPerCB.getNumOccurrences()) {
 AC.IndirectCalleeSpecializationCallback =
 [&](Attributor &, const AbstractAttribute &AA, CallBase &CB,
-Function &Callee) {
+Function &Callee, bool IsSingleton) {
   if (MaxSpecializationPerCB == 0)
 return false;
   auto &Set = IndirectCalleeTrackingMap[&CB];
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 2816a85743faa..3f02ea1cbd6cb 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12347,7 +12347,8 @@ struct AAIndirectCa

[llvm-branch-commits] [llvm] [WIP][Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-29 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/100954


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-30 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 9980c1fbe9da05695f30e15005119b000a19da3f Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [clang] [llvm] [WIP][Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-30 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian closed 
https://github.com/llvm/llvm-project/pull/100954


[llvm-branch-commits] [clang] [llvm] [WIP][Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-30 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

Moved most of the code to #100952, so this one is no longer needed. I will open 
a new PR if there is anything left to do after the two patches have landed.

https://github.com/llvm/llvm-project/pull/100954


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 6e26b390631fdc6ed844e04279db3857a4c15ab0 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 8df9dec35f80419fc4d6692d47e9df59d35fcf90 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 913f6a7cc866d133fe4c97e31fc03cfefb4f5eeb Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim num_teams (PR #101407)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/101407


[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim num_teams (PR #101407)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/101407


[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim `num_teams` (PR #101407)

2024-07-31 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/101407


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

ping

If it is preferred to split the AMDGPU-related changes into a separate PR, I can 
do that.

https://github.com/llvm/llvm-project/pull/100953


[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 842d2229369b47a98a531ca29b80195d97a152d0 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in 
https://github.com/llvm/llvm-project/pull/66488.
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim `num_teams` (PR #101407)

2024-08-01 Thread Shilei Tian via llvm-branch-commits


@@ -9576,6 +9576,20 @@ static void genMapInfo(const OMPExecutableDirective &D, 
CodeGenFunction &CGF,
 MappedVarSet, CombinedInfo);
   genMapInfo(MEHandler, CGF, CombinedInfo, OMPBuilder, MappedVarSet);
 }
+
+static void emitNumTeamsForBareTargetDirective(
+CodeGenFunction &CGF, const OMPExecutableDirective &D,
+llvm::SmallVectorImpl &NumTeams) {

shiltian wrote:

Nah, I need to call `push_back` or something similar in the function.
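
For reference, a rough sketch of the shape I have in mind, with one value pushed
per dimension (the `llvm::Value *` element type and the `OMPNumTeamsClause`
template argument are my assumptions here, since the archive stripped the
template arguments from the quoted diff):

```cpp
static void emitNumTeamsForBareTargetDirective(
    CodeGenFunction &CGF, const OMPExecutableDirective &D,
    llvm::SmallVectorImpl<llvm::Value *> &NumTeams) {
  const auto *C = D.getSingleClause<OMPNumTeamsClause>();
  assert(!C->varlist_empty() && "ompx_bare requires explicit num_teams");
  CodeGenFunction::RunCleanupsScope NumTeamsScope(CGF);
  for (auto *E : C->getNumTeams()) {
    llvm::Value *V = CGF.EmitScalarExpr(E);
    // One entry per dimension, hence the push_back into the output vector.
    NumTeams.push_back(
        CGF.Builder.CreateIntCast(V, CGF.Int32Ty, /*isSigned=*/true));
  }
}
```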

https://github.com/llvm/llvm-project/pull/101407
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC][AMDGPU] Reformat code for creating AA (PR #101591)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/101591

None

>From 806563cbb89fea64b9c289ad39a4520ce72f0ebc Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Thu, 1 Aug 2024 20:24:49 -0400
Subject: [PATCH] [NFC][AMDGPU] Reformat code for creating AA

---
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | 23 +++--
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 9d3c9e1e2ef9f..de1f3421cce4e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1051,17 +1051,18 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
   Attributor A(Functions, InfoCache, AC);
 
   for (Function &F : M) {
-if (!F.isIntrinsic()) {
-  A.getOrCreateAAFor(IRPosition::function(F));
-  A.getOrCreateAAFor(IRPosition::function(F));
-  A.getOrCreateAAFor(IRPosition::function(F));
-  CallingConv::ID CC = F.getCallingConv();
-  if (!AMDGPU::isEntryFunctionCC(CC)) {
-A.getOrCreateAAFor(IRPosition::function(F));
-A.getOrCreateAAFor(IRPosition::function(F));
-  } else if (CC == CallingConv::AMDGPU_KERNEL) {
-addPreloadKernArgHint(F, TM);
-  }
+if (F.isIntrinsic())
+  continue;
+
+A.getOrCreateAAFor(IRPosition::function(F));
+A.getOrCreateAAFor(IRPosition::function(F));
+A.getOrCreateAAFor(IRPosition::function(F));
+CallingConv::ID CC = F.getCallingConv();
+if (!AMDGPU::isEntryFunctionCC(CC)) {
+  A.getOrCreateAAFor(IRPosition::function(F));
+  A.getOrCreateAAFor(IRPosition::function(F));
+} else if (CC == CallingConv::AMDGPU_KERNEL) {
+  addPreloadKernArgHint(F, TM);
 }
   }
 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC][AMDGPU] Reformat code for creating AA (PR #101591)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/101591
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC][AMDGPU] Reformat code for creating AA (PR #101591)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#101591** 👈
* **#101589**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/101591
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor` (PR #101593)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/101593

None

>From 5dffd995b71395656b26977d019385a9d0a88533 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Thu, 1 Aug 2024 20:30:07 -0400
Subject: [PATCH] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor`

---
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 13 ++-
 .../AMDGPU/annotate-kernel-features-hsa.ll| 36 +++
 .../CodeGen/AMDGPU/simple-indirect-call.ll| 15 
 3 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index de1f3421cce4e..39c52140dfbd2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1038,7 +1038,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
&AAPotentialValues::ID, &AAAMDFlatWorkGroupSize::ID,
&AAAMDWavesPerEU::ID, &AAAMDGPUNoAGPR::ID, &AACallEdges::ID,
&AAPointerInfo::ID, &AAPotentialConstantValues::ID,
-   &AAUnderlyingObjects::ID});
+   &AAUnderlyingObjects::ID, &AAAddressSpace::ID});
 
   AttributorConfig AC(CGUpdater);
   AC.Allowed = &Allowed;
@@ -1064,6 +1064,17 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
 } else if (CC == CallingConv::AMDGPU_KERNEL) {
   addPreloadKernArgHint(F, TM);
 }
+
+for (auto &I : instructions(F)) {
+  if (auto *LI = dyn_cast(&I)) {
+A.getOrCreateAAFor(
+IRPosition::value(*LI->getPointerOperand()));
+  }
+  if (auto *SI = dyn_cast(&I)) {
+A.getOrCreateAAFor(
+IRPosition::value(*SI->getPointerOperand()));
+  }
+}
   }
 
   ChangeStatus Change = A.run();
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll 
b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
index 43cdf85ed3818..879bceaef97c0 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
@@ -425,8 +425,7 @@ define amdgpu_kernel void 
@use_group_to_flat_addrspacecast(ptr addrspace(3) %ptr
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_group_to_flat_addrspacecast
 ; ATTRIBUTOR_HSA-SAME: (ptr addrspace(3) [[PTR:%.*]]) #[[ATTR12:[0-9]+]] {
-; ATTRIBUTOR_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(3) 
[[PTR]] to ptr
-; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(3) [[PTR]], 
align 4
 ; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(3) %ptr to ptr
@@ -443,8 +442,7 @@ define amdgpu_kernel void 
@use_private_to_flat_addrspacecast(ptr addrspace(5) %p
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_private_to_flat_addrspacecast
 ; ATTRIBUTOR_HSA-SAME: (ptr addrspace(5) [[PTR:%.*]]) #[[ATTR12]] {
-; ATTRIBUTOR_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(5) 
[[PTR]] to ptr
-; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(5) [[PTR]], 
align 4
 ; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(5) %ptr to ptr
@@ -478,11 +476,16 @@ define amdgpu_kernel void 
@use_flat_to_private_addrspacecast(ptr %ptr) #1 {
 
 ; No-op addrspacecast should not use queue ptr
 define amdgpu_kernel void @use_global_to_flat_addrspacecast(ptr addrspace(1) 
%ptr) #1 {
-; HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
-; HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
-; HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
-; HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
-; HSA-NEXT:ret void
+; AKF_HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
+; AKF_HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
+; AKF_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
+; AKF_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; AKF_HSA-NEXT:ret void
+;
+; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
+; ATTRIBUTOR_HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(1) [[PTR]], 
align 4
+; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(1) %ptr to ptr
   store volatile i32 0, ptr %stof
@@ -490,11 +493,16 @@ define amdgpu_kernel void 
@use_global_to_flat_addrspacecast(ptr addrspace(1) %pt
 }
 
 define amdgpu_kernel void @use_constant_to_flat_addrspacecast(ptr addrspace(4) 
%ptr) #1 {
-; HSA-LABEL: define {{[^@]+}}@use_constant_to_flat_addrspacecast
-; HSA-SAME: (ptr addrspace(4) [[PTR:%.*]]) #[[ATTR1]] {
-; HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(4) [[PTR]] to ptr
-; HSA-NEXT:[[LD:%.*]] = load volatile i32, ptr [[STOF]], align 4
-; HSA-NEXT:ret void
+; AKF_HSA-LABEL: define {{[^@]+}}@use_constant_to_flat

[llvm-branch-commits] [llvm] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor` (PR #101593)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#101593** 👈
* **#101591**
* **#101589**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/101593
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor` (PR #101593)

2024-08-01 Thread Shilei Tian via llvm-branch-commits


@@ -1064,6 +1064,17 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
 } else if (CC == CallingConv::AMDGPU_KERNEL) {
   addPreloadKernArgHint(F, TM);
 }
+
+for (auto &I : instructions(F)) {
+  if (auto *LI = dyn_cast(&I)) {

shiltian wrote:

`AAAddressSpace` only handles load and store instructions for now. I will add
support for more instruction kinds later.
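
As a rough sketch of the direction (not part of this patch; atomicrmw/cmpxchg
below are only an example of the "more", and the final seeding may look
different):

```cpp
for (auto &I : instructions(F)) {
  Value *Ptr = nullptr;
  if (auto *LI = dyn_cast<LoadInst>(&I))
    Ptr = LI->getPointerOperand();
  else if (auto *SI = dyn_cast<StoreInst>(&I))
    Ptr = SI->getPointerOperand();
  else if (auto *RMW = dyn_cast<AtomicRMWInst>(&I))
    Ptr = RMW->getPointerOperand();
  else if (auto *CmpX = dyn_cast<AtomicCmpXchgInst>(&I))
    Ptr = CmpX->getPointerOperand();
  if (Ptr)
    A.getOrCreateAAFor<AAAddressSpace>(IRPosition::value(*Ptr));
}
```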

https://github.com/llvm/llvm-project/pull/101593
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor` (PR #101593)

2024-08-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/101593

>From 9b743f5bd577be07c858b1357da8d39264bc34db Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Thu, 1 Aug 2024 20:30:07 -0400
Subject: [PATCH] [WIP][AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor`

---
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 13 ++-
 .../AMDGPU/annotate-kernel-features-hsa.ll| 36 +++
 .../CodeGen/AMDGPU/simple-indirect-call.ll| 15 
 3 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index de1f3421cce4e..39c52140dfbd2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1038,7 +1038,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
&AAPotentialValues::ID, &AAAMDFlatWorkGroupSize::ID,
&AAAMDWavesPerEU::ID, &AAAMDGPUNoAGPR::ID, &AACallEdges::ID,
&AAPointerInfo::ID, &AAPotentialConstantValues::ID,
-   &AAUnderlyingObjects::ID});
+   &AAUnderlyingObjects::ID, &AAAddressSpace::ID});
 
   AttributorConfig AC(CGUpdater);
   AC.Allowed = &Allowed;
@@ -1064,6 +1064,17 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM) {
 } else if (CC == CallingConv::AMDGPU_KERNEL) {
   addPreloadKernArgHint(F, TM);
 }
+
+for (auto &I : instructions(F)) {
+  if (auto *LI = dyn_cast(&I)) {
+A.getOrCreateAAFor(
+IRPosition::value(*LI->getPointerOperand()));
+  }
+  if (auto *SI = dyn_cast(&I)) {
+A.getOrCreateAAFor(
+IRPosition::value(*SI->getPointerOperand()));
+  }
+}
   }
 
   ChangeStatus Change = A.run();
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll 
b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
index 43cdf85ed3818..879bceaef97c0 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
@@ -425,8 +425,7 @@ define amdgpu_kernel void 
@use_group_to_flat_addrspacecast(ptr addrspace(3) %ptr
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_group_to_flat_addrspacecast
 ; ATTRIBUTOR_HSA-SAME: (ptr addrspace(3) [[PTR:%.*]]) #[[ATTR12:[0-9]+]] {
-; ATTRIBUTOR_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(3) 
[[PTR]] to ptr
-; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(3) [[PTR]], 
align 4
 ; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(3) %ptr to ptr
@@ -443,8 +442,7 @@ define amdgpu_kernel void 
@use_private_to_flat_addrspacecast(ptr addrspace(5) %p
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_private_to_flat_addrspacecast
 ; ATTRIBUTOR_HSA-SAME: (ptr addrspace(5) [[PTR:%.*]]) #[[ATTR12]] {
-; ATTRIBUTOR_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(5) 
[[PTR]] to ptr
-; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(5) [[PTR]], 
align 4
 ; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(5) %ptr to ptr
@@ -478,11 +476,16 @@ define amdgpu_kernel void 
@use_flat_to_private_addrspacecast(ptr %ptr) #1 {
 
 ; No-op addrspacecast should not use queue ptr
 define amdgpu_kernel void @use_global_to_flat_addrspacecast(ptr addrspace(1) 
%ptr) #1 {
-; HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
-; HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
-; HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
-; HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
-; HSA-NEXT:ret void
+; AKF_HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
+; AKF_HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
+; AKF_HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
+; AKF_HSA-NEXT:store volatile i32 0, ptr [[STOF]], align 4
+; AKF_HSA-NEXT:ret void
+;
+; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@use_global_to_flat_addrspacecast
+; ATTRIBUTOR_HSA-SAME: (ptr addrspace(1) [[PTR:%.*]]) #[[ATTR1]] {
+; ATTRIBUTOR_HSA-NEXT:store volatile i32 0, ptr addrspace(1) [[PTR]], 
align 4
+; ATTRIBUTOR_HSA-NEXT:ret void
 ;
   %stof = addrspacecast ptr addrspace(1) %ptr to ptr
   store volatile i32 0, ptr %stof
@@ -490,11 +493,16 @@ define amdgpu_kernel void 
@use_global_to_flat_addrspacecast(ptr addrspace(1) %pt
 }
 
 define amdgpu_kernel void @use_constant_to_flat_addrspacecast(ptr addrspace(4) 
%ptr) #1 {
-; HSA-LABEL: define {{[^@]+}}@use_constant_to_flat_addrspacecast
-; HSA-SAME: (ptr addrspace(4) [[PTR:%.*]]) #[[ATTR1]] {
-; HSA-NEXT:[[STOF:%.*]] = addrspacecast ptr addrspace(4) [[PTR]] to ptr
-; HSA-NEXT:[[LD:%.*]] = load volatile i32, ptr [[STOF]], align 4
-; HSA-NEXT:ret void
+; AKF_HSA-LABEL: define {{[^@]+}}@use_constant_to_flat_addrs

[llvm-branch-commits] [clang] [llvm] [LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point (PR #100953)

2024-08-02 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/100953

>From 225eca0dc689b2764acc23442d28dee57cd388d1 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 28 Jul 2024 15:28:09 -0400
Subject: [PATCH] [LLVM][PassBuilder] Extend the function signature of callback
 for optimizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both at compile time and at link time. However, there is no indication
of which pipeline is currently being built.

In this patch, an extra argument is added to indicate the (Thin)LTO stage so
that the callback can check it if needed. No test is expected from this change;
its benefit will be demonstrated in
https://github.com/llvm/llvm-project/pull/66488.
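
For illustration only (this is a usage sketch, not code from this patch, and
`MyModulePass` is a placeholder), a callback registered at one of these
extension points can now branch on the phase:

```cpp
PB.registerOptimizerLastEPCallback(
    [](ModulePassManager &MPM, OptimizationLevel Level,
       ThinOrFullLTOPhase Phase) {
      // Only add the pass outside the (Thin)LTO post-link pipelines so it
      // does not run a second time at link time.
      if (Phase == ThinOrFullLTOPhase::ThinLTOPostLink ||
          Phase == ThinOrFullLTOPhase::FullLTOPostLink)
        return;
      MPM.addPass(MyModulePass());
    });
```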
---
 clang/lib/CodeGen/BackendUtil.cpp | 19 +-
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 36 +--
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 +++-
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 +++---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
 PB.registerOptimizerEarlyEPCallback(
 [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index e1d78a8685aed..ad3901902f784 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,9 @@ class PassBuilder {
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-  bool LTOPreLink = false);
+  ModulePassManager buildPerModuleDefaultPipeline(
+  OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   /// Build a fat object default optimization pipeline.
   ///
@@ -297,8 +298,9 @@ class PassBuilder {
   /// Build an O0 pipeline with the minimal semantica

[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-02 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/101760

None

>From 54f85728f224b262b9d85d567e77f64e0c625832 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 2 Aug 2024 18:05:44 -0400
Subject: [PATCH] [AMDGPU][Attributor] Add a pass parameter `closed-world` for
 AMDGPUAttributor pass

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   | 11 +--
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 ---
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 12 +++-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 29 +--
 .../CodeGen/AMDGPU/simple-indirect-call-2.ll  |  2 +-
 5 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 50aef36724f70..d8ed1d9db00e5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -283,17 +283,22 @@ class AMDGPULowerKernelArgumentsPass
   PreservedAnalyses run(Function &, FunctionAnalysisManager &);
 };
 
+struct AMDGPUAttributorOptions {
+  bool IsClosedWorld = false;
+};
+
 class AMDGPUAttributorPass : public PassInfoMixin {
 private:
   TargetMachine &TM;
 
+  AMDGPUAttributorOptions Options;
+
   /// Asserts whether we can assume whole program visibility.
   bool HasWholeProgramVisibility = false;
 
 public:
-  AMDGPUAttributorPass(TargetMachine &TM,
-   bool HasWholeProgramVisibility = false)
-  : TM(TM), HasWholeProgramVisibility(HasWholeProgramVisibility) {};
+  AMDGPUAttributorPass(TargetMachine &TM, AMDGPUAttributorOptions Options = {})
+  : TM(TM), Options(Options) {};
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 576494b1a564d..357ba0fadbcf1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1025,7 +1025,7 @@ static void addPreloadKernArgHint(Function &F, 
TargetMachine &TM) {
 }
 
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
-bool HasWholeProgramVisibility) {
+AMDGPUAttributorOptions Options) {
   SetVector Functions;
   for (Function &F : M) {
 if (!F.isIntrinsic())
@@ -1043,7 +1043,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
&AAUnderlyingObjects::ID, &AAIndirectCallInfo::ID, 
&AAInstanceInfo::ID});
 
   AttributorConfig AC(CGUpdater);
-  AC.IsClosedWorldModule = HasWholeProgramVisibility;
+  AC.IsClosedWorldModule = Options.IsClosedWorld;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
@@ -1102,7 +1102,7 @@ class AMDGPUAttributorLegacy : public ModulePass {
 
   bool runOnModule(Module &M) override {
 AnalysisGetter AG(this);
-return runImpl(M, AG, *TM, /*HasWholeProgramVisibility=*/false);
+return runImpl(M, AG, *TM, /*Options=*/{});
   }
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
@@ -1123,9 +1123,8 @@ PreservedAnalyses llvm::AMDGPUAttributorPass::run(Module 
&M,
   AnalysisGetter AG(FAM);
 
   // TODO: Probably preserves CFG
-  return runImpl(M, AG, TM, HasWholeProgramVisibility)
- ? PreservedAnalyses::none()
- : PreservedAnalyses::all();
+  return runImpl(M, AG, TM, Options) ? PreservedAnalyses::none()
+ : PreservedAnalyses::all();
 }
 
 char AMDGPUAttributorLegacy::ID = 0;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 57fc3314dd970..0adf11d27a2f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -17,7 +17,6 @@
 #define MODULE_PASS(NAME, CREATE_PASS)
 #endif
 MODULE_PASS("amdgpu-always-inline", AMDGPUAlwaysInlinePass())
-MODULE_PASS("amdgpu-attributor", AMDGPUAttributorPass(*this))
 MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
 AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
@@ -26,6 +25,17 @@ MODULE_PASS("amdgpu-printf-runtime-binding", 
AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
 #undef MODULE_PASS
 
+#ifndef MODULE_PASS_WITH_PARAMS
+#define MODULE_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
+#endif
+MODULE_PASS_WITH_PARAMS(
+"amdgpu-attributor", "AMDGPUAttributorPass",
+[=](AMDGPUAttributorOptions Options) {
+  return AMDGPUAttributorPass(*this, Options);
+},
+parseAMDGPUAttributorPassOptions, "closed-world")
+#undef MODULE_PASS_WITH_PARAMS
+
 #ifndef FUNCTION_PASS
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 50cc2d871d4ec..700408cd55e62 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/T

[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-02 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/101760
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-02 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#101760** 👈
* **#100953**: 1 other dependent PR
  ([#100954](https://github.com/llvm/llvm-project/pull/100954))
* **#100952**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/101760
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [WIP][Offload] Add runtime support for multi-dim `num_teams` (PR #101723)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

This will be closed for now. It will be easier to make runtime changes for 
thread block size and grid size in one PR.

https://github.com/llvm/llvm-project/pull/101723
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [WIP][Offload] Add runtime support for multi-dim `num_teams` (PR #101723)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian closed 
https://github.com/llvm/llvm-project/pull/101723
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/101760

>From f9e990a43908efc2e155c95f3cd4ddadefc4d6a1 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 2 Aug 2024 18:05:44 -0400
Subject: [PATCH] [AMDGPU][Attributor] Add a pass parameter `closed-world` for
 AMDGPUAttributor pass

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   | 11 +--
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 ---
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 12 +++-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 29 +--
 .../CodeGen/AMDGPU/simple-indirect-call-2.ll  |  2 +-
 .../Other/amdgpu-pass-pipeline-parsing.ll | 12 
 6 files changed, 63 insertions(+), 14 deletions(-)
 create mode 100644 llvm/test/Other/amdgpu-pass-pipeline-parsing.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 50aef36724f705..d8ed1d9db00e59 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -283,17 +283,22 @@ class AMDGPULowerKernelArgumentsPass
   PreservedAnalyses run(Function &, FunctionAnalysisManager &);
 };
 
+struct AMDGPUAttributorOptions {
+  bool IsClosedWorld = false;
+};
+
 class AMDGPUAttributorPass : public PassInfoMixin {
 private:
   TargetMachine &TM;
 
+  AMDGPUAttributorOptions Options;
+
   /// Asserts whether we can assume whole program visibility.
   bool HasWholeProgramVisibility = false;
 
 public:
-  AMDGPUAttributorPass(TargetMachine &TM,
-   bool HasWholeProgramVisibility = false)
-  : TM(TM), HasWholeProgramVisibility(HasWholeProgramVisibility) {};
+  AMDGPUAttributorPass(TargetMachine &TM, AMDGPUAttributorOptions Options = {})
+  : TM(TM), Options(Options) {};
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 9557005721cb15..d65e0ae92308e6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1025,7 +1025,7 @@ static void addPreloadKernArgHint(Function &F, 
TargetMachine &TM) {
 }
 
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
-bool HasWholeProgramVisibility) {
+AMDGPUAttributorOptions Options) {
   SetVector Functions;
   for (Function &F : M) {
 if (!F.isIntrinsic())
@@ -1044,7 +1044,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
&AAInstanceInfo::ID});
 
   AttributorConfig AC(CGUpdater);
-  AC.IsClosedWorldModule = HasWholeProgramVisibility;
+  AC.IsClosedWorldModule = Options.IsClosedWorld;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
@@ -1114,7 +1114,7 @@ class AMDGPUAttributorLegacy : public ModulePass {
 
   bool runOnModule(Module &M) override {
 AnalysisGetter AG(this);
-return runImpl(M, AG, *TM, /*HasWholeProgramVisibility=*/false);
+return runImpl(M, AG, *TM, /*Options=*/{});
   }
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
@@ -1135,9 +1135,8 @@ PreservedAnalyses llvm::AMDGPUAttributorPass::run(Module 
&M,
   AnalysisGetter AG(FAM);
 
   // TODO: Probably preserves CFG
-  return runImpl(M, AG, TM, HasWholeProgramVisibility)
- ? PreservedAnalyses::none()
- : PreservedAnalyses::all();
+  return runImpl(M, AG, TM, Options) ? PreservedAnalyses::none()
+ : PreservedAnalyses::all();
 }
 
 char AMDGPUAttributorLegacy::ID = 0;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 57fc3314dd9709..0adf11d27a2f54 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -17,7 +17,6 @@
 #define MODULE_PASS(NAME, CREATE_PASS)
 #endif
 MODULE_PASS("amdgpu-always-inline", AMDGPUAlwaysInlinePass())
-MODULE_PASS("amdgpu-attributor", AMDGPUAttributorPass(*this))
 MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
 AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
@@ -26,6 +25,17 @@ MODULE_PASS("amdgpu-printf-runtime-binding", 
AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
 #undef MODULE_PASS
 
+#ifndef MODULE_PASS_WITH_PARAMS
+#define MODULE_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
+#endif
+MODULE_PASS_WITH_PARAMS(
+"amdgpu-attributor", "AMDGPUAttributorPass",
+[=](AMDGPUAttributorOptions Options) {
+  return AMDGPUAttributorPass(*this, Options);
+},
+parseAMDGPUAttributorPassOptions, "closed-world")
+#undef MODULE_PASS_WITH_PARAMS
+
 #ifndef FUNCTION_PASS
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 50cc2d871d4ece..700408cd55e6

[llvm-branch-commits] [clang] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)

2024-08-08 Thread Shilei Tian via llvm-branch-commits


@@ -647,6 +647,14 @@ class LangOptions : public LangOptionsBase {
 return ConvergentFunctions;
   }
 
+  /// Return true if atomicrmw operations targeting allocations in private

shiltian wrote:

Do we want a check in the target machine that tells whether an atomic operation
on a specific address space is legal? I'm thinking of adding atomic support to
`AAAddressSpace`, and we could then drop an inferred address space if an atomic
operation on it would not be legal.
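
Roughly what I have in mind; the hook name below is hypothetical (no such
TargetMachine/TTI API exists today), and the `AAAddressSpace` snippet only shows
where such a query would be consulted:

```cpp
// Hypothetical target hook, made up for illustration:
//   virtual bool TargetMachine::isAtomicLegalInAddrSpace(unsigned AS) const;

// In AAAddressSpace, before committing to an inferred address space NewAS,
// give up if some atomic user would become illegal after the rewrite:
if (isa<AtomicRMWInst>(UserI) || isa<AtomicCmpXchgInst>(UserI))
  if (!TM.isAtomicLegalInAddrSpace(NewAS))
    return indicatePessimisticFixpoint();
```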

https://github.com/llvm/llvm-project/pull/102462
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-08 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

This patch will be rebased once 
https://github.com/llvm/llvm-project/pull/102086 is landed.

https://github.com/llvm/llvm-project/pull/101760
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)

2024-08-09 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/102645
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)

2024-08-09 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/102645
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (PR #102717)

2024-08-09 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/102717

None

>From 3ec01daaa2d43350b2c835d4173ede441ca004a1 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 9 Aug 2024 23:25:21 -0400
Subject: [PATCH] [Clang][OMPX] Add the code generation for multi-dim
 `thread_limit` clause

---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 29 ++---
 clang/test/OpenMP/target_teams_codegen.cpp| 12 +++
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   | 26 
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 31 +++
 4 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 8c5e4aa9c037e2..6c0c8646898cc6 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -9588,15 +9588,17 @@ static void genMapInfo(const OMPExecutableDirective &D, 
CodeGenFunction &CGF,
   genMapInfo(MEHandler, CGF, CombinedInfo, OMPBuilder, MappedVarSet);
 }
 
-static void emitNumTeamsForBareTargetDirective(
+template 
+static void emitClauseForBareTargetDirective(
 CodeGenFunction &CGF, const OMPExecutableDirective &D,
-llvm::SmallVectorImpl &NumTeams) {
-  const auto *C = D.getSingleClause();
-  assert(!C->varlist_empty() && "ompx_bare requires explicit num_teams");
-  CodeGenFunction::RunCleanupsScope NumTeamsScope(CGF);
-  for (auto *E : C->getNumTeams()) {
+llvm::SmallVectorImpl &Valuess) {
+  const auto *C = D.getSingleClause();
+  assert(!C->varlist_empty() &&
+ "ompx_bare requires explicit num_teams and thread_limit");
+  CodeGenFunction::RunCleanupsScope Scope(CGF);
+  for (auto *E : C->varlist()) {
 llvm::Value *V = CGF.EmitScalarExpr(E);
-NumTeams.push_back(
+Valuess.push_back(
 CGF.Builder.CreateIntCast(V, CGF.Int32Ty, /*isSigned=*/true));
   }
 }
@@ -9672,14 +9674,17 @@ static void emitTargetCallKernelLaunch(
 
 bool IsBare = D.hasClausesOfKind();
 SmallVector NumTeams;
-if (IsBare)
-  emitNumTeamsForBareTargetDirective(CGF, D, NumTeams);
-else
+SmallVector NumThreads;
+if (IsBare) {
+  emitClauseForBareTargetDirective(CGF, D, NumTeams);
+  emitClauseForBareTargetDirective(CGF, D,
+ NumThreads);
+} else {
   NumTeams.push_back(OMPRuntime->emitNumTeamsForTargetDirective(CGF, D));
+  NumThreads.push_back(OMPRuntime->emitNumThreadsForTargetDirective(CGF, 
D));
+}
 
 llvm::Value *DeviceID = emitDeviceID(Device, CGF);
-llvm::Value *NumThreads =
-OMPRuntime->emitNumThreadsForTargetDirective(CGF, D);
 llvm::Value *RTLoc = OMPRuntime->emitUpdateLocation(CGF, D.getBeginLoc());
 llvm::Value *NumIterations =
 OMPRuntime->emitTargetNumIterationsCall(CGF, D, SizeEmitter);
diff --git a/clang/test/OpenMP/target_teams_codegen.cpp 
b/clang/test/OpenMP/target_teams_codegen.cpp
index 9cab8eef148833..13d44e127201bd 100644
--- a/clang/test/OpenMP/target_teams_codegen.cpp
+++ b/clang/test/OpenMP/target_teams_codegen.cpp
@@ -127,13 +127,13 @@ int foo(int n) {
 aa += 1;
   }
 
-  #pragma omp target teams ompx_bare num_teams(1, 2) thread_limit(1)
+  #pragma omp target teams ompx_bare num_teams(1, 2) thread_limit(1, 2)
   {
 a += 1;
 aa += 1;
   }
 
-  #pragma omp target teams ompx_bare num_teams(1, 2, 3) thread_limit(1)
+  #pragma omp target teams ompx_bare num_teams(1, 2, 3) thread_limit(1, 2, 3)
   {
 a += 1;
 aa += 1;
@@ -667,7 +667,7 @@ int bar(int n){
 // CHECK1-NEXT:[[TMP144:%.*]] = getelementptr inbounds nuw 
[[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS29]], i32 0, i32 10
 // CHECK1-NEXT:store [3 x i32] [i32 1, i32 2, i32 0], ptr [[TMP144]], 
align 4
 // CHECK1-NEXT:[[TMP145:%.*]] = getelementptr inbounds nuw 
[[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS29]], i32 0, i32 11
-// CHECK1-NEXT:store [3 x i32] [i32 1, i32 0, i32 0], ptr [[TMP145]], 
align 4
+// CHECK1-NEXT:store [3 x i32] [i32 1, i32 2, i32 0], ptr [[TMP145]], 
align 4
 // CHECK1-NEXT:[[TMP146:%.*]] = getelementptr inbounds nuw 
[[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS29]], i32 0, i32 12
 // CHECK1-NEXT:store i32 0, ptr [[TMP146]], align 4
 // CHECK1-NEXT:[[TMP147:%.*]] = call i32 @__tgt_target_kernel(ptr 
@[[GLOB1]], i64 -1, i32 1, i32 1, ptr 
@.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z3fooi_l130.region_id, ptr 
[[KERNEL_ARGS29]])
@@ -720,7 +720,7 @@ int bar(int n){
 // CHECK1-NEXT:[[TMP171:%.*]] = getelementptr inbounds nuw 
[[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS37]], i32 0, i32 10
 // CHECK1-NEXT:store [3 x i32] [i32 1, i32 2, i32 3], ptr [[TMP171]], 
align 4
 // CHECK1-NEXT:[[TMP172:%.*]] = getelementptr inbounds nuw 
[[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS37]], i32 0, i32 11
-// CHECK1-NEXT:store [3 x i32] [i32 1, i32 0, i32 0], ptr [[TMP172]], 
align 4
+// CHECK1-NEXT:   

[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (PR #102717)

2024-08-09 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/102717
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (PR #102717)

2024-08-09 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#102717** 👈
* **#102715**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/102717
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Preserve alignment when custom expanding atomicrmw (PR #103768)

2024-08-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/103768
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][AIX] Set worker stack size to 2 x KMP_DEFAULT_STKSIZE if system stack size is too big (#81996) (PR #82146)

2024-02-17 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/82146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][AIX]Add assembly file containing microtasking routines and unnamed common block definitions (#81770) (PR #82391)

2024-02-20 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/82391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Implement __kmp_is_address_mapped on DragonFlyBSD. (#82895) (PR #82940)

2024-02-26 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

Generally we don't backport feature-implementation patches, but we (actually,
it was me) already made an exception by mistake in
[ebc589e](https://github.com/llvm/llvm-project/commit/ebc589e44ffe7b77cc500f3d2dc1a7ba11dd82b1).
How many patches like this do you expect to be backported?

https://github.com/llvm/llvm-project/pull/82940
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] [openmp] __kmp_x86_cpuid fix for i386/PIC builds. (#84626) (PR #85053)

2024-03-13 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/85053
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18x: [OpenMP][AIX] Affinity implementation for AIX (#84984) (PR #86695)

2024-03-26 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

I think it's fine. WDYT? @tstellar 

https://github.com/llvm/llvm-project/pull/86695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18x: [OpenMP][AIX] Affinity implementation for AIX (#84984) (PR #86695)

2024-03-29 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

Since this feature is needed for the vendor release, we can make an exception
for this patch.

https://github.com/llvm/llvm-project/pull/86695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-21 Thread Shilei Tian via llvm-branch-commits


@@ -126,6 +126,7 @@ macro(libomp_test_touch_recipe test_touch_dir)
 endmacro()
 libomp_append(libomp_test_touch_env "KMP_VERSION=1")
 add_custom_target(libomp-test-touch DEPENDS ${libomp_test_touch_targets})
+set_target_properties(libomp-test-touch PROPERTIES FOLDER "OpenMP/Tests")

shiltian wrote:

I suppose these are for Windows (and potentially Xcode on macOS), since those
"folders" don't correspond to anything on Linux. I'd recommend `libomp/tests`,
`libomp/docs`, etc. We don't call our library the "OpenMP" library.

https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-22 Thread Shilei Tian via llvm-branch-commits


@@ -292,6 +294,7 @@ if(WIN32)
   set(LIBOMP_IMP_LIB_TARGET omp)
   set(LIBOMP_GENERATED_DEF_FILE ${LIBOMP_LIB_NAME}.def)
   add_custom_target(libomp-needed-def-file DEPENDS 
${LIBOMP_GENERATED_DEF_FILE})
+  set_target_properties(libomp-needed-def-file PROPERTIES FOLDER 
"OpenMP/Codegenning")

shiltian wrote:

`Codegenning` is really weird. I don't see this word anywhere else in the LLVM
project. Probably just `OpenMP/CodeGen`.

https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-22 Thread Shilei Tian via llvm-branch-commits


@@ -126,6 +126,7 @@ macro(libomp_test_touch_recipe test_touch_dir)
 endmacro()
 libomp_append(libomp_test_touch_env "KMP_VERSION=1")
 add_custom_target(libomp-test-touch DEPENDS ${libomp_test_touch_targets})
+set_target_properties(libomp-test-touch PROPERTIES FOLDER "OpenMP/Tests")

shiltian wrote:

My point is, when it comes to directories (except the top-level folder), we
don't call them "OpenMP". I don't have a strong objection; I just find `libomp`
more conventional. I'll leave it up to you.

https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-22 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [openmp] [openmp] Revise IDE folder structure (PR #89750)

2024-05-22 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

LG with nit

https://github.com/llvm/llvm-project/pull/89750
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)

2024-09-02 Thread Shilei Tian via llvm-branch-commits


@@ -0,0 +1,17 @@
+; RUN: llc -verify-machineinstrs < %s -relocation-model=pic
+
+target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Passing a pointer to thread-local storage to a function can be problematic
+; since computing such addresses requires a function call that is introduced
+; very late in instruction selection. We need to ensure that we don't introduce
+; nested call sequence markers if this function call happens in a call 
sequence.
+
+@TLS = internal thread_local global i64 zeroinitializer, align 8
+declare void @bar(ptr)
+define internal void @foo() {
+call void @bar(ptr @TLS)
+call void @bar(ptr @TLS)
+ret void
+}

shiltian wrote:

Add an empty line at the end of the file.

https://github.com/llvm/llvm-project/pull/106965
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/108258
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/108258

>From 5f9e01b93c02d5951d399258c12381de6f1c8626 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Wed, 11 Sep 2024 12:23:32 -0400
Subject: [PATCH] [Attributor] Take the address space from addrspacecast
 directly

If the value to be analyzed comes directly from an addrspacecast, we take the
source address space directly. This improves the case where, in
`AMDGPUPromoteKernelArgumentsPass`, a kernel argument is promoted by inserting
an addrspacecast directly from a generic pointer. During the analysis, however,
the underlying object would be the generic pointer rather than the
addrspacecast, so the inferred address space would be the generic one, which is
not ideal.
---
 .../Transforms/IPO/AttributorAttributes.cpp   | 54 ++-
 llvm/test/CodeGen/AMDGPU/aa-as-infer.ll   | 35 
 2 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 9c775e48f28195..749c5ea0bfcf6c 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12589,15 +12589,37 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   ChangeStatus updateImpl(Attributor &A) override {
 uint32_t OldAddressSpace = AssumedAddressSpace;
-auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
-DepClassTy::REQUIRED);
-auto Pred = [&](Value &Obj) {
+
+auto CheckAddressSpace = [&](Value &Obj) {
   if (isa(&Obj))
 return true;
+  // If an argument in flat address space has addrspace cast uses, and 
those
+  // casts are same, then we take the dst addrspace.
+  if (auto *Arg = dyn_cast(&Obj)) {
+unsigned FlatAS =
+A.getInfoCache().getFlatAddressSpace(Arg->getParent());
+if (FlatAS != InvalidAddressSpace &&
+Arg->getType()->getPointerAddressSpace() == FlatAS) {
+  unsigned CastAddrSpace = FlatAS;
+  for (auto *U : Arg->users()) {
+auto *ASCI = dyn_cast(U);
+if (!ASCI)
+  continue;
+if (CastAddrSpace != FlatAS &&
+CastAddrSpace != ASCI->getDestAddressSpace())
+  return false;
+CastAddrSpace = ASCI->getDestAddressSpace();
+  }
+  if (CastAddrSpace != FlatAS)
+return takeAddressSpace(CastAddrSpace);
+}
+  }
   return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
 };
 
-if (!AUO->forallUnderlyingObjects(Pred))
+auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
+DepClassTy::REQUIRED);
+if (!AUO->forallUnderlyingObjects(CheckAddressSpace))
   return indicatePessimisticFixpoint();
 
 return OldAddressSpace == AssumedAddressSpace ? ChangeStatus::UNCHANGED
@@ -12606,17 +12628,18 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   /// See AbstractAttribute::manifest(...).
   ChangeStatus manifest(Attributor &A) override {
-if (getAddressSpace() == InvalidAddressSpace ||
-getAddressSpace() == getAssociatedType()->getPointerAddressSpace())
+unsigned NewAS = getAddressSpace();
+
+if (NewAS == InvalidAddressSpace ||
+NewAS == getAssociatedType()->getPointerAddressSpace())
   return ChangeStatus::UNCHANGED;
 
 Value *AssociatedValue = &getAssociatedValue();
 Value *OriginalValue = peelAddrspacecast(AssociatedValue);
-
 PointerType *NewPtrTy =
-PointerType::get(getAssociatedType()->getContext(), getAddressSpace());
+PointerType::get(getAssociatedType()->getContext(), NewAS);
 bool UseOriginalValue =
-OriginalValue->getType()->getPointerAddressSpace() == 
getAddressSpace();
+OriginalValue->getType()->getPointerAddressSpace() == NewAS;
 
 bool Changed = false;
 
@@ -12677,11 +12700,16 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
   }
 
   static Value *peelAddrspacecast(Value *V) {
-if (auto *I = dyn_cast(V))
-  return peelAddrspacecast(I->getPointerOperand());
+if (auto *I = dyn_cast(V)) {
+  assert(I->getSrcAddressSpace() && "there should not be AS 0 -> AS X");
+  return I->getPointerOperand();
+}
 if (auto *C = dyn_cast(V))
-  if (C->getOpcode() == Instruction::AddrSpaceCast)
-return peelAddrspacecast(C->getOperand(0));
+  if (C->getOpcode() == Instruction::AddrSpaceCast) {
+assert(C->getOperand(0)->getType()->getPointerAddressSpace() &&
+   "there should not be AS 0 -> AS X");
+return C->getOperand(0);
+  }
 return V;
   }
 };
diff --git a/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll 
b/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll
index fdc5debb18915c..d1a6414fe49ae1 100644
--- a/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll
+++ b/llvm/test/CodeGen/AMD

[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/108258

>From d16be3fc2a2d1d572c25a76ee297dd0f4f8e37ed Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Wed, 11 Sep 2024 12:23:32 -0400
Subject: [PATCH] [Attributor] Take the address space from addrspacecast
 directly

If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
inserting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.
---
 .../Transforms/IPO/AttributorAttributes.cpp   | 54 ++-
 llvm/test/CodeGen/AMDGPU/aa-as-infer.ll   | 35 
 2 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 9c775e48f28195..749c5ea0bfcf6c 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12589,15 +12589,37 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   ChangeStatus updateImpl(Attributor &A) override {
 uint32_t OldAddressSpace = AssumedAddressSpace;
-auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
-DepClassTy::REQUIRED);
-auto Pred = [&](Value &Obj) {
+
+auto CheckAddressSpace = [&](Value &Obj) {
   if (isa(&Obj))
 return true;
+  // If an argument in flat address space has addrspace cast uses, and those
+  // casts are the same, then we take the dst addrspace.
+  if (auto *Arg = dyn_cast(&Obj)) {
+unsigned FlatAS =
+A.getInfoCache().getFlatAddressSpace(Arg->getParent());
+if (FlatAS != InvalidAddressSpace &&
+Arg->getType()->getPointerAddressSpace() == FlatAS) {
+  unsigned CastAddrSpace = FlatAS;
+  for (auto *U : Arg->users()) {
+auto *ASCI = dyn_cast(U);
+if (!ASCI)
+  continue;
+if (CastAddrSpace != FlatAS &&
+CastAddrSpace != ASCI->getDestAddressSpace())
+  return false;
+CastAddrSpace = ASCI->getDestAddressSpace();
+  }
+  if (CastAddrSpace != FlatAS)
+return takeAddressSpace(CastAddrSpace);
+}
+  }
   return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
 };
 
-if (!AUO->forallUnderlyingObjects(Pred))
+auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
+DepClassTy::REQUIRED);
+if (!AUO->forallUnderlyingObjects(CheckAddressSpace))
   return indicatePessimisticFixpoint();
 
 return OldAddressSpace == AssumedAddressSpace ? ChangeStatus::UNCHANGED
@@ -12606,17 +12628,18 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   /// See AbstractAttribute::manifest(...).
   ChangeStatus manifest(Attributor &A) override {
-if (getAddressSpace() == InvalidAddressSpace ||
-getAddressSpace() == getAssociatedType()->getPointerAddressSpace())
+unsigned NewAS = getAddressSpace();
+
+if (NewAS == InvalidAddressSpace ||
+NewAS == getAssociatedType()->getPointerAddressSpace())
   return ChangeStatus::UNCHANGED;
 
 Value *AssociatedValue = &getAssociatedValue();
 Value *OriginalValue = peelAddrspacecast(AssociatedValue);
-
 PointerType *NewPtrTy =
-PointerType::get(getAssociatedType()->getContext(), getAddressSpace());
+PointerType::get(getAssociatedType()->getContext(), NewAS);
 bool UseOriginalValue =
-OriginalValue->getType()->getPointerAddressSpace() == 
getAddressSpace();
+OriginalValue->getType()->getPointerAddressSpace() == NewAS;
 
 bool Changed = false;
 
@@ -12677,11 +12700,16 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
   }
 
   static Value *peelAddrspacecast(Value *V) {
-if (auto *I = dyn_cast(V))
-  return peelAddrspacecast(I->getPointerOperand());
+if (auto *I = dyn_cast(V)) {
+  assert(I->getSrcAddressSpace() && "there should not be AS 0 -> AS X");
+  return I->getPointerOperand();
+}
 if (auto *C = dyn_cast(V))
-  if (C->getOpcode() == Instruction::AddrSpaceCast)
-return peelAddrspacecast(C->getOperand(0));
+  if (C->getOpcode() == Instruction::AddrSpaceCast) {
+assert(C->getOperand(0)->getType()->getPointerAddressSpace() &&
+   "there should not be AS 0 -> AS X");
+return C->getOperand(0);
+  }
 return V;
   }
 };
diff --git a/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll 
b/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll
index fdc5debb18915c..d1a6414fe49ae1 100644
--- a/llvm/test/CodeGen/AMDGPU/aa-as-infer.ll
+++ b/llvm/test/CodeGen/AMD

[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/108258

>From 9beeba09cd35aa78d6ebb90bb98bde0b4113554e Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Wed, 11 Sep 2024 12:23:32 -0400
Subject: [PATCH] [Attributor] Take the address space from addrspacecast
 directly

If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
inserting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.
---
 .../Transforms/IPO/AttributorAttributes.cpp   | 63 ++-
 llvm/test/CodeGen/AMDGPU/aa-as-infer.ll   | 35 +++
 2 files changed, 84 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 9c775e48f28195..aa5cedd13413d6 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12589,15 +12589,37 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   ChangeStatus updateImpl(Attributor &A) override {
 uint32_t OldAddressSpace = AssumedAddressSpace;
-auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
-DepClassTy::REQUIRED);
-auto Pred = [&](Value &Obj) {
+
+auto CheckAddressSpace = [&](Value &Obj) {
   if (isa(&Obj))
 return true;
+  // If an argument in flat address space has addrspace cast uses, and those
+  // casts are the same, then we take the dst addrspace.
+  if (auto *Arg = dyn_cast(&Obj)) {
+unsigned FlatAS =
+A.getInfoCache().getFlatAddressSpace(Arg->getParent());
+if (FlatAS != InvalidAddressSpace &&
+Arg->getType()->getPointerAddressSpace() == FlatAS) {
+  unsigned CastAddrSpace = FlatAS;
+  for (auto *U : Arg->users()) {
+auto *ASCI = dyn_cast(U);
+if (!ASCI)
+  continue;
+if (CastAddrSpace != FlatAS &&
+CastAddrSpace != ASCI->getDestAddressSpace())
+  return false;
+CastAddrSpace = ASCI->getDestAddressSpace();
+  }
+  if (CastAddrSpace != FlatAS)
+return takeAddressSpace(CastAddrSpace);
+}
+  }
   return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
 };
 
-if (!AUO->forallUnderlyingObjects(Pred))
+auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
+DepClassTy::REQUIRED);
+if (!AUO->forallUnderlyingObjects(CheckAddressSpace))
   return indicatePessimisticFixpoint();
 
 return OldAddressSpace == AssumedAddressSpace ? ChangeStatus::UNCHANGED
@@ -12606,17 +12628,23 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   /// See AbstractAttribute::manifest(...).
   ChangeStatus manifest(Attributor &A) override {
-if (getAddressSpace() == InvalidAddressSpace ||
-getAddressSpace() == getAssociatedType()->getPointerAddressSpace())
+unsigned NewAS = getAddressSpace();
+
+if (NewAS == InvalidAddressSpace ||
+NewAS == getAssociatedType()->getPointerAddressSpace())
   return ChangeStatus::UNCHANGED;
 
+unsigned FlatAS =
+A.getInfoCache().getFlatAddressSpace(getAssociatedFunction());
+assert(FlatAS != InvalidAddressSpace);
+
 Value *AssociatedValue = &getAssociatedValue();
-Value *OriginalValue = peelAddrspacecast(AssociatedValue);
+Value *OriginalValue = peelAddrspacecast(AssociatedValue, FlatAS);
 
 PointerType *NewPtrTy =
-PointerType::get(getAssociatedType()->getContext(), getAddressSpace());
+PointerType::get(getAssociatedType()->getContext(), NewAS);
 bool UseOriginalValue =
-OriginalValue->getType()->getPointerAddressSpace() == 
getAddressSpace();
+OriginalValue->getType()->getPointerAddressSpace() == NewAS;
 
 bool Changed = false;
 
@@ -12676,12 +12704,19 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 return AssumedAddressSpace == AS;
   }
 
-  static Value *peelAddrspacecast(Value *V) {
-if (auto *I = dyn_cast(V))
-  return peelAddrspacecast(I->getPointerOperand());
+  static Value *peelAddrspacecast(Value *V, unsigned FlatAS) {
+if (auto *I = dyn_cast(V)) {
+  assert(I->getSrcAddressSpace() != FlatAS &&
+ "there should not be flat AS -> non-flat AS");
+  return I->getPointerOperand();
+}
 if (auto *C = dyn_cast(V))
-  if (C->getOpcode() == Instruction::AddrSpaceCast)
-return peelAddrspacecast(C->getOperand(0));
+  if (C->getOpcode() == Instruction::AddrSpaceCast) {
+assert(C->getOperand(0)->getType()->getPointerAd

[llvm-branch-commits] [llvm] [TargetTransformInfo] Remove `getFlatAddressSpace` (PR #108787)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/108787

This has been moved to `DataLayout`.

>From 8db0d52053b0cde8be13ddb6c669b6b262eefdf8 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Sun, 15 Sep 2024 23:06:14 -0400
Subject: [PATCH] [TargetTransformInfo] Remove `getFlatAddressSpace`

This has been moved to `DataLayout`.
---
 .../llvm/Analysis/TargetTransformInfo.h   | 21 ---
 .../llvm/Analysis/TargetTransformInfoImpl.h   |  2 --
 llvm/include/llvm/CodeGen/BasicTTIImpl.h  |  5 -
 llvm/lib/Analysis/TargetTransformInfo.cpp |  4 
 .../Target/AMDGPU/AMDGPUTargetTransformInfo.h |  8 ---
 .../Target/NVPTX/NVPTXTargetTransformInfo.h   |  4 
 .../Transforms/Scalar/InferAddressSpaces.cpp  |  2 +-
 7 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h 
b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index b2124c6106198e..e5986225b6fc32 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -451,24 +451,6 @@ class TargetTransformInfo {
   /// Return false if a \p AS0 address cannot possibly alias a \p AS1 address.
   bool addrspacesMayAlias(unsigned AS0, unsigned AS1) const;
 
-  /// Returns the address space ID for a target's 'flat' address space. Note
-  /// this is not necessarily the same as addrspace(0), which LLVM sometimes
-  /// refers to as the generic address space. The flat address space is a
-  /// generic address space that can be used access multiple segments of memory
-  /// with different address spaces. Access of a memory location through a
-  /// pointer with this address space is expected to be legal but slower
-  /// compared to the same memory location accessed through a pointer with a
-  /// different address space.
-  //
-  /// This is for targets with different pointer representations which can
-  /// be converted with the addrspacecast instruction. If a pointer is 
converted
-  /// to this address space, optimizations should attempt to replace the access
-  /// with the source address space.
-  ///
-  /// \returns ~0u if the target does not have such a flat address space to
-  /// optimize away.
-  unsigned getFlatAddressSpace() const;
-
   /// Return any intrinsic address operand indexes which may be rewritten if
   /// they use a flat address space pointer.
   ///
@@ -1836,7 +1818,6 @@ class TargetTransformInfo::Concept {
   virtual bool isAlwaysUniform(const Value *V) = 0;
   virtual bool isValidAddrSpaceCast(unsigned FromAS, unsigned ToAS) const = 0;
   virtual bool addrspacesMayAlias(unsigned AS0, unsigned AS1) const = 0;
-  virtual unsigned getFlatAddressSpace() = 0;
   virtual bool collectFlatAddressOperands(SmallVectorImpl &OpIndexes,
   Intrinsic::ID IID) const = 0;
   virtual bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const = 0;
@@ -2263,8 +2244,6 @@ class TargetTransformInfo::Model final : public 
TargetTransformInfo::Concept {
 return Impl.addrspacesMayAlias(AS0, AS1);
   }
 
-  unsigned getFlatAddressSpace() override { return Impl.getFlatAddressSpace(); 
}
-
   bool collectFlatAddressOperands(SmallVectorImpl &OpIndexes,
   Intrinsic::ID IID) const override {
 return Impl.collectFlatAddressOperands(OpIndexes, IID);
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h 
b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 90eef93a2a54d5..192a1c15347dc7 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -115,8 +115,6 @@ class TargetTransformInfoImplBase {
 return true;
   }
 
-  unsigned getFlatAddressSpace() const { return -1; }
-
   bool collectFlatAddressOperands(SmallVectorImpl &OpIndexes,
   Intrinsic::ID IID) const {
 return false;
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h 
b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 50dc7d5c54c54a..05b0e5844ac5d5 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -292,11 +292,6 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
 return true;
   }
 
-  unsigned getFlatAddressSpace() {
-// Return an invalid address space.
-return -1;
-  }
-
   bool collectFlatAddressOperands(SmallVectorImpl &OpIndexes,
   Intrinsic::ID IID) const {
 return false;
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp 
b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 2c26493bd3f1ca..5eb6be7a362cb5 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -305,10 +305,6 @@ bool 
llvm::TargetTransformInfo::addrspacesMayAlias(unsigned FromAS,
   return TTIImpl->addrspacesMayAlias(FromAS, ToAS);
 }
 
-unsigned TargetTransformInfo::getFlatAddressSpace() 

[llvm-branch-commits] [llvm] [TargetTransformInfo] Remove `getFlatAddressSpace` (PR #108787)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian ready_for_review 
https://github.com/llvm/llvm-project/pull/108787
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [TargetTransformInfo] Remove `getFlatAddressSpace` (PR #108787)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#108787** 👈
* **#108786**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/108787
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Use more appropriate approach to check flat address space (PR #108713)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/108713
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/108258

>From 20e76a5af6fa001617b5c1ebbad2b3965df922f5 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Wed, 11 Sep 2024 12:23:32 -0400
Subject: [PATCH] [Attributor] Take the address space from addrspacecast
 directly

If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
insertting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.
---
 .../Transforms/IPO/AttributorAttributes.cpp   | 62 ++-
 llvm/test/CodeGen/AMDGPU/aa-as-infer.ll   | 35 +++
 2 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 72ac08ec2b6e1c..8860d5a3295a47 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12587,16 +12587,38 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
+unsigned FlatAS = A.getInfoCache().getDL().getFlatAddressSpace();
+assert(FlatAS != InvalidAddressSpace);
 uint32_t OldAddressSpace = AssumedAddressSpace;
-auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
-DepClassTy::REQUIRED);
-auto Pred = [&](Value &Obj) {
+
+auto CheckAddressSpace = [&](Value &Obj) {
   if (isa(&Obj))
 return true;
+  // If an argument in flat address space has addrspace cast uses, and those
+  // casts are the same, then we take the dst addrspace.
+  if (auto *Arg = dyn_cast(&Obj)) {
+if (FlatAS != InvalidAddressSpace &&
+Arg->getType()->getPointerAddressSpace() == FlatAS) {
+  unsigned CastAddrSpace = FlatAS;
+  for (auto *U : Arg->users()) {
+auto *ASCI = dyn_cast(U);
+if (!ASCI)
+  continue;
+if (CastAddrSpace != FlatAS &&
+CastAddrSpace != ASCI->getDestAddressSpace())
+  return false;
+CastAddrSpace = ASCI->getDestAddressSpace();
+  }
+  if (CastAddrSpace != FlatAS)
+return takeAddressSpace(CastAddrSpace);
+}
+  }
   return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
 };
 
-if (!AUO->forallUnderlyingObjects(Pred))
+auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
+DepClassTy::REQUIRED);
+if (!AUO->forallUnderlyingObjects(CheckAddressSpace))
   return indicatePessimisticFixpoint();
 
 return OldAddressSpace == AssumedAddressSpace ? ChangeStatus::UNCHANGED
@@ -12605,17 +12627,22 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   /// See AbstractAttribute::manifest(...).
   ChangeStatus manifest(Attributor &A) override {
-if (getAddressSpace() == InvalidAddressSpace ||
-getAddressSpace() == getAssociatedType()->getPointerAddressSpace())
+unsigned NewAS = getAddressSpace();
+
+if (NewAS == InvalidAddressSpace ||
+NewAS == getAssociatedType()->getPointerAddressSpace())
   return ChangeStatus::UNCHANGED;
 
+unsigned FlatAS = A.getInfoCache().getDL().getFlatAddressSpace();
+assert(FlatAS != InvalidAddressSpace);
+
 Value *AssociatedValue = &getAssociatedValue();
-Value *OriginalValue = peelAddrspacecast(AssociatedValue);
+Value *OriginalValue = peelAddrspacecast(AssociatedValue, FlatAS);
 
 PointerType *NewPtrTy =
-PointerType::get(getAssociatedType()->getContext(), getAddressSpace());
+PointerType::get(getAssociatedType()->getContext(), NewAS);
 bool UseOriginalValue =
-OriginalValue->getType()->getPointerAddressSpace() == 
getAddressSpace();
+OriginalValue->getType()->getPointerAddressSpace() == NewAS;
 
 bool Changed = false;
 
@@ -12675,12 +12702,19 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 return AssumedAddressSpace == AS;
   }
 
-  static Value *peelAddrspacecast(Value *V) {
-if (auto *I = dyn_cast(V))
-  return peelAddrspacecast(I->getPointerOperand());
+  static Value *peelAddrspacecast(Value *V, unsigned FlatAS) {
+if (auto *I = dyn_cast(V)) {
+  assert(I->getSrcAddressSpace() != FlatAS &&
+ "there should not be flat AS -> non-flat AS");
+  return I->getPointerOperand();
+}
 if (auto *C = dyn_cast(V))
-  if (C->getOpcode() == Instruction::AddrSpaceCast)
-return peelAddrspacecast(C->getOperand(0));
+  if (C->getOpcode() == Instruction::AddrSpaceCast) {
+assert(C->getOperand(0)->getType()->getPointerAd

[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> > If the value to be analyzed is directly from addrspacecast, we take the 
> > source
> > address space directly.
> 
> I don't think this is valid in general. You are allowed to speculatively
> produce invalid addrspacecasts. For example:
> 
> ```
> __generic int* g_ptr = ...;
> __local int* l_ptr = (__local int*) g_ptr;
> if (is_shared(g_ptr)) 
>   *l_ptr = 1;
> ```

That doesn't matter because the scope is the instruction. In your example here,
the pointer operand of the store is already in the specific address space, and
that will not change whether or not the branch is taken. `AAAddressSpace`
doesn't say what address space `g_ptr` is in; it just rewrites the pointer
operand of memory instructions.
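
A rough IR rendering of the example above (a sketch added here for
illustration, not taken from the PR, and assuming AMDGPU numbering where
addrspace(3) is the local/LDS space):

```llvm
declare i1 @llvm.amdgcn.is.shared(ptr)

define void @speculated_cast(ptr %g_ptr) {
entry:
  ; The cast may be executed speculatively even when %g_ptr does not point to
  ; LDS memory.
  %l_ptr = addrspacecast ptr %g_ptr to ptr addrspace(3)
  %is.shared = call i1 @llvm.amdgcn.is.shared(ptr %g_ptr)
  br i1 %is.shared, label %then, label %exit

then:
  ; AAAddressSpace only rewrites the pointer operand of memory instructions
  ; such as this store, which is already an addrspace(3) pointer; it never
  ; claims that %g_ptr itself is an LDS pointer.
  store i32 1, ptr addrspace(3) %l_ptr, align 4
  br label %exit

exit:
  ret void
}
```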

https://github.com/llvm/llvm-project/pull/108258
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [TargetTransformInfo] Remove `getFlatAddressSpace` (PR #108787)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

RFC: 
https://discourse.llvm.org/t/nfc-remove-getflataddressspace-from-targettransforminfo/81263

https://github.com/llvm/llvm-project/pull/108787
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Take the address space from addrspacecast directly (PR #108258)

2024-09-15 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/108258

>From f9d72576e5683154dbc67df051dbd117db539a33 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Wed, 11 Sep 2024 12:23:32 -0400
Subject: [PATCH] [Attributor] Take the address space from addrspacecast
 directly

If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
inserting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.
---
 .../Transforms/IPO/AttributorAttributes.cpp   | 62 ++-
 llvm/test/CodeGen/AMDGPU/aa-as-infer.ll   | 33 ++
 2 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 72ac08ec2b6e1c..b16bd01af8fa6c 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12587,16 +12587,38 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
+unsigned FlatAS = A.getInfoCache().getDL().getFlatAddressSpace();
+assert(FlatAS != InvalidAddressSpace);
 uint32_t OldAddressSpace = AssumedAddressSpace;
-auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
-DepClassTy::REQUIRED);
-auto Pred = [&](Value &Obj) {
+
+auto CheckAddressSpace = [&](Value &Obj) {
   if (isa(&Obj))
 return true;
+  // If an argument in flat address space only has addrspace cast uses, and
+  // those casts are the same, then we take the dst addrspace.
+  if (auto *Arg = dyn_cast(&Obj)) {
+if (FlatAS != InvalidAddressSpace &&
+Arg->getType()->getPointerAddressSpace() == FlatAS) {
+  unsigned CastAddrSpace = FlatAS;
+  for (auto *U : Arg->users()) {
+auto *ASCI = dyn_cast(U);
+if (!ASCI)
+  return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
+if (CastAddrSpace != FlatAS &&
+CastAddrSpace != ASCI->getDestAddressSpace())
+  return false;
+CastAddrSpace = ASCI->getDestAddressSpace();
+  }
+  if (CastAddrSpace != FlatAS)
+return takeAddressSpace(CastAddrSpace);
+}
+  }
   return takeAddressSpace(Obj.getType()->getPointerAddressSpace());
 };
 
-if (!AUO->forallUnderlyingObjects(Pred))
+auto *AUO = A.getOrCreateAAFor(getIRPosition(), this,
+DepClassTy::REQUIRED);
+if (!AUO->forallUnderlyingObjects(CheckAddressSpace))
   return indicatePessimisticFixpoint();
 
 return OldAddressSpace == AssumedAddressSpace ? ChangeStatus::UNCHANGED
@@ -12605,17 +12627,22 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 
   /// See AbstractAttribute::manifest(...).
   ChangeStatus manifest(Attributor &A) override {
-if (getAddressSpace() == InvalidAddressSpace ||
-getAddressSpace() == getAssociatedType()->getPointerAddressSpace())
+unsigned NewAS = getAddressSpace();
+
+if (NewAS == InvalidAddressSpace ||
+NewAS == getAssociatedType()->getPointerAddressSpace())
   return ChangeStatus::UNCHANGED;
 
+unsigned FlatAS = A.getInfoCache().getDL().getFlatAddressSpace();
+assert(FlatAS != InvalidAddressSpace);
+
 Value *AssociatedValue = &getAssociatedValue();
-Value *OriginalValue = peelAddrspacecast(AssociatedValue);
+Value *OriginalValue = peelAddrspacecast(AssociatedValue, FlatAS);
 
 PointerType *NewPtrTy =
-PointerType::get(getAssociatedType()->getContext(), getAddressSpace());
+PointerType::get(getAssociatedType()->getContext(), NewAS);
 bool UseOriginalValue =
-OriginalValue->getType()->getPointerAddressSpace() == 
getAddressSpace();
+OriginalValue->getType()->getPointerAddressSpace() == NewAS;
 
 bool Changed = false;
 
@@ -12675,12 +12702,19 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
 return AssumedAddressSpace == AS;
   }
 
-  static Value *peelAddrspacecast(Value *V) {
-if (auto *I = dyn_cast(V))
-  return peelAddrspacecast(I->getPointerOperand());
+  static Value *peelAddrspacecast(Value *V, unsigned FlatAS) {
+if (auto *I = dyn_cast(V)) {
+  assert(I->getSrcAddressSpace() != FlatAS &&
+ "there should not be flat AS -> non-flat AS");
+  return I->getPointerOperand();
+}
 if (auto *C = dyn_cast(V))
-  if (C->getOpcode() == Instruction::AddrSpaceCast)
-return peelAddrspacecast(C->getOperand(0));
+  if (C->getOpcode() == Instruction::AddrSpaceCast) 

Re: [llvm-branch-commits] [clang] 82e537a - [Clang][OpenMP] Fixed an issue that clang crashed when compiling OpenMP program in device only mode without host IR

2021-01-22 Thread Shilei Tian via llvm-branch-commits
Hi Eric,

Sure, will update it soon.

Regards,
Shilei

> On Jan 20, 2021, at 12:10 AM, Eric Christopher  wrote:
> 
> +Tres Popp (FYI)
> 
> Hi Shilei,
> 
> The other openmp targets tests are all _cc1 tests. I don't think there's a 
> reason for these to not also be cc1, would you mind updating this?
> 
> Thanks!
> 
> -eric
> 
> On Tue, Jan 19, 2021 at 2:22 PM Shilei Tian via llvm-branch-commits
> <llvm-branch-commits@lists.llvm.org> wrote:
> 
> Author: Shilei Tian
> Date: 2021-01-19T14:18:42-05:00
> New Revision: 82e537a9d28a2c18bd1637e2eac0e0af658ed829
> 
> URL: 
> https://github.com/llvm/llvm-project/commit/82e537a9d28a2c18bd1637e2eac0e0af658ed829
> DIFF: 
> https://github.com/llvm/llvm-project/commit/82e537a9d28a2c18bd1637e2eac0e0af658ed829.diff
> 
> LOG: [Clang][OpenMP] Fixed an issue that clang crashed when compiling OpenMP 
> program in device only mode without host IR
> 
> D94745 rewrites the `deviceRTLs` using OpenMP and compiles it by directly
> calling the device compilation. `clang` crashes because an entry in
> `OffloadEntriesDeviceGlobalVar` is uninitialized. The current design assumes the
> device compilation can only be invoked after host compilation with the host IR
> such that `clang` can initialize `OffloadEntriesDeviceGlobalVar` from host IR.
> This avoids us using device compilation directly, especially when we only have
> code wrapped into `declare target` which are all device code. The same issue
> also exists for `OffloadEntriesInfoManager`.
> 
> In this patch, we simply initialize an entry if it is not in the maps. Not sure
> we need an option to tell the device compiler that it is invoked standalone.
> 
> Reviewed By: jdoerfert
> 
> Differential Revision: https://reviews.llvm.org/D94871
> 
> Added: 
> clang/test/OpenMP/declare_target_device_only_compilation.cpp
> 
> Modified: 
> clang/lib/CodeGen/CGOpenMPRuntime.cpp
> 
> Removed: 
> 
> 
> 
> 
> diff  --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
> b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
> index a3b24039365b..17fa56fb06c8 100644
> --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
> +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
> @@ -2941,16 +2941,12 @@ void CGOpenMPRuntime::OffloadEntriesInfoManagerTy::
>// If we are emitting code for a target, the entry is already initialized,
>// only has to be registered.
>if (CGM.getLangOpts().OpenMPIsDevice) {
> -if (!hasTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum)) {
> -  unsigned DiagID = CGM.getDiags().getCustomDiagID(
> -  DiagnosticsEngine::Error,
> -  "Unable to find target region on line '%0' in the device code.");
> -  CGM.getDiags().Report(DiagID) << LineNum;
> -  return;
> -}
> +// This could happen if the device compilation is invoked standalone.
> +if (!hasTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum))
> +  initializeTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum,
> +  OffloadingEntriesNum);
>  auto &Entry =
>  OffloadEntriesTargetRegion[DeviceID][FileID][ParentName][LineNum];
> -assert(Entry.isValid() && "Entry not initialized!");
>  Entry.setAddress(Addr);
>  Entry.setID(ID);
>  Entry.setFlags(Flags);
> @@ -3017,9 +3013,10 @@ void CGOpenMPRuntime::OffloadEntriesInfoManagerTy::
>   OMPTargetGlobalVarEntryKind Flags,
>   llvm::GlobalValue::LinkageTypes 
> Linkage) {
>if (CGM.getLangOpts().OpenMPIsDevice) {
> +// This could happen if the device compilation is invoked standalone.
> +if (!hasDeviceGlobalVarEntryInfo(VarName))
> +  initializeDeviceGlobalVarEntryInfo(VarName, Flags, 
> OffloadingEntriesNum);
>  auto &Entry = OffloadEntriesDeviceGlobalVar[VarName];
> -assert(Entry.isValid() && Entry.getFlags() == Flags &&
> -   "Entry not initialized!");
>  assert((!Entry.getAddress() || Entry.getAddress() == Addr) &&
> "Resetting with the new address.");
>  if (Entry.getAddress() && hasDeviceGlobalVarEntryInfo(VarName)) {
> 
> diff  --git a/clang/test/OpenMP/declare_target_device_only_compilation.cpp 
> b/clang

[llvm-branch-commits] [clang] 5ad038a - [Clang][OpenMP][NVPTX] Replace `libomptarget-nvptx-path` with `libomptarget-nvptx-bc-path`

2021-01-23 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-23T14:42:38-05:00
New Revision: 5ad038aafa3a07a4491bf12cf6edf2026f3f17d1

URL: 
https://github.com/llvm/llvm-project/commit/5ad038aafa3a07a4491bf12cf6edf2026f3f17d1
DIFF: 
https://github.com/llvm/llvm-project/commit/5ad038aafa3a07a4491bf12cf6edf2026f3f17d1.diff

LOG: [Clang][OpenMP][NVPTX] Replace `libomptarget-nvptx-path` with 
`libomptarget-nvptx-bc-path`

D94700 removed the static library so we no longer need to pass
`-llibomptarget-nvptx` to `nvlink`. Since the bitcode library is the only device
runtime for now, instead of emitting a warning when it is not found, an error
should be raised. We also add a new option `libomptarget-nvptx-bc-path` to let
the user choose which bitcode library is used.
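
As a usage illustration (the file and directory names here are hypothetical),
the option is passed on an offloading compile line such as
`clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
--libomptarget-nvptx-bc-path=/opt/llvm/lib/libomptarget-nvptx-sm_70.bc foo.c`.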

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95161

Added: 
clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-test.bc

Modified: 
clang/docs/ClangCommandLineReference.rst
clang/include/clang/Basic/DiagnosticDriverKinds.td
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/Cuda.cpp
clang/test/Driver/openmp-offload-gpu.c

Removed: 




diff  --git a/clang/docs/ClangCommandLineReference.rst 
b/clang/docs/ClangCommandLineReference.rst
index d8ad75ce..fc42bcfe3759 100644
--- a/clang/docs/ClangCommandLineReference.rst
+++ b/clang/docs/ClangCommandLineReference.rst
@@ -1143,9 +1143,9 @@ Set directory to include search path with prefix
 
 Add directory to SYSTEM include search path, absolute paths are relative to 
-isysroot
 
-.. option:: --libomptarget-nvptx-path=
+.. option:: --libomptarget-nvptx-bc-path=
 
-Path to libomptarget-nvptx libraries
+Path to libomptarget-nvptx bitcode library
 
 .. option:: --ptxas-path=
 

diff  --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td 
b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index e92a4bf1dac5..d4ad278da6b7 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -263,12 +263,12 @@ def err_drv_omp_host_target_not_supported : Error<
   "The target '%0' is not a supported OpenMP host target.">;
 def err_drv_expecting_fopenmp_with_fopenmp_targets : Error<
   "The option -fopenmp-targets must be used in conjunction with a -fopenmp 
option compatible with offloading, please use -fopenmp=libomp or 
-fopenmp=libiomp5.">;
+def err_drv_omp_offload_target_missingbcruntime : Error<
+  "No library '%0' found in the default clang lib directory or in 
LIBRARY_PATH. Please use --libomptarget-nvptx-bc-path to specify nvptx bitcode 
library.">;
+def err_drv_omp_offload_target_bcruntime_not_found : Error<"Bitcode library 
'%0' does not exist.">;
 def warn_drv_omp_offload_target_duplicate : Warning<
   "The OpenMP offloading target '%0' is similar to target '%1' already 
specified - will be ignored.">,
   InGroup;
-def warn_drv_omp_offload_target_missingbcruntime : Warning<
-  "No library '%0' found in the default clang lib directory or in 
LIBRARY_PATH. Expect degraded performance due to no inlining of runtime 
functions on target devices.">,
-  InGroup;
 def err_drv_unsupported_embed_bitcode
 : Error<"%0 is not supported with -fembed-bitcode">;
 def err_drv_bitcode_unsupported_on_toolchain : Error<

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 3bb545f84132..7685d343ab96 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -992,8 +992,8 @@ def gpu_max_threads_per_block_EQ : Joined<["--"], 
"gpu-max-threads-per-block=">,
 def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
   HelpText<"Instrument device library for HIP, which is a LLVM bitcode 
containing "
   "__cyg_profile_func_enter and __cyg_profile_func_exit">;
-def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, 
Group,
-  HelpText<"Path to libomptarget-nvptx libraries">;
+def libomptarget_nvptx_bc_path_EQ : Joined<["--"], 
"libomptarget-nvptx-bc-path=">, Group,
+  HelpText<"Path to libomptarget-nvptx bitcode library">;
 def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>,
   HelpText<"Print macro definitions in -E mode in addition to normal output">;
 def dI : Flag<["-"], "dI">, Group, Flags<[CC1Option]>,

diff  --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 95fd5a1fbfee..4c549bf91dea 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -600,11 +600,6 @@ void NVPTX::OpenMPLinker::ConstructJob(Compilation &C, 
const JobAction &JA,
   CmdArgs.push_back("-arch");
   CmdArgs.push_back(Args.MakeArgString(GPUArch));
 
-  // Assume that the directory specified with --libomptarget_nvptx_path
-  // contains the static library libomptarget-nvptx.a.
-  if (const Arg *A = Args.getLastArg(options::OPT_libomptarget_nvptx_path_EQ))
-

[llvm-branch-commits] [openmp] cfd978d - [OpenMP] Fixed test environment of `check-libomptarget-nvptx`

2021-01-24 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-24T13:18:33-05:00
New Revision: cfd978d5d3c8a06813e25f69ff1386428380a7cb

URL: 
https://github.com/llvm/llvm-project/commit/cfd978d5d3c8a06813e25f69ff1386428380a7cb
DIFF: 
https://github.com/llvm/llvm-project/commit/cfd978d5d3c8a06813e25f69ff1386428380a7cb.diff

LOG: [OpenMP] Fixed test environment of `check-libomptarget-nvptx`

D95161 removed the option `--libomptarget-nvptx-path`, which is used in
the tests for `libomptarget-nvptx`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95293

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/test/lit.cfg

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/test/lit.cfg 
b/openmp/libomptarget/deviceRTLs/nvptx/test/lit.cfg
index a7a327cd4d14..5b9be32db55f 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/test/lit.cfg
+++ b/openmp/libomptarget/deviceRTLs/nvptx/test/lit.cfg
@@ -32,8 +32,7 @@ config.test_format = lit.formats.ShTest()
 
 # compiler flags
 config.test_flags = " -I " + config.omp_header_directory + \
-" -L " + config.library_dir + \
-" --libomptarget-nvptx-path=" + config.library_dir;
+" -L " + config.library_dir
 
 if config.omp_host_rtl_directory:
 config.test_flags = config.test_flags + \
@@ -42,6 +41,7 @@ if config.omp_host_rtl_directory:
 config.test_flags = config.test_flags + " " + config.test_extra_flags
 
 # Setup environment to find dynamic library at runtime.
+prepend_library_path('LIBRARY_PATH', config.library_dir, ":")
 prepend_library_path('LD_LIBRARY_PATH', config.library_dir, ":")
 prepend_library_path('LD_LIBRARY_PATH', config.omp_host_rtl_directory, ":")
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 27cc4a8 - [OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics

2021-01-25 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-25T14:14:30-05:00
New Revision: 27cc4a8138d819f78bc4fc028e39772bbda84dbd

URL: 
https://github.com/llvm/llvm-project/commit/27cc4a8138d819f78bc4fc028e39772bbda84dbd
DIFF: 
https://github.com/llvm/llvm-project/commit/27cc4a8138d819f78bc4fc028e39772bbda84dbd.diff

LOG: [OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics

This patch prepares for dropping CUDA when compiling `deviceRTLs`.
CUDA intrinsics are replaced by NVVM intrinsics, which refer to code in
`__clang_cuda_intrinsics.h`. We don't want to directly include it because in the
near future we're going to switch to OpenMP and by then the header cannot be
used anymore.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95327

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu 
b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
index 1e3ba7d664af..09bf45b005c8 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
+++ b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
@@ -16,20 +16,6 @@
 
 #include 
 
-// Forward declaration of CUDA primitives which will be evetually transformed
-// into LLVM intrinsics.
-extern "C" {
-unsigned int __activemask();
-unsigned int __ballot(unsigned);
-// The default argument here is based on NVIDIA's website
-// https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
-int __shfl_sync(unsigned mask, int val, int src_line, int width = WARPSIZE);
-int __shfl(int val, int src_line, int width = WARPSIZE);
-int __shfl_down(int var, unsigned detla, int width);
-int __shfl_down_sync(unsigned mask, int var, unsigned detla, int width);
-void __syncwarp(int mask);
-}
-
 DEVICE void __kmpc_impl_unpack(uint64_t val, uint32_t &lo, uint32_t &hi) {
   asm volatile("mov.b64 {%0,%1}, %2;" : "=r"(lo), "=r"(hi) : "l"(val));
 }
@@ -71,10 +57,12 @@ DEVICE double __kmpc_impl_get_wtime() {
 
 // In Cuda 9.0, __ballot(1) from Cuda 8.0 is replaced with __activemask().
 DEVICE __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
-#if CUDA_VERSION >= 9000
-  return __activemask();
+#if CUDA_VERSION < 9020
+  return __nvvm_vote_ballot(1);
 #else
-  return __ballot(1);
+  unsigned int Mask;
+  asm volatile("activemask.b32 %0;" : "=r"(Mask));
+  return Mask;
 #endif
 }
 
@@ -82,19 +70,20 @@ DEVICE __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
 DEVICE int32_t __kmpc_impl_shfl_sync(__kmpc_impl_lanemask_t Mask, int32_t Var,
  int32_t SrcLane) {
 #if CUDA_VERSION >= 9000
-  return __shfl_sync(Mask, Var, SrcLane);
+  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
 #else
-  return __shfl(Var, SrcLane);
+  return __nvvm_shfl_idx_i32(Var, SrcLane, 0x1f);
 #endif // CUDA_VERSION
 }
 
 DEVICE int32_t __kmpc_impl_shfl_down_sync(__kmpc_impl_lanemask_t Mask,
   int32_t Var, uint32_t Delta,
   int32_t Width) {
+  int32_t T = ((WARPSIZE - Width) << 8) | 0x1f;
 #if CUDA_VERSION >= 9000
-  return __shfl_down_sync(Mask, Var, Delta, Width);
+  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, T);
 #else
-  return __shfl_down(Var, Delta, Width);
+  return __nvvm_shfl_down_i32(Var, Delta, T);
 #endif // CUDA_VERSION
 }
 
@@ -102,7 +91,7 @@ DEVICE void __kmpc_impl_syncthreads() { __syncthreads(); }
 
 DEVICE void __kmpc_impl_syncwarp(__kmpc_impl_lanemask_t Mask) {
 #if CUDA_VERSION >= 9000
-  __syncwarp(Mask);
+  __nvvm_bar_warp_sync(Mask);
 #else
   // In Cuda < 9.0 no need to sync threads in warps.
 #endif // CUDA_VERSION



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] f5602e0 - [OpenMP] Disabled profiling in `libomp` by default to unblock link errors

2021-02-03 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-02-03T19:18:08-05:00
New Revision: f5602e0bf31ab590da19fa357980a753dbfd666e

URL: 
https://github.com/llvm/llvm-project/commit/f5602e0bf31ab590da19fa357980a753dbfd666e
DIFF: 
https://github.com/llvm/llvm-project/commit/f5602e0bf31ab590da19fa357980a753dbfd666e.diff

LOG: [OpenMP] Disabled profiling in `libomp` by default to unblock link errors

A link error occurred when time profiling in libomp was enabled by default
because `libomp` is assumed to be a C library, while `libLLVMSupport`, which the
profiling support depends on, is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch adds a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.
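
As an illustration, an in-tree build would turn the feature on by adding
something like `-DOPENMP_ENABLE_LIBOMP_PROFILING=ON` to whatever CMake
invocation the build already uses; leaving the option at its default keeps
`libomp` a plain C library.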

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

Added: 


Modified: 
openmp/CMakeLists.txt
openmp/docs/design/Runtimes.rst
openmp/runtime/CMakeLists.txt
openmp/runtime/src/CMakeLists.txt
openmp/runtime/src/kmp_config.h.cmake
openmp/runtime/src/kmp_runtime.cpp

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 67600bebdafb..4787d4b5a321 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -86,6 +86,12 @@ option(OPENMP_ENABLE_LIBOMPTARGET "Enable building 
libomptarget for offloading."
${ENABLE_LIBOMPTARGET})
 option(OPENMP_ENABLE_LIBOMPTARGET_PROFILING "Enable time profiling for 
libomptarget."
${ENABLE_LIBOMPTARGET})
+option(OPENMP_ENABLE_LIBOMP_PROFILING "Enable time profiling for libomp." OFF)
+
+# Build host runtime library, after LIBOMPTARGET variables are set since they 
are needed
+# to enable time profiling support in the OpenMP runtime.
+add_subdirectory(runtime)
+
 if (OPENMP_ENABLE_LIBOMPTARGET)
   # Check that the library can actually be built.
   if (APPLE OR WIN32)

diff  --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index 016b88ba324b..ad36e43eccdc 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -48,7 +48,10 @@ similar to Clang's ``-ftime-trace`` option. This generates a 
JSON file based on
 `Speedscope App`_. Building this feature depends on the `LLVM Support Library`_
 for time trace output. Using this library is enabled by default when building
 using the CMake option ``OPENMP_ENABLE_LIBOMPTARGET_PROFILING``. The output 
will
-be saved to the filename specified by the environment variable.
+be saved to the filename specified by the environment variable. For 
multi-threaded
+applications, profiling in ``libomp`` is also needed. Setting the CMake option
+``OPENMP_ENABLE_LIBOMP_PROFILING=ON`` to enable the feature. Note that this 
will
+turn ``libomp`` into a C++ library.
 
 .. _`Chrome Tracing`: 
https://www.chromium.org/developers/how-tos/trace-event-profiling-tool
 

diff  --git a/openmp/runtime/CMakeLists.txt b/openmp/runtime/CMakeLists.txt
index 9fdd04f41646..8828ff8ef455 100644
--- a/openmp/runtime/CMakeLists.txt
+++ b/openmp/runtime/CMakeLists.txt
@@ -34,7 +34,6 @@ if(${OPENMP_STANDALONE_BUILD})
   # Should assertions be enabled?  They are on by default.
   set(LIBOMP_ENABLE_ASSERTIONS TRUE CACHE BOOL
 "enable assertions?")
-  set(LIBOMPTARGET_PROFILING_SUPPORT FALSE)
 else() # Part of LLVM build
   # Determine the native architecture from LLVM.
   string(TOLOWER "${LLVM_TARGET_ARCH}" LIBOMP_NATIVE_ARCH)
@@ -66,10 +65,11 @@ else() # Part of LLVM build
 libomp_get_architecture(LIBOMP_ARCH)
   endif ()
   set(LIBOMP_ENABLE_ASSERTIONS ${LLVM_ENABLE_ASSERTIONS})
-  # Time profiling support
-  set(LIBOMPTARGET_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMPTARGET_PROFILING})
 endif()
 
+# Time profiling support
+set(LIBOMP_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMP_PROFILING})
+
 # FUJITSU A64FX is a special processor because its cache line size is 256.
 # We need to pass this information into kmp_config.h.
 if(LIBOMP_ARCH STREQUAL "aarch64")

diff  --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index 2e927df84f5c..822f9ca2b825 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -50,6 +50,14 @@ if(${LIBOMP_USE_HWLOC})
   include_directories(${LIBOMP_HWLOC_INSTALL_DIR}/include)
 endif()
 
+# Building with time profiling support requires LLVM directory includes.
+if(LIBOMP_PROFILING_SUPPORT)
+  include_directories(
+${LLVM_MAIN_INCLUDE_DIR}
+${LLVM_INCLUDE_DIR}
+  )
+endif()
+
 # Getting correct source files to build library
 set(LIBOMP_CXXFILES)
 set(LIBOMP_ASMFILES)
@@ -135,7 +143,7 @@ libomp_get_ldflags(LIBOMP_CONFIGURED_LDFLAGS)
 
 libomp_get_libflags(LIBOMP_CONFIGURED_LIBFLAGS)
 # Build libomp library. Add LLVMSupport dependency if building in-tree with 
libomp

[llvm-branch-commits] [openmp] 7fad20e - Revert "[OpenMP] Disabled profiling in `libomp` by default to unblock link errors"

2021-02-04 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-02-04T08:44:20-05:00
New Revision: 7fad20eccc4f9fe5d03b2e381e26e8eb13a3e3be

URL: 
https://github.com/llvm/llvm-project/commit/7fad20eccc4f9fe5d03b2e381e26e8eb13a3e3be
DIFF: 
https://github.com/llvm/llvm-project/commit/7fad20eccc4f9fe5d03b2e381e26e8eb13a3e3be.diff

LOG: Revert "[OpenMP] Disabled profiling in `libomp` by default to unblock link 
errors"

This reverts commit f5602e0bf31ab590da19fa357980a753dbfd666e.

Added: 


Modified: 
openmp/CMakeLists.txt
openmp/docs/design/Runtimes.rst
openmp/runtime/CMakeLists.txt
openmp/runtime/src/CMakeLists.txt
openmp/runtime/src/kmp_config.h.cmake
openmp/runtime/src/kmp_runtime.cpp

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 4787d4b5a321..67600bebdafb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -86,12 +86,6 @@ option(OPENMP_ENABLE_LIBOMPTARGET "Enable building 
libomptarget for offloading."
${ENABLE_LIBOMPTARGET})
 option(OPENMP_ENABLE_LIBOMPTARGET_PROFILING "Enable time profiling for 
libomptarget."
${ENABLE_LIBOMPTARGET})
-option(OPENMP_ENABLE_LIBOMP_PROFILING "Enable time profiling for libomp." OFF)
-
-# Build host runtime library, after LIBOMPTARGET variables are set since they 
are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
 if (OPENMP_ENABLE_LIBOMPTARGET)
   # Check that the library can actually be built.
   if (APPLE OR WIN32)

diff  --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index ad36e43eccdc..016b88ba324b 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -48,10 +48,7 @@ similar to Clang's ``-ftime-trace`` option. This generates a 
JSON file based on
 `Speedscope App`_. Building this feature depends on the `LLVM Support Library`_
 for time trace output. Using this library is enabled by default when building
 using the CMake option ``OPENMP_ENABLE_LIBOMPTARGET_PROFILING``. The output 
will
-be saved to the filename specified by the environment variable. For 
multi-threaded
-applications, profiling in ``libomp`` is also needed. Setting the CMake option
-``OPENMP_ENABLE_LIBOMP_PROFILING=ON`` to enable the feature. Note that this 
will
-turn ``libomp`` into a C++ library.
+be saved to the filename specified by the environment variable.
 
 .. _`Chrome Tracing`: 
https://www.chromium.org/developers/how-tos/trace-event-profiling-tool
 

diff  --git a/openmp/runtime/CMakeLists.txt b/openmp/runtime/CMakeLists.txt
index 8828ff8ef455..9fdd04f41646 100644
--- a/openmp/runtime/CMakeLists.txt
+++ b/openmp/runtime/CMakeLists.txt
@@ -34,6 +34,7 @@ if(${OPENMP_STANDALONE_BUILD})
   # Should assertions be enabled?  They are on by default.
   set(LIBOMP_ENABLE_ASSERTIONS TRUE CACHE BOOL
 "enable assertions?")
+  set(LIBOMPTARGET_PROFILING_SUPPORT FALSE)
 else() # Part of LLVM build
   # Determine the native architecture from LLVM.
   string(TOLOWER "${LLVM_TARGET_ARCH}" LIBOMP_NATIVE_ARCH)
@@ -65,11 +66,10 @@ else() # Part of LLVM build
 libomp_get_architecture(LIBOMP_ARCH)
   endif ()
   set(LIBOMP_ENABLE_ASSERTIONS ${LLVM_ENABLE_ASSERTIONS})
+  # Time profiling support
+  set(LIBOMPTARGET_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMPTARGET_PROFILING})
 endif()
 
-# Time profiling support
-set(LIBOMP_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMP_PROFILING})
-
 # FUJITSU A64FX is a special processor because its cache line size is 256.
 # We need to pass this information into kmp_config.h.
 if(LIBOMP_ARCH STREQUAL "aarch64")

diff  --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index 822f9ca2b825..2e927df84f5c 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -50,14 +50,6 @@ if(${LIBOMP_USE_HWLOC})
   include_directories(${LIBOMP_HWLOC_INSTALL_DIR}/include)
 endif()
 
-# Building with time profiling support requires LLVM directory includes.
-if(LIBOMP_PROFILING_SUPPORT)
-  include_directories(
-${LLVM_MAIN_INCLUDE_DIR}
-${LLVM_INCLUDE_DIR}
-  )
-endif()
-
 # Getting correct source files to build library
 set(LIBOMP_CXXFILES)
 set(LIBOMP_ASMFILES)
@@ -143,7 +135,7 @@ libomp_get_ldflags(LIBOMP_CONFIGURED_LDFLAGS)
 
 libomp_get_libflags(LIBOMP_CONFIGURED_LIBFLAGS)
 # Build libomp library. Add LLVMSupport dependency if building in-tree with 
libomptarget profiling enabled.
-if(OPENMP_STANDALONE_BUILD OR (NOT OPENMP_ENABLE_LIBOMP_PROFILING))
+if(OPENMP_STANDALONE_BUILD OR (NOT OPENMP_ENABLE_LIBOMPTARGET_PROFILING))
   add_library(omp ${LIBOMP_LIBRARY_KIND} ${LIBOMP_SOURCE_FILES})
   # Linking command will include libraries in LIBOMP_CONFIGURED_LIBFLAGS
   target_link_libraries(omp ${LIBOMP_CONFIGURED_LIBFLAGS} ${CMAKE_DL_LIBS})
@@ -152,8 +144,6 @@ else()
 LINK_LIBS ${LIBOMP_CONFIGURED_LIBFLAGS} ${CMAKE_DL_LIBS}
 LINK_COMPONENTS Support
 )
-  # libomp must be a 

[llvm-branch-commits] [openmp] 66c7b44 - [OpenMP] Fix building using LLVM_ENABLE_RUNTIMES

2021-02-04 Thread Shilei Tian via llvm-branch-commits

Author: Giorgis Georgakoudis
Date: 2021-02-04T10:24:40-05:00
New Revision: 66c7b449acf402bdc87b69db5778b7b43958d217

URL: 
https://github.com/llvm/llvm-project/commit/66c7b449acf402bdc87b69db5778b7b43958d217
DIFF: 
https://github.com/llvm/llvm-project/commit/66c7b449acf402bdc87b69db5778b7b43958d217.diff

LOG: [OpenMP] Fix building using LLVM_ENABLE_RUNTIMES

Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398

(cherry picked from commit bb40e6731843de92f1c73ad6efceb8a89e045ea6)

Added: 


Modified: 
openmp/CMakeLists.txt
openmp/runtime/src/CMakeLists.txt

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 67600bebdafb..f89857dc98d6 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -55,11 +55,6 @@ set(OPENMP_TEST_FLAGS "" CACHE STRING
 set(OPENMP_TEST_OPENMP_FLAGS ${OPENMP_TEST_COMPILER_OPENMP_FLAGS} CACHE STRING
   "OpenMP compiler flag to use for testing OpenMP runtime libraries.")
 
-
-# Build host runtime library.
-add_subdirectory(runtime)
-
-
 set(ENABLE_LIBOMPTARGET ON)
 # Currently libomptarget cannot be compiled on Windows or MacOS X.
 # Since the device plugins are only supported on Linux anyway,
@@ -86,6 +81,11 @@ option(OPENMP_ENABLE_LIBOMPTARGET "Enable building 
libomptarget for offloading."
${ENABLE_LIBOMPTARGET})
 option(OPENMP_ENABLE_LIBOMPTARGET_PROFILING "Enable time profiling for 
libomptarget."
${ENABLE_LIBOMPTARGET})
+
+# Build host runtime library, after LIBOMPTARGET variables are set since they 
are needed
+# to enable time profiling support in the OpenMP runtime.
+add_subdirectory(runtime)
+
 if (OPENMP_ENABLE_LIBOMPTARGET)
   # Check that the library can actually be built.
   if (APPLE OR WIN32)

diff  --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index 2e927df84f5c..9c5dba55b705 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -50,6 +50,15 @@ if(${LIBOMP_USE_HWLOC})
   include_directories(${LIBOMP_HWLOC_INSTALL_DIR}/include)
 endif()
 
+# Building with time profiling support for libomptarget requires
+# LLVM directory includes.
+if(LIBOMPTARGET_PROFILING_SUPPORT)
+  include_directories(
+${LLVM_MAIN_INCLUDE_DIR}
+${LLVM_INCLUDE_DIR}
+  )
+endif()
+
 # Getting correct source files to build library
 set(LIBOMP_CXXFILES)
 set(LIBOMP_ASMFILES)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 92a5106 - [OpenMP] Disabled profiling in `libomp` by default to unblock link errors

2021-02-04 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-02-04T10:25:01-05:00
New Revision: 92a5106e8055bab7da46095a83290862728b

URL: 
https://github.com/llvm/llvm-project/commit/92a5106e8055bab7da46095a83290862728b
DIFF: 
https://github.com/llvm/llvm-project/commit/92a5106e8055bab7da46095a83290862728b.diff

LOG: [OpenMP] Disabled profiling in `libomp` by default to unblock link errors

A link error occurred when time profiling in libomp was enabled by default
because `libomp` is assumed to be a C library, while `libLLVMSupport`, which the
profiling support depends on, is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch adds a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

Added: 


Modified: 
openmp/CMakeLists.txt
openmp/docs/design/Runtimes.rst
openmp/runtime/CMakeLists.txt
openmp/runtime/src/CMakeLists.txt
openmp/runtime/src/kmp_config.h.cmake
openmp/runtime/src/kmp_runtime.cpp

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index f89857dc98d6..b8a2822877e3 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -81,6 +81,7 @@ option(OPENMP_ENABLE_LIBOMPTARGET "Enable building 
libomptarget for offloading."
${ENABLE_LIBOMPTARGET})
 option(OPENMP_ENABLE_LIBOMPTARGET_PROFILING "Enable time profiling for 
libomptarget."
${ENABLE_LIBOMPTARGET})
+option(OPENMP_ENABLE_LIBOMP_PROFILING "Enable time profiling for libomp." OFF)
 
 # Build host runtime library, after LIBOMPTARGET variables are set since they 
are needed
 # to enable time profiling support in the OpenMP runtime.

diff  --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index 016b88ba324b..ad36e43eccdc 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -48,7 +48,10 @@ similar to Clang's ``-ftime-trace`` option. This generates a 
JSON file based on
 `Speedscope App`_. Building this feature depends on the `LLVM Support Library`_
 for time trace output. Using this library is enabled by default when building
 using the CMake option ``OPENMP_ENABLE_LIBOMPTARGET_PROFILING``. The output 
will
-be saved to the filename specified by the environment variable.
+be saved to the filename specified by the environment variable. For 
multi-threaded
+applications, profiling in ``libomp`` is also needed. Setting the CMake option
+``OPENMP_ENABLE_LIBOMP_PROFILING=ON`` to enable the feature. Note that this 
will
+turn ``libomp`` into a C++ library.
 
 .. _`Chrome Tracing`: 
https://www.chromium.org/developers/how-tos/trace-event-profiling-tool
 

diff  --git a/openmp/runtime/CMakeLists.txt b/openmp/runtime/CMakeLists.txt
index 9fdd04f41646..8828ff8ef455 100644
--- a/openmp/runtime/CMakeLists.txt
+++ b/openmp/runtime/CMakeLists.txt
@@ -34,7 +34,6 @@ if(${OPENMP_STANDALONE_BUILD})
   # Should assertions be enabled?  They are on by default.
   set(LIBOMP_ENABLE_ASSERTIONS TRUE CACHE BOOL
 "enable assertions?")
-  set(LIBOMPTARGET_PROFILING_SUPPORT FALSE)
 else() # Part of LLVM build
   # Determine the native architecture from LLVM.
   string(TOLOWER "${LLVM_TARGET_ARCH}" LIBOMP_NATIVE_ARCH)
@@ -66,10 +65,11 @@ else() # Part of LLVM build
 libomp_get_architecture(LIBOMP_ARCH)
   endif ()
   set(LIBOMP_ENABLE_ASSERTIONS ${LLVM_ENABLE_ASSERTIONS})
-  # Time profiling support
-  set(LIBOMPTARGET_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMPTARGET_PROFILING})
 endif()
 
+# Time profiling support
+set(LIBOMP_PROFILING_SUPPORT ${OPENMP_ENABLE_LIBOMP_PROFILING})
+
 # FUJITSU A64FX is a special processor because its cache line size is 256.
 # We need to pass this information into kmp_config.h.
 if(LIBOMP_ARCH STREQUAL "aarch64")

diff  --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index 9c5dba55b705..822f9ca2b825 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -50,9 +50,8 @@ if(${LIBOMP_USE_HWLOC})
   include_directories(${LIBOMP_HWLOC_INSTALL_DIR}/include)
 endif()
 
-# Building with time profiling support for libomptarget requires
-# LLVM directory includes.
-if(LIBOMPTARGET_PROFILING_SUPPORT)
+# Building with time profiling support requires LLVM directory includes.
+if(LIBOMP_PROFILING_SUPPORT)
   include_directories(
 ${LLVM_MAIN_INCLUDE_DIR}
 ${LLVM_INCLUDE_DIR}
@@ -144,7 +143,7 @@ libomp_get_ldflags(LIBOMP_CONFIGURED_LDFLAGS)
 
 libomp_get_libflags(LIBOMP_CONFIGURED_LIBFLAGS)
 # Build libomp library. Add LLVMSupport dependency if building in-tree with 
libomptarget profiling enabled.
-if(OPENMP_STANDALONE_BUILD OR (NOT OPENMP_ENABLE_LIBOMPTARGET_PROFILING))
+if(OPENMP_STAND

[llvm-branch-commits] [openmp] bdd1ad5 - [OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES

2021-01-12 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-12T14:32:38-05:00
New Revision: bdd1ad5e5c57ae0f0bf899517c540ad8a679f01a

URL: 
https://github.com/llvm/llvm-project/commit/bdd1ad5e5c57ae0f0bf899517c540ad8a679f01a
DIFF: 
https://github.com/llvm/llvm-project/commit/bdd1ad5e5c57ae0f0bf899517c540ad8a679f01a.diff

LOG: [OpenMP] Fixed include directories for OpenMP when building OpenMP with 
LLVM_ENABLE_RUNTIMES

Some LLVM headers are generated by CMake. Before installation, LLVM's headers
are spread across several locations: some are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After installation, they are all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.

OpenMP now depends on LLVM headers, and some of those headers depend on headers
generated by CMake. When building OpenMP along with LLVM, i.e. via
`LLVM_ENABLE_RUNTIMES`, we need to tell OpenMP where it can find those headers,
especially the ones that have not been copied/installed yet.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D94534

Added: 


Modified: 
openmp/CMakeLists.txt
openmp/libomptarget/CMakeLists.txt
openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
openmp/libomptarget/src/CMakeLists.txt

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index dc0d3a6e718a..12e8d542f9f6 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -39,6 +39,8 @@ else()
 set(OPENMP_TEST_C_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/clang.exe)
 set(OPENMP_TEST_CXX_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/clang++.exe)
   endif()
+
+  list(APPEND LIBOMPTARGET_LLVM_INCLUDE_DIRS ${LLVM_MAIN_INCLUDE_DIR} 
${LLVM_BINARY_DIR}/include)
 endif()
 
 # Check and set up common compiler flags.
@@ -67,16 +69,16 @@ if (APPLE OR WIN32 OR NOT OPENMP_HAVE_STD_CPP14_FLAG)
 endif()
 
 # Attempt to locate LLVM source, required by libomptarget
-if (NOT LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR)
+if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
   if (LLVM_MAIN_INCLUDE_DIR)
-set(LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR ${LLVM_MAIN_INCLUDE_DIR})
+list(APPEND LIBOMPTARGET_LLVM_INCLUDE_DIRS ${LLVM_MAIN_INCLUDE_DIR})
   elseif (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
-set(LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR 
${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
+list(APPENDset LIBOMPTARGET_LLVM_INCLUDE_DIRS 
${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
   endif()
 endif()
 
-if (NOT LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR)
-  message(STATUS "Missing definition for LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR, 
disabling libomptarget")
+if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
+  message(STATUS "Missing definition for LIBOMPTARGET_LLVM_INCLUDE_DIRS, 
disabling libomptarget")
   set(ENABLE_LIBOMPTARGET OFF)
 endif()
 

diff  --git a/openmp/libomptarget/CMakeLists.txt 
b/openmp/libomptarget/CMakeLists.txt
index 06db7b4c35e2..6c90ced107eb 100644
--- a/openmp/libomptarget/CMakeLists.txt
+++ b/openmp/libomptarget/CMakeLists.txt
@@ -31,8 +31,8 @@ include(LibomptargetUtils)
 include(LibomptargetGetDependencies)
 
 # LLVM source tree is required at build time for libomptarget
-if (NOT LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR)
-  message(FATAL_ERROR "Missing definition for 
LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR")
+if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
+  message(FATAL_ERROR "Missing definition for LIBOMPTARGET_LLVM_INCLUDE_DIRS")
 endif()
 
 # This is a list of all the targets that are supported/tested right now.

diff  --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt 
b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
index 2d58388c80bb..43934b52e42b 100644
--- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
@@ -30,8 +30,8 @@ if(NOT CMAKE_SYSTEM_PROCESSOR MATCHES 
"(x86_64)|(ppc64le)|(aarch64)$" AND CMAKE_
   return()
 endif()
 
-if (NOT LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR)
-  libomptarget_say("Not building AMDGPU plugin: Missing definition for 
LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR")
+if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
+  libomptarget_say("Not building AMDGPU plugin: Missing definition for 
LIBOMPTARGET_LLVM_INCLUDE_DIRS")
   return()
 endif()
 
@@ -50,7 +50,7 @@ endif()
 
 include_directories(
   ${CMAKE_CURRENT_SOURCE_DIR}/impl
-  ${LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR}
+  ${LIBOMPTARGET_LLVM_INCLUDE_DIRS}
 )
 
 add_library(omptarget.rtl.amdgpu SHARED

diff  --git a/openmp/libomptarget/src/CMakeLists.txt 
b/openmp/libomptarget/src/CMakeLists.txt
index 4088f59042fc..38eaf455f95b 100644
--- a/openmp/libomptarget/src/CMakeLists.txt
+++ b/openmp/libomptarget/src/CMakeLists.txt
@@ -20,7 +20,7 @@ set(LIBOMPTARGET_SRC_FILES
   ${CMAKE_CURRENT_SOURCE_DIR}/omptarget.cpp
 )
 
-include_directories(${LIBOMPTARGET_LLVM_MAIN_INCLUDE_DIR})
+include_directories(${LIBOMPTARGET_LLVM_INCLUDE_DIRS})
 
 # Build libomptarget library with libdl dependency. Add LLVMSupport
 # dependency if bu

[llvm-branch-commits] [openmp] 68ff52f - [OpenMP] Fixed the link error that cannot find static data member

2021-01-12 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-12T16:48:28-05:00
New Revision: 68ff52ffead2ba25cca442778ab19286000daad7

URL: 
https://github.com/llvm/llvm-project/commit/68ff52ffead2ba25cca442778ab19286000daad7
DIFF: 
https://github.com/llvm/llvm-project/commit/68ff52ffead2ba25cca442778ab19286000daad7.diff

LOG: [OpenMP] Fixed the link error that cannot find static data member

In C++17, a constexpr static data member can be defined in the class without
another definition after the class. Although this is a C++17 feature, Clang can
still handle it even without the C++17 flag. Unfortunately, GCC cannot.
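
For illustration, a minimal sketch of the pattern involved (the class and
member names here are hypothetical, not the actual MemoryManagerTy members):

  // C++17: an in-class constexpr static data member is implicitly inline,
  // so the in-class declaration is also a definition.
  struct Example {
    static constexpr int NumBuckets = 8;
  };

  // Pre-C++17 semantics (what GCC effectively requires here): an out-of-class
  // definition is still needed so the linker can find the symbol when the
  // member is odr-used.
  constexpr int Example::NumBuckets;

  int use() { return Example::NumBuckets; }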

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541

Added: 


Modified: 
openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h

Removed: 




diff  --git a/openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h 
b/openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h
index 1f9cbeb00394..6e00728a658f 100644
--- a/openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h
+++ b/openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h
@@ -338,4 +338,9 @@ class MemoryManagerTy {
   }
 };
 
+// GCC still cannot handle the static data member like Clang so we still need
+// this part.
+constexpr const size_t MemoryManagerTy::BucketSize[];
+constexpr const int MemoryManagerTy::NumBuckets;
+
 #endif // LLVM_OPENMP_LIBOMPTARGET_PLUGINS_COMMON_MEMORYMANAGER_MEMORYMANAGER_H



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 01f1273 - [OpenMP] Fixed a typo in openmp/CMakeLists.txt

2021-01-12 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-12T17:00:49-05:00
New Revision: 01f1273fe2f0c246f17162de24a8b6e11bad23a8

URL: 
https://github.com/llvm/llvm-project/commit/01f1273fe2f0c246f17162de24a8b6e11bad23a8
DIFF: 
https://github.com/llvm/llvm-project/commit/01f1273fe2f0c246f17162de24a8b6e11bad23a8.diff

LOG: [OpenMP] Fixed a typo in openmp/CMakeLists.txt

Added: 


Modified: 
openmp/CMakeLists.txt

Removed: 




diff  --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 12e8d542f9f6..67600bebdafb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -73,7 +73,7 @@ if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
   if (LLVM_MAIN_INCLUDE_DIR)
 list(APPEND LIBOMPTARGET_LLVM_INCLUDE_DIRS ${LLVM_MAIN_INCLUDE_DIR})
   elseif (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
-list(APPENDset LIBOMPTARGET_LLVM_INCLUDE_DIRS 
${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
+list(APPEND LIBOMPTARGET_LLVM_INCLUDE_DIRS 
${CMAKE_CURRENT_SOURCE_DIR}/../llvm/include)
   endif()
 endif()
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 763c1f9 - [OpenMP] Drop the static library libomptarget-nvptx

2021-01-14 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-14T13:34:25-05:00
New Revision: 763c1f9933463c40c39c04b68bbe4d296823b003

URL: 
https://github.com/llvm/llvm-project/commit/763c1f9933463c40c39c04b68bbe4d296823b003
DIFF: 
https://github.com/llvm/llvm-project/commit/763c1f9933463c40c39c04b68bbe4d296823b003.diff

LOG: [OpenMP] Drop the static library libomptarget-nvptx

For the NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and a bitcode library `libomptarget-nvptx-sm_{$sm}.bc` generated
by Clang. When compiling an OpenMP program, the `.bc` file is fed to `clang` in
the second pass over the program, which compiles the target part. The generated
PTX file is then fed to `ptxas` to produce the object file, and finally the
driver invokes `nvlink` to produce the binary, with the static library appended
to the `nvlink` invocation.

One question is: why do we need two libraries? The only difference is that the
static library contains `omp_data.cu` and the bitcode library doesn't. It's
unclear why they were implemented this way, but per D94565, there is no issue
if we also include the file in the bitcode library. Therefore, we can safely
drop the static library.

This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94573

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt 
b/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
index ea11c8114166..200c6401d628 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
+++ b/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
@@ -10,31 +10,6 @@
 #
 
##===--===##
 
-set(LIBOMPTARGET_NVPTX_ALTERNATE_HOST_COMPILER "" CACHE STRING
-  "Path to alternate NVCC host compiler to be used by the NVPTX device RTL.")
-
-if(LIBOMPTARGET_NVPTX_ALTERNATE_HOST_COMPILER)
-  find_program(ALTERNATE_CUDA_HOST_COMPILER NAMES 
${LIBOMPTARGET_NVPTX_ALTERNATE_HOST_COMPILER})
-  if(NOT ALTERNATE_CUDA_HOST_COMPILER)
-libomptarget_say("Not building CUDA offloading device RTL: invalid NVPTX 
alternate host compiler.")
-  endif()
-  set(CUDA_HOST_COMPILER ${ALTERNATE_CUDA_HOST_COMPILER} CACHE FILEPATH "" 
FORCE)
-endif()
-
-# We can't use clang as nvcc host preprocessor, so we attempt to replace it 
with
-# gcc.
-if(CUDA_HOST_COMPILER MATCHES clang)
-
-  find_program(LIBOMPTARGET_NVPTX_ALTERNATE_GCC_HOST_COMPILER NAMES gcc)
-
-  if(NOT LIBOMPTARGET_NVPTX_ALTERNATE_GCC_HOST_COMPILER)
-libomptarget_say("Not building CUDA offloading device RTL: clang is not 
supported as NVCC host compiler.")
-libomptarget_say("Please include gcc in your path or set 
LIBOMPTARGET_NVPTX_ALTERNATE_HOST_COMPILER to the full path of of valid 
compiler.")
-return()
-  endif()
-  set(CUDA_HOST_COMPILER "${LIBOMPTARGET_NVPTX_ALTERNATE_GCC_HOST_COMPILER}" 
CACHE FILEPATH "" FORCE)
-endif()
-
 get_filename_component(devicertl_base_directory
   ${CMAKE_CURRENT_SOURCE_DIR}
   DIRECTORY)
@@ -44,28 +19,6 @@ set(devicertl_nvptx_directory
   ${devicertl_base_directory}/nvptx)
 
 if(LIBOMPTARGET_DEP_CUDA_FOUND)
-  libomptarget_say("Building CUDA offloading device RTL.")
-
-  # We really don't have any host code, so we don't need to care about
-  # propagating host flags.
-  set(CUDA_PROPAGATE_HOST_FLAGS OFF)
-
-  set(cuda_src_files
-  ${devicertl_common_directory}/src/cancel.cu
-  ${devicertl_common_directory}/src/critical.cu
-  ${devicertl_common_directory}/src/data_sharing.cu
-  ${devicertl_common_directory}/src/libcall.cu
-  ${devicertl_common_directory}/src/loop.cu
-  ${devicertl_common_directory}/src/omp_data.cu
-  ${devicertl_common_directory}/src/omptarget.cu
-  ${devicertl_common_directory}/src/parallel.cu
-  ${devicertl_common_directory}/src/reduction.cu
-  ${devicertl_common_directory}/src/support.cu
-  ${devicertl_common_directory}/src/sync.cu
-  ${devicertl_common_directory}/src/task.cu
-  src/target_impl.cu
-  )
-
   # Build library support for the highest compute capability the system 
supports
   # and always build support for sm_35 by default
   if (${LIBOMPTARGET_DEP_CUDA_ARCH} EQUAL 35)
@@ -94,24 +47,6 @@ if(LIBOMPTARGET_DEP_CUDA_FOUND)
   # Activate RTL message dumps if requested by the user.
   set(LIBOMPTARGET_NVPTX_DEBUG FALSE CACHE BOOL
 "Activate NVPTX device RTL debug messages.")
-  if(${LIBOMPTARGET_NVPTX_DEBUG})
-set(CUDA_DEBUG -DOMPTARGET_NVPTX_DEBUG=-1 -g --ptxas-options=-v)
-  endif()
-
-  # NVPTX runtime library has to be statically linked. Dynamic linking is not
-  # yet supported by the CUDA toolchain on the device.
-  set(BUILD_SHARED_LIBS OFF)
-  set(CUDA_SEPARABLE_COMPILATION ON)
-  list(APPEND CUDA_N

[llvm-branch-commits] [openmp] 64e9e9a - [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX

2021-01-14 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-14T13:55:12-05:00
New Revision: 64e9e9aeee0155fc12d7d40d56e7611a63d8e47d

URL: 
https://github.com/llvm/llvm-project/commit/64e9e9aeee0155fc12d7d40d56e7611a63d8e47d
DIFF: 
https://github.com/llvm/llvm-project/commit/64e9e9aeee0155fc12d7d40d56e7611a63d8e47d.diff

LOG: [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX

The comment said that CUDA 9 header files use the `nv_weak` attribute, which
`clang` was not yet prepared to handle. That was three years ago and things
have changed. Based on my testing, removing the definition causes no problems
on my machine with CUDA 11.1 installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94700

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt 
b/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
index 200c6401d628..c8acf6a31966 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
+++ b/openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
@@ -89,13 +89,6 @@ if(LIBOMPTARGET_DEP_CUDA_FOUND)
   set(bc_flags ${bc_flags} -DOMPTARGET_NVPTX_DEBUG=0)
 endif()
 
-# CUDA 9 header files use the nv_weak attribute which clang is not yet 
prepared
-# to handle. Therefore, we use 'weak' instead. We are compiling only for 
the
-# device, so it should be equivalent.
-if(CUDA_VERSION_MAJOR GREATER 8)
-  set(bc_flags ${bc_flags} -Dnv_weak=weak)
-endif()
-
 # Create target to build all Bitcode libraries.
 add_custom_target(omptarget-nvptx-bc)
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 547b032 - [OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target

2021-01-14 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-14T19:16:11-05:00
New Revision: 547b032ccc8e1da5d1716afeb0afa8988e129fd0

URL: 
https://github.com/llvm/llvm-project/commit/547b032ccc8e1da5d1716afeb0afa8988e129fd0
DIFF: 
https://github.com/llvm/llvm-project/commit/547b032ccc8e1da5d1716afeb0afa8988e129fd0.diff

LOG: [OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target

`omptarget-nvptx` is still listed as a dependency of `check-libomptarget-nvptx`
although it has been removed by D94573.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94725

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/test/CMakeLists.txt

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/test/CMakeLists.txt 
b/openmp/libomptarget/deviceRTLs/nvptx/test/CMakeLists.txt
index 1eabeb25ff98..df6f665329ea 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/test/CMakeLists.txt
+++ b/openmp/libomptarget/deviceRTLs/nvptx/test/CMakeLists.txt
@@ -3,7 +3,7 @@ if(NOT OPENMP_TEST_COMPILER_ID STREQUAL "Clang")
   return()
 endif()
 
-set(deps omptarget-nvptx omptarget omp)
+set(deps omptarget omp)
 if(LIBOMPTARGET_NVPTX_ENABLE_BCLIB)
   set(deps ${deps} omptarget-nvptx-bc)
 endif()



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] ed939f8 - [OpenMP] Added the support for hidden helper task in RTL

2021-01-16 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-16T14:13:35-05:00
New Revision: ed939f853da1f2266f00ea087f778fda88848f73

URL: 
https://github.com/llvm/llvm-project/commit/ed939f853da1f2266f00ea087f778fda88848f73
DIFF: 
https://github.com/llvm/llvm-project/commit/ed939f853da1f2266f00ea087f778fda88848f73.diff

LOG: [OpenMP] Added the support for hidden helper task in RTL

The basic design is to create an outer-most parallel team. It is not a regular 
team because it is only created when the first hidden helper task is 
encountered, and it is only responsible for the execution of hidden helper 
tasks. We first use `pthread_create` to create a new thread; call it the 
initial, and also the main, thread of the hidden helper team. This initial 
thread then initializes a new root, just like what the RTL does during 
initialization. After that, it directly calls `__kmpc_fork_call`, as if the 
initial thread had encountered a parallel region. In the wrapped function for 
this team, the main thread (the initial thread we created via `pthread_create` 
on Linux) waits on a condition variable that can only be signaled when the RTL 
is being destroyed; the other worker threads simply do nothing. The reason the 
main thread needs to wait there is that, in the current implementation, once it 
finishes the wrapped function of this team it starts to free the team, which is 
not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and 
`LIBOMP_USE_HIDDEN_HELPER_TASK`, are also added to configure the number of 
hidden helper threads and to enable/disable the feature. By default, the number 
of hidden helper threads is 8.
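
For illustration only, a minimal sketch of the kind of program this feature
targets; the assumption that hidden helper threads execute deferred `target
nowait` tasks is mine, not a statement from this patch, and the runtime
settings shown are the environment variables described above (e.g.
`LIBOMP_USE_HIDDEN_HELPER_TASK=1 LIBOMP_NUM_HIDDEN_HELPER_THREADS=4`):

  #include <cstdio>

  int main() {
    int x = 0;

    // Deferred target task; under this design it would be executed by a
    // hidden helper thread rather than by one of the user's OpenMP threads.
    #pragma omp target map(tofrom: x) nowait depend(out: x)
    { x = 42; }

    // Regular host task that depends on the hidden helper task above.
    #pragma omp task depend(in: x) shared(x)
    { std::printf("x = %d\n", x); }

    #pragma omp taskwait
    return 0;
  }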

Here are some open issues to be discussed:
1. The main thread goes to sleep when the initialization is finished. As Andrey 
mentioned, we might need it to wake up from time to time to do some work. What 
kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609

Added: 
openmp/runtime/test/tasking/hidden_helper_task/common.h
openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
openmp/runtime/test/tasking/hidden_helper_task/taskgroup.cpp

Modified: 
openmp/runtime/src/kmp.h
openmp/runtime/src/kmp_global.cpp
openmp/runtime/src/kmp_runtime.cpp
openmp/runtime/src/kmp_settings.cpp
openmp/runtime/src/kmp_taskdeps.h
openmp/runtime/src/kmp_tasking.cpp
openmp/runtime/src/kmp_wait_release.h
openmp/runtime/src/z_Linux_util.cpp
openmp/runtime/test/worksharing/for/kmp_sch_simd_guided.c

Removed: 




diff  --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 983511042fa7..8a2d44d6bd48 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2334,7 +2334,8 @@ typedef struct kmp_tasking_flags { /* Total struct must 
be exactly 32 bits */
   unsigned priority_specified : 1; /* set if the compiler provides priority
   setting for the task */
   unsigned detachable : 1; /* 1 == can detach */
-  unsigned reserved : 9; /* reserved for compiler use */
+  unsigned hidden_helper : 1; /* 1 == hidden helper task */
+  unsigned reserved : 8; /* reserved for compiler use */
 
   /* Library flags */ /* Total library flags must be 16 bits */
   unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
@@ -2382,6 +2383,18 @@ struct kmp_taskdata { /* aligned during dynamic 
allocation   */
   kmp_depnode_t
   *td_depnode; // Pointer to graph node if this task has dependencies
   kmp_task_team_t *td_task_team;
+  // The parent task team. Usually we could access it via
+  // parent_task->td_task_team, but it is possible to be nullptr because of 
late
+  // initialization. Sometimes we must use it. Since the td_task_team of the
+  // encountering thread is never nullptr, we set it when this task is created.
+  kmp_task_team_t *td_parent_task_team;
+  // The global thread id of the encountering thread. We need it because when a
+  // regular task depends on a hidden helper task, and the hidden helper task
+  // is finished on a hidden helper thread, it will call __kmp_release_deps to
+  // release all dependences. If now the task is a regular task, we need to 
pass
+  // the encountering gtid such that the task will be picked up and executed by
+  // its encountering team instead of hidden helper team.
+  kmp_int32 encountering_gtid;
   size_t td_size_alloc; // Size of task structure, including shareds etc.
 #if defined(KMP_GOMP_COMPAT)
   // 4 or 8 byte integers for the loop bounds in GOMP_taskloop
@@ -2449,10 +2462,16 @@ typedef struct kmp_base_task_team {
   kmp_int32 tt_max_threads; // # entries allocated for threads_data array
   kmp_int32 tt_found_proxy_tasks; // found proxy tasks since last barrier
   kmp_int32 tt_untied_task_encountered;
+  // There is hidden helpe

[llvm-branch-commits] [openmp] 9bf843b - Revert "[OpenMP] Added the support for hidden helper task in RTL"

2021-01-18 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-18T06:57:52-05:00
New Revision: 9bf843bdc88f89193939445828105d97ac83f963

URL: 
https://github.com/llvm/llvm-project/commit/9bf843bdc88f89193939445828105d97ac83f963
DIFF: 
https://github.com/llvm/llvm-project/commit/9bf843bdc88f89193939445828105d97ac83f963.diff

LOG: Revert "[OpenMP] Added the support for hidden helper task in RTL"

This reverts commit ed939f853da1f2266f00ea087f778fda88848f73.

Added: 


Modified: 
openmp/runtime/src/kmp.h
openmp/runtime/src/kmp_global.cpp
openmp/runtime/src/kmp_runtime.cpp
openmp/runtime/src/kmp_settings.cpp
openmp/runtime/src/kmp_taskdeps.h
openmp/runtime/src/kmp_tasking.cpp
openmp/runtime/src/kmp_wait_release.h
openmp/runtime/src/z_Linux_util.cpp
openmp/runtime/test/worksharing/for/kmp_sch_simd_guided.c

Removed: 
openmp/runtime/test/tasking/hidden_helper_task/common.h
openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
openmp/runtime/test/tasking/hidden_helper_task/taskgroup.cpp



diff  --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 8a2d44d6bd48..983511042fa7 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2334,8 +2334,7 @@ typedef struct kmp_tasking_flags { /* Total struct must 
be exactly 32 bits */
   unsigned priority_specified : 1; /* set if the compiler provides priority
   setting for the task */
   unsigned detachable : 1; /* 1 == can detach */
-  unsigned hidden_helper : 1; /* 1 == hidden helper task */
-  unsigned reserved : 8; /* reserved for compiler use */
+  unsigned reserved : 9; /* reserved for compiler use */
 
   /* Library flags */ /* Total library flags must be 16 bits */
   unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
@@ -2383,18 +2382,6 @@ struct kmp_taskdata { /* aligned during dynamic 
allocation   */
   kmp_depnode_t
   *td_depnode; // Pointer to graph node if this task has dependencies
   kmp_task_team_t *td_task_team;
-  // The parent task team. Usually we could access it via
-  // parent_task->td_task_team, but it is possible to be nullptr because of 
late
-  // initialization. Sometimes we must use it. Since the td_task_team of the
-  // encountering thread is never nullptr, we set it when this task is created.
-  kmp_task_team_t *td_parent_task_team;
-  // The global thread id of the encountering thread. We need it because when a
-  // regular task depends on a hidden helper task, and the hidden helper task
-  // is finished on a hidden helper thread, it will call __kmp_release_deps to
-  // release all dependences. If now the task is a regular task, we need to 
pass
-  // the encountering gtid such that the task will be picked up and executed by
-  // its encountering team instead of hidden helper team.
-  kmp_int32 encountering_gtid;
   size_t td_size_alloc; // Size of task structure, including shareds etc.
 #if defined(KMP_GOMP_COMPAT)
   // 4 or 8 byte integers for the loop bounds in GOMP_taskloop
@@ -2462,16 +2449,10 @@ typedef struct kmp_base_task_team {
   kmp_int32 tt_max_threads; // # entries allocated for threads_data array
   kmp_int32 tt_found_proxy_tasks; // found proxy tasks since last barrier
   kmp_int32 tt_untied_task_encountered;
-  // There is hidden helper thread encountered in this task team so that we 
must
-  // wait when waiting on task team
-  kmp_int32 tt_hidden_helper_task_encountered;
 
   KMP_ALIGN_CACHE
   std::atomic tt_unfinished_threads; /* #threads still active */
 
-  KMP_ALIGN_CACHE
-  std::atomic tt_unfinished_hidden_helper_tasks;
-
   KMP_ALIGN_CACHE
   volatile kmp_uint32
   tt_active; /* is the team still actively executing tasks */
@@ -2936,7 +2917,6 @@ extern volatile int __kmp_init_parallel;
 extern volatile int __kmp_init_monitor;
 #endif
 extern volatile int __kmp_init_user_locks;
-extern volatile int __kmp_init_hidden_helper_threads;
 extern int __kmp_init_counter;
 extern int __kmp_root_counter;
 extern int __kmp_version;
@@ -4005,45 +3985,6 @@ static inline void __kmp_resume_if_hard_paused() {
 
 extern void __kmp_omp_display_env(int verbose);
 
-// 1: it is initializing hidden helper team
-extern volatile int __kmp_init_hidden_helper;
-// 1: the hidden helper team is done
-extern volatile int __kmp_hidden_helper_team_done;
-// 1: enable hidden helper task
-extern kmp_int32 __kmp_enable_hidden_helper;
-// Main thread of hidden helper team
-extern kmp_info_t *__kmp_hidden_helper_main_thread;
-// Descriptors for the hidden helper threads
-extern kmp_info_t **__kmp_hidden_helper_threads;
-// Number of hidden helper threads
-extern kmp_int32 __kmp_hidden_helper_threads_num;
-// Number of hidden helper tasks that have not been executed yet
-extern std::atomic __kmp_unexecuted_hidden_helper_tasks;
-
-extern void __kmp_hidden_helper_initialize();
-exter

[llvm-branch-commits] [clang] 82e537a - [Clang][OpenMP] Fixed an issue that clang crashed when compiling OpenMP program in device only mode without host IR

2021-01-19 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-19T14:18:42-05:00
New Revision: 82e537a9d28a2c18bd1637e2eac0e0af658ed829

URL: 
https://github.com/llvm/llvm-project/commit/82e537a9d28a2c18bd1637e2eac0e0af658ed829
DIFF: 
https://github.com/llvm/llvm-project/commit/82e537a9d28a2c18bd1637e2eac0e0af658ed829.diff

LOG: [Clang][OpenMP] Fixed an issue that clang crashed when compiling OpenMP 
program in device only mode without host IR

D94745 rewrites the `deviceRTLs` using OpenMP and compiles them by invoking
the device compilation directly. `clang` crashes because the entry in
`OffloadEntriesDeviceGlobalVar` is uninitialized. The current design assumes
that device compilation can only be invoked after host compilation, with the
host IR available, so that `clang` can initialize
`OffloadEntriesDeviceGlobalVar` from the host IR. This prevents us from
invoking device compilation directly, especially when we only have code wrapped
in `declare target`, which is all device code. The same issue also exists for
`OffloadEntriesInfoManager`.

In this patch, we simply initialize an entry if it is not in the maps. It is
not clear whether we need an option to tell the device compiler that it is
invoked standalone.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94871

Added: 
clang/test/OpenMP/declare_target_device_only_compilation.cpp

Modified: 
clang/lib/CodeGen/CGOpenMPRuntime.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a3b24039365b..17fa56fb06c8 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2941,16 +2941,12 @@ void CGOpenMPRuntime::OffloadEntriesInfoManagerTy::
   // If we are emitting code for a target, the entry is already initialized,
   // only has to be registered.
   if (CGM.getLangOpts().OpenMPIsDevice) {
-if (!hasTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum)) {
-  unsigned DiagID = CGM.getDiags().getCustomDiagID(
-  DiagnosticsEngine::Error,
-  "Unable to find target region on line '%0' in the device code.");
-  CGM.getDiags().Report(DiagID) << LineNum;
-  return;
-}
+// This could happen if the device compilation is invoked standalone.
+if (!hasTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum))
+  initializeTargetRegionEntryInfo(DeviceID, FileID, ParentName, LineNum,
+  OffloadingEntriesNum);
 auto &Entry =
 OffloadEntriesTargetRegion[DeviceID][FileID][ParentName][LineNum];
-assert(Entry.isValid() && "Entry not initialized!");
 Entry.setAddress(Addr);
 Entry.setID(ID);
 Entry.setFlags(Flags);
@@ -3017,9 +3013,10 @@ void CGOpenMPRuntime::OffloadEntriesInfoManagerTy::
  OMPTargetGlobalVarEntryKind Flags,
  llvm::GlobalValue::LinkageTypes Linkage) {
   if (CGM.getLangOpts().OpenMPIsDevice) {
+// This could happen if the device compilation is invoked standalone.
+if (!hasDeviceGlobalVarEntryInfo(VarName))
+  initializeDeviceGlobalVarEntryInfo(VarName, Flags, OffloadingEntriesNum);
 auto &Entry = OffloadEntriesDeviceGlobalVar[VarName];
-assert(Entry.isValid() && Entry.getFlags() == Flags &&
-   "Entry not initialized!");
 assert((!Entry.getAddress() || Entry.getAddress() == Addr) &&
"Resetting with the new address.");
 if (Entry.getAddress() && hasDeviceGlobalVarEntryInfo(VarName)) {

diff  --git a/clang/test/OpenMP/declare_target_device_only_compilation.cpp 
b/clang/test/OpenMP/declare_target_device_only_compilation.cpp
new file mode 100644
index ..280959540306
--- /dev/null
+++ b/clang/test/OpenMP/declare_target_device_only_compilation.cpp
@@ -0,0 +1,15 @@
+//==///
+// RUN: %clang -S -target powerpc64le-ibm-linux-gnu -fopenmp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang -S -target i386-pc-linux-gnu -fopenmp 
-fopenmp-targets=i386-pc-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang -S -target x86_64-unknown-linux-gnu -fopenmp 
-fopenmp-targets=x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// expected-no-diagnostics
+
+#pragma omp declare target
+#pragma omp begin declare variant match(device={kind(nohost)})
+int G1;
+#pragma omp end declare variant
+#pragma omp end declare target
+
+// CHECK: @[[G:.+]] = hidden {{.*}}global i32 0, align 4
+// CHECK: !omp_offload.info = !{!0}
+// CHECK: !0 = !{i32 1, !"[[G]]", i32 0, i32 0}



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] fd70f70 - [OpenMP][NVPTX] Replaced CUDA builtin vars with LLVM intrinsics

2021-01-20 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-20T12:02:06-05:00
New Revision: fd70f70d1e02752f411fcf923fddda31cce376ae

URL: 
https://github.com/llvm/llvm-project/commit/fd70f70d1e02752f411fcf923fddda31cce376ae
DIFF: 
https://github.com/llvm/llvm-project/commit/fd70f70d1e02752f411fcf923fddda31cce376ae.diff

LOG: [OpenMP][NVPTX] Replaced CUDA builtin vars with LLVM intrinsics

Replaced CUDA builtin variables with LLVM intrinsics so that we don't need the
CUDA header definitions of those builtin variables.
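
As a hedged illustration of the substitution (a sketch based on the diff below,
with `DEVICE` being the deviceRTL's device-function macro):

  // Before: needs the CUDA headers that define the threadIdx builtin variable.
  //   DEVICE int GetThreadIdInBlock() { return threadIdx.x; }
  // After: Clang builtin that reads the tid.x PTX special register directly,
  // so no CUDA header definitions are required.
  DEVICE int GetThreadIdInBlock() { return __nvvm_read_ptx_sreg_tid_x(); }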

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95013

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu 
b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
index 8052b92a7dee..b5ef549ece57 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
+++ b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
@@ -115,10 +115,12 @@ DEVICE void __kmpc_impl_threadfence_block() { 
__threadfence_block(); }
 DEVICE void __kmpc_impl_threadfence_system() { __threadfence_system(); }
 
 // Calls to the NVPTX layer (assuming 1D layout)
-DEVICE int GetThreadIdInBlock() { return threadIdx.x; }
-DEVICE int GetBlockIdInKernel() { return blockIdx.x; }
-DEVICE int GetNumberOfBlocksInKernel() { return gridDim.x; }
-DEVICE int GetNumberOfThreadsInBlock() { return blockDim.x; }
+DEVICE int GetThreadIdInBlock() { return __nvvm_read_ptx_sreg_tid_x(); }
+DEVICE int GetBlockIdInKernel() { return __nvvm_read_ptx_sreg_ctaid_x(); }
+DEVICE int GetNumberOfBlocksInKernel() {
+  return __nvvm_read_ptx_sreg_nctaid_x();
+}
+DEVICE int GetNumberOfThreadsInBlock() { return __nvvm_read_ptx_sreg_ntid_x(); 
}
 DEVICE unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
 DEVICE unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1); }
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] 33a5d21 - [OpenMP][NVPTX] Added forward declaration to pave the way for building deviceRTLs with OpenMP

2021-01-20 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-20T15:56:02-05:00
New Revision: 33a5d212c6198af2bd902bb8e4cfd0f0bec0114f

URL: 
https://github.com/llvm/llvm-project/commit/33a5d212c6198af2bd902bb8e4cfd0f0bec0114f
DIFF: 
https://github.com/llvm/llvm-project/commit/33a5d212c6198af2bd902bb8e4cfd0f0bec0114f.diff

LOG: [OpenMP][NVPTX] Added forward declaration to pave the way for building 
deviceRTLs with OpenMP

Once we switch to building the deviceRTLs with OpenMP, CUDA primitives and
intrinsics can no longer be used directly because `__device__` is not
recognized by the OpenMP compiler. To avoid pulling in all the CUDA internal
headers we had in `clang`, we forward declared these functions. Eventually they
will be transformed into the right LLVM intrinsics.
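
A minimal sketch of the idea, assuming OpenMP device compilation of the
deviceRTL sources (the helper name and the surrounding pragmas are
illustrative; the declaration itself matches the diff below):

  // Under OpenMP device compilation, CUDA's __device__-annotated header
  // declarations are unavailable, so a plain extern "C" declaration lets the
  // call type-check; codegen can later map it onto an LLVM intrinsic.
  extern "C" unsigned int __activemask();

  #pragma omp declare target
  static inline unsigned int ActiveLaneMask() { return __activemask(); }
  #pragma omp end declare target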

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95058

Added: 


Modified: 
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu

Removed: 




diff  --git a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu 
b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
index ffc7498e662e..75945e3cd8c4 100644
--- a/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
+++ b/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
@@ -16,6 +16,23 @@
 
 #include 
 
+// Forward declaration of CUDA primitives which will be evetually transformed
+// into LLVM intrinsics.
+extern "C" {
+unsigned int __activemask();
+unsigned int __ballot(unsigned);
+// The default argument here is based on NVIDIA's website
+// https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
+int __shfl_sync(unsigned mask, int val, int src_line, int width = WARPSIZE);
+int __shfl(int val, int src_line, int width = WARPSIZE);
+int __shfl_down(int var, unsigned detla, int width);
+int __shfl_down_sync(unsigned mask, int var, unsigned detla, int width);
+void __syncwarp(int mask);
+void __threadfence();
+void __threadfence_block();
+void __threadfence_system();
+}
+
 DEVICE void __kmpc_impl_unpack(uint64_t val, uint32_t &lo, uint32_t &hi) {
   asm volatile("mov.b64 {%0,%1}, %2;" : "=r"(lo), "=r"(hi) : "l"(val));
 }



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 3809e5d - [Clang][OpenMP] Use `clang_cc1` test for `declare_target_device_only_compilation.cpp`

2021-01-20 Thread Shilei Tian via llvm-branch-commits

Author: Shilei Tian
Date: 2021-01-20T20:34:10-05:00
New Revision: 3809e5dac965e7c25f3c286884a7af6e48946865

URL: 
https://github.com/llvm/llvm-project/commit/3809e5dac965e7c25f3c286884a7af6e48946865
DIFF: 
https://github.com/llvm/llvm-project/commit/3809e5dac965e7c25f3c286884a7af6e48946865.diff

LOG: [Clang][OpenMP] Use `clang_cc1` test for 
`declare_target_device_only_compilation.cpp`

Use `clang_cc1` test for `declare_target_device_only_compilation.cpp`

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D95089

Added: 


Modified: 
clang/test/OpenMP/declare_target_device_only_compilation.cpp

Removed: 




diff  --git a/clang/test/OpenMP/declare_target_device_only_compilation.cpp 
b/clang/test/OpenMP/declare_target_device_only_compilation.cpp
index 280959540306..7be635d454e1 100644
--- a/clang/test/OpenMP/declare_target_device_only_compilation.cpp
+++ b/clang/test/OpenMP/declare_target_device_only_compilation.cpp
@@ -1,7 +1,12 @@
-//==///
-// RUN: %clang -S -target powerpc64le-ibm-linux-gnu -fopenmp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang -S -target i386-pc-linux-gnu -fopenmp 
-fopenmp-targets=i386-pc-linux-gnu -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang -S -target x86_64-unknown-linux-gnu -fopenmp 
-fopenmp-targets=x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - 
| FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-pc-linux-gnu 
-fopenmp-targets=i386-pc-linux-gnu -emit-llvm-bc %s -o %t-i386-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-pc-linux-gnu 
-emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-i386-host.bc -o 
- | FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu 
-fopenmp-targets=x86_64-unknown-linux-gnu -emit-llvm-bc %s -o %t-x86_64-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu 
-emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86_64-host.bc 
-o - | FileCheck %s
+
 // expected-no-diagnostics
 
 #pragma omp declare target



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

