[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-15 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 01/12] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. The argument corresponds to the sum of the matched
+  execution counts of a function's blocks divided by the sum of the execution
+  counts of the function's blocks. E.g., if the sum of a function's block
+  execution counts is 100, the sum of its matched block execution counts is 10,
+  and the argument is 15 (15%), profile inference will not be applied to that
+  function. A higher threshold correlates with fewer functions to process in
+  cases of stale profile. Defaults to 5%.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. A
   higher threshold means fewer functions to process. E.g., a threshold of 90
   means only the top 10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 02/12] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector<FlowJump> Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 03/12] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 04/12] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt<unsigned> MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt<unsigned> StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of b

[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-15 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Jun 15, 3:51 AM EDT**: @arsenm started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/95394).


https://github.com/llvm/llvm-project/pull/95394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95379

>From c895288fc5ba347b5be14dae8802073f6037e59b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 12 Jun 2024 10:10:20 +0200
Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers

Make sure we test all the address spaces since this support isn't
free in gisel.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  31 +-
 .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++
 .../llvm.amdgcn.raw.ptr.buffer.store.ll   | 456 ++
 3 files changed, 1071 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index ba9259541310a..bcc6122c84beb 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv(
 Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);
 }
 
-static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrData(const SITargetLowering &TLI,
+ const DataLayout &DL, Type *Ty,
+ unsigned MaxNumLanes) {
   assert(MaxNumLanes != 0);
 
+  LLVMContext &Ctx = Ty->getContext();
   if (auto *VT = dyn_cast<FixedVectorType>(Ty)) {
 unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements());
-return EVT::getVectorVT(Ty->getContext(),
-EVT::getEVT(VT->getElementType()),
+return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()),
 NumElts);
   }
 
-  return EVT::getEVT(Ty);
+  return TLI.getValueType(DL, Ty);
 }
 
 // Peek through TFE struct returns to only use the data size.
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrReturn(const SITargetLowering &TLI,
+   const DataLayout &DL, Type *Ty,
+   unsigned MaxNumLanes) {
   auto *ST = dyn_cast<StructType>(Ty);
   if (!ST)
-return memVTFromLoadIntrData(Ty, MaxNumLanes);
+return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes);
 
   // TFE intrinsics return an aggregate type.
   assert(ST->getNumContainedTypes() == 2 &&
  ST->getContainedType(1)->isIntegerTy(32));
-  return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes);
+  return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes);
 }
 
 /// Map address space 7 to MVT::v5i32 because that's its in-memory
@@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
 }
 
-Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes);
+Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(),
+ CI.getType(), MaxNumLanes);
   } else {
-Info.memVT = memVTFromLoadIntrReturn(
-CI.getType(), std::numeric_limits<unsigned>::max());
+Info.memVT =
+memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(),
+std::numeric_limits<unsigned>::max());
   }
 
   // FIXME: What does alignment mean for an image?
@@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   if (RsrcIntr->IsImage) {
     unsigned DMask = cast<ConstantInt>(CI.getArgOperand(1))->getZExtValue();
 unsigned DMaskLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
-Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes);
+Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy,
+   DMaskLanes);
   } else
-Info.memVT = EVT::getEVT(DataTy);
+Info.memVT = getValueType(MF.getDataLayout(), DataTy);
 
   Info.flags |= MachineMemOperand::MOStore;
 } else {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
index 3e3371091ef72..4d557c76dc4d0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
@@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i
   ret <2 x i64> %data
 }
 
+define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) {
+; PREGFX10-LABEL: buffer_load_p0__voffset_add:
+; PREGFX10:   ; %bb.0:
+; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; PREGFX10-NEXT:s_waitcnt vmcnt(0)
+; PREGFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: buffer_load_p0__voffset_add:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7

[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-15 Thread Matt Arsenault via llvm-branch-commits


@@ -1582,33 +1603,33 @@ let OtherPredicates = [isGFX12Plus] in {
   }
 }
 
-let OtherPredicates = [isGFX10Plus] in {
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts, OtherPredicates = [HasFlatGlobalInsts] in {
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMIN", "atomic_load_fmin_global", f32>;
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMAX", "atomic_load_fmax_global", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
-}
-
-let OtherPredicates = [isGFX10GFX11] in {
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN", "int_amdgcn_global_atomic_fmin", f32>;
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX", "int_amdgcn_global_atomic_fmax", f32>;
+}

+let SubtargetPredicate = HasAtomicFMinFMaxF32FlatInsts in {
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN", "int_amdgcn_flat_atomic_fmin", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX", "int_amdgcn_flat_atomic_fmax", f32>;
 }

-let OtherPredicates = [isGFX10Only] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MAX_F64", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MIN_F64", "int_amdgcn_global_atomic_fmin", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MAX_F64", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MIN_F64", "int_amdgcn_flat_atomic_fmin", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MAX_F64", "int_amdgcn_flat_atomic_fmax", f64>;
-}
+// let OtherPredicates = [isGFX10Only] in { // fixme

arsenm wrote:

We had duplicated pseudos and handling on different subtargets; I unified
these in #95591, so this should be removable.

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [libcxx] 72c9425 - [libc++][NFC] Rewrite function call on two lines for clarity (#79141)

2024-06-15 Thread via llvm-branch-commits

Author: Louis Dionne
Date: 2024-06-15T10:21:32-07:00
New Revision: 72c9425a79fda8e9001fcde091e8703f9fb2a43a

URL: 
https://github.com/llvm/llvm-project/commit/72c9425a79fda8e9001fcde091e8703f9fb2a43a
DIFF: 
https://github.com/llvm/llvm-project/commit/72c9425a79fda8e9001fcde091e8703f9fb2a43a.diff

LOG: [libc++][NFC] Rewrite function call on two lines for clarity (#79141)

Previously, there was a ternary conditional with a less-than comparison
appearing inside a template argument, which was really confusing because
of the <...> of the function template. This patch rewrites the same
statement on two lines for clarity.

(cherry picked from commit 382f70a877f00ab71f3cb5ba461b52e1b59cd292)

Added: 


Modified: 
libcxx/include/string

Removed: 




diff --git a/libcxx/include/string b/libcxx/include/string
index ba169c3dbfc9e..618ceb71b26b4 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1943,8 +1943,8 @@ private:
 if (__s < __min_cap) {
   return static_cast<size_type>(__min_cap) - 1;
 }
-size_type __guess =
-__align_it < sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : 1 > (__s + 1) - 1;
+const size_type __boundary = sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : 1;
+size_type __guess  = __align_it<__boundary>(__s + 1) - 1;
 if (__guess == __min_cap)
   ++__guess;
 return __guess;





[llvm-branch-commits] [libcxx] 3b5b5c1 - [libcxx] Align `__recommend() + 1` by __endian_factor (#90292)

2024-06-15 Thread via llvm-branch-commits

Author: Vitaly Buka
Date: 2024-06-15T10:21:32-07:00
New Revision: 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff

URL: 
https://github.com/llvm/llvm-project/commit/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff
DIFF: 
https://github.com/llvm/llvm-project/commit/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff.diff

LOG: [libcxx] Align `__recommend() + 1`  by __endian_factor (#90292)

This is detected by asan after #83774

The allocation size will be divided by `__endian_factor` before storing. If
it's not aligned, we will not be able to recover the allocation size to pass
into `__alloc_traits::deallocate`.

We have code like this:
```
 auto __allocation = std::__allocate_at_least(__alloc(), __recommend(__sz) + 1);
__p   = __allocation.ptr;
__set_long_cap(__allocation.count);

void __set_long_cap(size_type __s) _NOEXCEPT {
__r_.first().__l.__cap_ = __s / __endian_factor;
__r_.first().__l.__is_long_ = true;
  }

size_type __get_long_cap() const _NOEXCEPT {
return __r_.first().__l.__cap_ * __endian_factor;
  }

inline ~basic_string() {
__annotate_delete();
if (__is_long())
  __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
  }
```
1. `__recommend()` -> even size
2. `std::__allocate_at_least(__alloc(), __recommend(__sz) + 1)` -> not even size
3. `__set_long_cap()` -> loses one bit of size for `__endian_factor == 2` (see `/ __endian_factor`)
4. `__alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap())` -> uses even size (see `__get_long_cap`)

(cherry picked from commit d129ea8d2fa347e63deec0791faf389b84f20ce1)

Added: 


Modified: 
libcxx/include/string

Removed: 




diff --git a/libcxx/include/string b/libcxx/include/string
index 618ceb71b26b4..56e2ef09947f4 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1943,10 +1943,10 @@ private:
 if (__s < __min_cap) {
   return static_cast<size_type>(__min_cap) - 1;
 }
-const size_type __boundary = sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : 1;
+const size_type __boundary = sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : __endian_factor;
 size_type __guess  = __align_it<__boundary>(__s + 1) - 1;
 if (__guess == __min_cap)
-  ++__guess;
+  __guess += __endian_factor;
 return __guess;
   }
 





[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-15 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

> If the system fails to find HSA, then it will use the dynamic version.

Why is it expected for LLVM to build against the system HSA first, and to rely
on LLVM's own HSA only if the system one is missing?

I'm building LLVM, not something else, so I expect LLVM to build against LLVM 
files.

> unfortunately the HSA headers really don't give you much versioning to work 
> with, so we can't do `ifdef` on this stuff.

If LLVM were attempting to use its own HSA first, it would be possible to do
`#if __has_include("dynamic_hsa/hsa.h")`, then `#elif __has_include("hsa/hsa.h")`
for the system one. But since LLVM provides its own HSA, it would always use
its own one.
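
The fallback being suggested would look roughly like this. This is a hypothetical preprocessor arrangement sketching the idea, not what the runtime currently does:

```cpp
// Hypothetical include-order fallback: prefer the in-tree dynamic HSA
// headers, and fall back to a system installation only when they are absent.
#if __has_include("dynamic_hsa/hsa.h")
#include "dynamic_hsa/hsa.h"
#include "dynamic_hsa/hsa_ext_amd.h"
#elif __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#include "hsa/hsa_ext_amd.h"
#else
#error "No HSA headers found"
#endif
```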

If a two-week-old ROCm 6.1.2 is already considered too old by both LLVM 17 and
LLVM 18, it is safe to assume that ROCm 6.1.2 will never be recent enough for
them… Then I don't see why both LLVM 17 and LLVM 18 would try to use something
that will never be recent enough…

https://github.com/llvm/llvm-project/pull/95484


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-15 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

Setting `-DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON`, which should always add
`dynamic_hsa` to the include dirs, doesn't fix the build error.

https://github.com/llvm/llvm-project/pull/95484


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-15 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

And the ROCm HSA also provides
`hsa_amd_agent_info_s::HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY`, so if the
build were using that system HSA, it should not error out.

https://github.com/llvm/llvm-project/pull/95484


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-15 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

I discovered something wrong:

```
ii  libhsa-runtime-dev  5.2.3-5       amd64  HSA Runtime API and runtime for ROCm - development files
ii  libhsa-runtime64-1  5.2.3-5       amd64  HSA Runtime API and runtime for ROCm
ii  libhsakmt1:amd64    5.2.3+dfsg-1  amd64  Thunk library for AMD KFD (shlib)
```

It looks like I have some HSA 5.2.3 files… The `libhsa-runtime-dev` package
provides `/usr/include/hsa/hsa.h`.

This `libhsa-runtime-dev` package looks to be provided by the system (Ubuntu),
but if I attempt to uninstall it, it attempts to also uninstall
`rocm-hip-runtime=6.1.2.60102-119~22.04` and `libdrm-amdgpu-dev`…

So ROCm 6.1.2 actually installs both `/opt/rocm-6.1.2/include/hsa/hsa.h` and 
5.2.3 `/usr/include/hsa/hsa.h`, and of course only the first one provides 
`HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY`…

This doesn't make sense, the ROCm packages may be just badly made by AMD.

So if I temporarily remove `/usr/include/hsa` without uninstalling
`libhsa-runtime-dev` **AND** use `-DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON`, I
can build LLVM 18 without the patch.

With that option set to `-DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=OFF`, CMake
complains that `/usr/include/hsa` is missing.

So it looks like I'm tracking down a ROCm packaging bug on AMD's side. I still
don't get why LLVM doesn't use its own HSA by default.

https://github.com/llvm/llvm-project/pull/95484