https://github.com/Artem-B updated
https://github.com/llvm/llvm-project/pull/74895
>From eace5f13ee62c770a84cdaae441d4c1c6eeb07c2 Mon Sep 17 00:00:00 2001
From: Artem Belevich
Date: Wed, 6 Dec 2023 12:11:38 -0800
Subject: [PATCH 1/3] [CUDA] Add support for CUDA-12.3 and sm_90a
---
clang/docs/
Artem-B wrote:
Tested the changes with the CUDA test-suite, with CUDA 12.1 and 12.3, targeting
`sm_{60,70,80,90,90a}`.
https://github.com/llvm/llvm-project/pull/74895
https://github.com/Artem-B closed
https://github.com/llvm/llvm-project/pull/74895
Artem-B wrote:
Just an FYI: recent NVIDIA GPUs have introduced the concept of a [thread block
cluster](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters).
We may need another level of granularity between the block and the device.
https://github.com/llvm/llvm-
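For context, a minimal sketch of how clusters surface in user code, assuming CUDA 12's
cooperative-groups API (the kernel and the cluster dimensions are illustrative, not from
this thread); the cluster acts as a sync/visibility scope between the block and the grid.
```
// Illustrative only, assuming the CUDA 12 cooperative-groups cluster API.
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(2, 1, 1) cluster_kernel(int *data) {
  cg::cluster_group cluster = cg::this_cluster();
  // ... per-block work on data ...
  cluster.sync(); // waits for all blocks in the cluster, a scope narrower than the grid
}
```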
Artem-B wrote:
> Nvidia backend doesn't handle scoped atomics at all yet
Yeah, it's on my ever-growing to-do list. :-(
https://github.com/llvm/llvm-project/pull/72280
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/72394
https://github.com/Artem-B approved this pull request.
LGTM with a couple of nits.
https://github.com/llvm/llvm-project/pull/72394
@@ -12,7 +12,7 @@ extern "C" void host_fn() {}
struct Dummy {};
struct S {
- S() {}
+ S() { x = 1; }
Artem-B wrote:
Can we make the purpose of the non-trivial constructor more descriptive, here and
in other places?
E.g. `S() { static int nontrivial_ctor = 1; }`
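A sketch of how the test's struct could look with that suggestion applied (assuming
nothing else in the test needs the `x` member):
```
struct S {
  // The local static exists only to give the constructor a non-empty body;
  // the name documents why it is there.
  S() { static int nontrivial_ctor = 1; }
};
```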
@@ -772,6 +772,26 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD,
NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
}
+// If a trivial ctor/dtor has no host/device
+// attributes, make it implicitly host device function.
+void Sema::maybeAddCUDAHostDevice
Artem-B wrote:
We've found a problem with the patch. https://godbolt.org/z/jcKo34vzG
```
template <typename T>
class C {
  explicit C() {};
};
template <> C<int>::C() {};
```
:6:21: error: __host__ function 'C' cannot overload __host__ __device__
function 'C'
    6 | template <> C<int>::C() {};
      |
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/72815
@@ -1000,13 +1000,9 @@ void Sema::checkCUDATargetOverload(FunctionDecl *NewFD,
// should have the same implementation on both sides.
if (NewTarget != OldTarget &&
((NewTarget == CFT_HostDevice &&
- !(LangOpts.OffloadImplicitHostDeviceTemplates &&
-
https://github.com/Artem-B approved this pull request.
LGTM, with one question.
https://github.com/llvm/llvm-project/pull/72815
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/72815
Artem-B wrote:
@ldionne - Can you take a look at whether this would have unintended consequences for
libc++?
https://github.com/llvm/llvm-project/pull/69366
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/72782
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/73140
https://github.com/Artem-B created
https://github.com/llvm/llvm-project/pull/74123
https://github.com/llvm/llvm-project/pull/73838
>From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001
From: Artem Belevich
Date: Fri, 1 Dec 2023 10:37:08 -0800
Subject: [PATCH] [CUDA] work around more __noinline__ conflicts with
libc++
Artem-B wrote:
Yes, I've mentioned that in https://github.com/llvm/llvm-project/pull/73838.
However, we need something to fix the issue right now while we're figuring out
a better solution.
In any case `__noinline__` is unlikely to be widely used, so the wrappers may
be manageable, at least
https://github.com/Artem-B updated
https://github.com/llvm/llvm-project/pull/74123
>From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001
From: Artem Belevich
Date: Fri, 1 Dec 2023 10:37:08 -0800
Subject: [PATCH 1/2] [CUDA] work around more __noinline__ conflicts with
libc++
Artem-B wrote:
> I think we can find a solution to work around this in libc++ within a
> reasonable timeframe
OK. I'll hold off on landing the patch. I believe we're not blocked on it at
the moment.
https://github.com/llvm/llvm-project/pull/74123
Artem-B wrote:
> FWIW I am not thrilled about using `__config` here. That header is an
> implementation detail of libc++ and defining it and relying on it is somewhat
> brittle.
I'm all for having it fixed in libc++ or in the CUDA SDK. Barring that, working
around the specific implementation deta
https://github.com/Artem-B updated
https://github.com/llvm/llvm-project/pull/74123
>From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001
From: Artem Belevich
Date: Fri, 1 Dec 2023 10:37:08 -0800
Subject: [PATCH 1/3] [CUDA] work around more __noinline__ conflicts with
libc++
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
__DEVICE__ float floor(float);
__DEVICE__ double fma(double, double, double);
__DEVICE__ float fma(float, float, float);
+#ifdef _MSC_VER
+__DEVICE__ long double fma(long double, long double, long double);
Arte
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/73756
Artem-B wrote:
This sounds like it may be useful outside of the AMDGPU back-end.
@jhuber6 this is something that may come in handy for implementing general library
functions.
https://github.com/llvm/llvm-project/pull/74737
Artem-B wrote:
I was thinking of implementing libm/libc for nvptx, which would produce an IR
library. We'll still need to keep the functions around even if they are not used
explicitly, because we may need them to fulfill libcalls later in the
compilation pipeline. Sort of a libdevice replacement
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/74737
Author: Artem Belevich
Date: 2022-03-31T13:49:12-07:00
New Revision: fe528e72163371e10242f4748dab687eef30a1f9
URL:
https://github.com/llvm/llvm-project/commit/fe528e72163371e10242f4748dab687eef30a1f9
DIFF:
https://github.com/llvm/llvm-project/commit/fe528e72163371e10242f4748dab687eef30a1f9.diff
Author: Jack Kirk
Date: 2022-08-05T12:14:06-07:00
New Revision: 3e0e5568a6a8c744d26f79a1e55360fe2655867c
URL:
https://github.com/llvm/llvm-project/commit/3e0e5568a6a8c744d26f79a1e55360fe2655867c
DIFF:
https://github.com/llvm/llvm-project/commit/3e0e5568a6a8c744d26f79a1e55360fe2655867c.diff
LOG
Author: Artem Belevich
Date: 2021-07-15T12:02:09-07:00
New Revision: d774b4aa5eac785ffe40009091667521e183df40
URL:
https://github.com/llvm/llvm-project/commit/d774b4aa5eac785ffe40009091667521e183df40
DIFF:
https://github.com/llvm/llvm-project/commit/d774b4aa5eac785ffe40009091667521e183df40.diff
Artem-B wrote:
We happen to have a back-end with no conversion instructions between
unsigned int and FP, so this patch complicates things. Would it make sense to
enable this canonicalization only if the target wants it?
https://github.com/llvm/llvm-project/pull/82404
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/85976
@@ -1160,9 +1152,8 @@ void CGNVCUDARuntime::createOffloadingEntries() {
// Returns module constructor to be added.
llvm::Function *CGNVCUDARuntime::finalizeModule() {
+ transformManagedVars();
Artem-B wrote:
This does not look like "NFC" as we now perform th
https://github.com/Artem-B approved this pull request.
LGTM, sans the "NFC" part in the description.
https://github.com/llvm/llvm-project/pull/85976
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/86830
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M,
ArrayRef> Bufs,
".omp_offloading.descriptor" + Suffix);
}
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
-StringRef Suffix) {
+Function *cr
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/86830
Artem-B wrote:
> This patch, which simply makes it legal on all architectures but does nothing
> if it's older than sm_70.
I do not think this is the right thing to do. "Do nothing" is not what one
would expect from a `nanosleep`.
Let's unpack your problem a bit.
__nvvm_reflect() is probably c
Artem-B wrote:
> Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep`
> builtin I added requires `sm_70` at the clang level. Either means I'd need to
> go back to inline assembly or remove that requirement at least from clang so
> it's a backend failure.
The question is -- w
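One way to spell the `__nvvm_reflect` + inline-assembly option mentioned above, as a
hedged sketch (the helper name is made up, and the `__nvvm_reflect` declaration is the
conventional pre-builtin form):
```
// Sketch only: NVVMReflect folds __nvvm_reflect("__CUDA_ARCH") to a constant
// (e.g. 700 for sm_70), so the dead branch disappears for each target.
extern "C" __device__ int __nvvm_reflect(const char *);

__device__ inline void __sleep_or_nop(unsigned __ns) {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm volatile("nanosleep.u32 %0;" ::"r"(__ns)); // sm_70+ instruction
  // Older architectures: deliberately a no-op.
}
```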
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/81193
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/81193
Artem-B wrote:
> We should expose it as an intrinsic
I think you mean `builtin` here.
https://github.com/llvm/llvm-project/pull/81277
https://github.com/Artem-B approved this pull request.
LGTM overall.
https://github.com/llvm/llvm-project/pull/81277
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/81277
@@ -1624,8 +1624,9 @@ def int_nvvm_compiler_error :
def int_nvvm_compiler_warn :
Intrinsic<[], [llvm_anyptr_ty], [], "llvm.nvvm.compiler.warn">;
-def int_nvvm_reflect :
- Intrinsic<[llvm_i32_ty], [llvm_anyptr_ty], [IntrNoMem], "llvm.nvvm.reflect">;
+def int_nvvm_reflect :
@@ -159,6 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
BUILTIN(__nvvm_exit, "v", "r")
+BUILTIN(__nvvm_reflect, "UicC*", "r")
Artem-B wrote:
Now that we're exposing it to end users, we should probably document
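If documentation does get added, here is a hedged usage sketch based only on the
prototype in the hunk above (`unsigned __nvvm_reflect(const char *)`); the reflect keys
are the conventional ones handled by the NVVMReflect pass, not something this patch
defines:
```
// Hedged sketch; NVVMReflect folds these calls to constants during codegen.
__device__ unsigned current_gpu_arch() {
  return __nvvm_reflect("__CUDA_ARCH"); // e.g. 800 when compiling for sm_80
}

__device__ bool denormals_flushed_to_zero() {
  return __nvvm_reflect("__CUDA_FTZ") != 0;
}
```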
Artem-B wrote:
LGTM
https://github.com/llvm/llvm-project/pull/81277
@@ -140,6 +140,17 @@ define void @test_exit() {
ret void
}
+; CHECK-LABEL: test_globaltimer
+define i64 @test_globaltimer() {
+; CHECK: mov.u64 %r{{.*}}, %globaltimer;
+ %a = tail call i64 @llvm.nvvm.read.ptx.sreg.globaltimer()
Artem-B wrote:
Thise
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/81331
https://github.com/Artem-B commented:
LGTM with a few nits for the general and NVPTX parts.
https://github.com/llvm/llvm-project/pull/81331
@@ -2764,6 +2764,37 @@ Query for this feature with
``__has_builtin(__builtin_readcyclecounter)``. Note
that even if present, its use may depend on run-time privilege or other OS
controlled state.
+``__builtin_readsteadycounter``
+--
+
+``__builtin_
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G)
const {
case ISD::ATOMIC_STORE: return "AtomicStore";
case ISD::PCMARKER: return "PCMarker";
case ISD::READCYCLECOUNTER: return "ReadCycleCounter";
+
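A minimal usage sketch for the builtin documented in the hunk above, probing
availability the same way the existing `__builtin_readcyclecounter` text recommends
(the timed region is a placeholder):
```
#if __has_builtin(__builtin_readsteadycounter)
void time_region(void (*work)(void)) {
  unsigned long long begin = __builtin_readsteadycounter();
  work();
  unsigned long long end = __builtin_readsteadycounter();
  // begin/end are fixed-frequency ticks, unlike the cycle counter, whose rate
  // can vary with frequency scaling.
  (void)(end - begin);
}
#endif
```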
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/84017
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo
&CI,
NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType);
if (NewElemTy.isNull()) {
-Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name;
+// Only emit
@@ -0,0 +1,9 @@
+// CPU-side compilation on x86 (no errors expected).
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple nvptx64 -x
cuda -fsyntax-only -verify %s
+
+// GPU-side compilation on x86 (no errors expected)
+// RUN: %clang_cc1 -triple nvptx64 -aux-triple x
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/83918
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
OffloadAction::DeviceDependences DDep;
DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+ //
Artem-B wrote:
> Should I make `shouldIncludePTX` default to `false` for the new driver?
Yes, I think that's a better default.
https://github.com/llvm/llvm-project/pull/84367
Artem-B wrote:
> > > Should I make `shouldIncludePTX` default to `false` for the new driver?
> >
> >
> > Yes, I think that's a better default.
>
> Done, now requires `--cuda-include-ptx=`.
This may be worth adding to the release notes.
https://github.com/llvm/llvm-project/pull/84367
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const
JobAction &JA,
Exec, CmdArgs, Inputs, Output));
}
-static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
- bool includePTX = true;
- for (Arg *A : Args) {
-if (!(A-
https://github.com/Artem-B approved this pull request.
LGTM overall, with docs/comment nits.
https://github.com/llvm/llvm-project/pull/84367
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/83605
https://github.com/Artem-B approved this pull request.
LGTM.
https://github.com/llvm/llvm-project/pull/83605
@@ -2863,3 +2863,18 @@ void tools::addOutlineAtomicsArgs(const Driver &D, const
ToolChain &TC,
CmdArgs.push_back("+outline-atomics");
}
}
+
+void tools::addOffloadCompressArgs(const llvm::opt::ArgList &TCArgs,
+ llvm::opt::ArgStringList
@@ -1306,15 +1306,68 @@ float min(float __x, float __y) { return
__builtin_fminf(__x, __y); }
__DEVICE__
double min(double __x, double __y) { return __builtin_fmin(__x, __y); }
-#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
-__host__ inline static int min(int __a
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/82956
@@ -1306,15 +1306,73 @@ float min(float __x, float __y) { return
__builtin_fminf(__x, __y); }
__DEVICE__
double min(double __x, double __y) { return __builtin_fmin(__x, __y); }
-#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
-__host__ inline static int min(int __a
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/82956
Artem-B wrote:
> Probably I need to define those functions with mixed args by default to avoid
> regressions.
Are there any other regressions? Can hipCUB be fixed instead? While their use
case is probably benign, I'd rather fix the user code than propagate CUDA bugs
into HIP.
https://github
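For illustration only (this is not taken from hipCUB), the kind of mixed-argument call
that becomes ambiguous when only homogeneous `min` overloads, like those in the hunks
above, are in scope:
```
// Standalone illustration; these mirror the header's homogeneous overloads.
__device__ float  min(float a, float b)   { return a < b ? a : b; }
__device__ double min(double a, double b) { return a < b ? a : b; }

__device__ double use_min(float x, double y) {
  // error: call to 'min' is ambiguous; neither overload is a strictly
  // better match for a (float, double) argument pair.
  return min(x, y);
}
```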
@@ -942,20 +942,28 @@ CompressedOffloadBundle::compress(const
llvm::MemoryBuffer &Input,
Input.getBuffer().size());
llvm::compression::Format CompressionFormat;
+ int Level;
- if (llvm::compression::zstd::isAvailable())
+ if (llvm::compression::zstd::isAvailable(
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput,
}
OffloadBundlerConfig::OffloadBundlerConfig() {
+ if (llvm::compression::zstd::isAvailable()) {
+CompressionFormat = llvm::compression::Format::Zstd;
+// Use a high zstd compress level by default for be
https://github.com/Artem-B approved this pull request.
LGTM in principle, but I'd run it by someone more familiar with linking
quirks.
@MaskRay, PTAL when you get a chance.
https://github.com/llvm/llvm-project/pull/83870
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/83870
@@ -24,6 +24,7 @@
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel2v
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel3v
+// XEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel5v
Artem-B wrote:
Did you mean `NEG-NOT` ?
https://github.c
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/83605
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo
&CI,
NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType);
if (NewElemTy.isNull()) {
-Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name;
+// Only emit
Artem-B wrote:
Considering that it's for stand-alone compilation only, I'm not going to
block this patch.
That said, please add a `TODO` somewhere to address the issue with explicitly
targeting the generic variant.
https://github.com/llvm/llvm-project/pull/79873
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/79873
Artem-B wrote:
So, the idea is to carry two separate embedded offloading sections -- one for
already fully linked GPU executables, and another for GPU objects to be linked
at the final link stage.
> We also use a sepcial section called something like omp_offloading_entries
Typo in 'special' i
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/80066
https://github.com/Artem-B approved this pull request.
LGTM.
https://github.com/llvm/llvm-project/pull/80066
@@ -20,10 +20,12 @@ using EntryArrayTy = std::pair;
/// \param EntryArray Optional pair pointing to the `__start` and `__stop`
/// symbols holding the `__tgt_offload_entry` array.
/// \param Suffix An optional suffix appended to the emitted symbols.
+/// \param Relocatable Indi
@@ -265,6 +329,11 @@ Error runLinker(ArrayRef Files, const ArgList
&Args) {
LinkerArgs.push_back(Arg);
if (Error Err = executeCommands(LinkerPath, LinkerArgs))
return Err;
+
+ if (Args.hasArg(OPT_relocatable))
+if (Error Err = relocateOffloadSection(Args, Execut
Artem-B wrote:
Supporting such a mixed mode opens an interesting set of issues we may need to
consider going forward:
* Who/where/how runs the initializers in the fully linked parts?
* Are public functions in the fully linked parts visible to the functions in
the partially linked parts? In the full-rdc m
Artem-B wrote:
> I'm assuming you're talking about GPU-side constructors? I don't think the
> CUDA runtime supports those, but OpenMP runs them when the image is loaded,
> so it would handle both independently.
Yes. I'm thinking of the expectations from a C++ user standpoint, and this is
one
Artem-B wrote:
> the idea is that it would be the desired effect if someone went out of their
> way to do this GPU subset linking thing.
That would only be true when someone owns the whole build. That will not be the
case in practice. A large enough project is usually a bunch of libraries
cre
https://github.com/Artem-B approved this pull request.
You may want to check that we can still disable the error with
`-Wno-error=atomic-alignment` passed via top-level options.
Other than that LGTM.
https://github.com/llvm/llvm-project/pull/80322
Artem-B wrote:
Another corner case here. An untyped GEP resulted in SimplifyCFG producing a
`load(gep(argptr, cond ? 24 : 0))` instead of the `load(cond ? gep(argptr, 24) :
argptr)` it produced before the patch, and that eventually prevented SROA from
processing that load.
While it's not a bug in th
@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with
code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60
--cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
Artem-B wrote:
+1 for merging them. I'd also re
https://github.com/Artem-B deleted
https://github.com/llvm/llvm-project/pull/79222
@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with
code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60
--cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
Artem-B wrote:
For the purpose of warning check
@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with
code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60
--cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
+// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium
@@ -0,0 +1,5 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with
code 0.
+// DEFINE: %{check} = %clang -### -c -mcmodel=medium
Artem-B wrote:
> Also, what exactly are we checking here? With `-###` CC1 sub-compilations do
> not run and