arsenm wrote:
Title and description needs rewording. This isn't adding the type "to llvm"
which would imply adding the IR type, but only to APFloat
https://github.com/llvm/llvm-project/pull/97179
___
cfe-commits mailing list
cfe-commits@lists.llvm.or
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm commented:
ping
https://github.com/llvm/llvm-project/pull/96442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
}
};
+struct NVPTX final : public VariadicABIInfo {
+
+ bool enableForTarget() override { return true; }
+
+ bool vaListPassedInSSARegister() override { return true; }
+
+ Type *vaListType(LLVMContext &Ct
arsenm wrote:
> You could theoretically break this if you didn't go through the C ABI and
> ignored type promotion, but I'm not concerned with that kind of misuse since
> it's against the ABI in the first place.
The IR has its own ABI that may or may not match whatever the platform "C ABI'
is
@@ -54,7 +54,34 @@ class MockArgList {
}
template LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter++;
+return T(arg_counter);
+ }
+
+ size_t read_count() const { return arg_counter; }
+};
+
+// Used by the GPU implementation to parse how many bytes ne
@@ -0,0 +1,255 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \
+// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -
@@ -0,0 +1,255 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \
+// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -
arsenm wrote:
> LangAS::Default is not just determined by target. It also depends on
> language. For OpenCL 1.2 it is private.
I would have hoped this would be implemented by default assuming a hidden
private addrspace qualifier
https://github.com/llvm/llvm-project/pull/95728
__
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96442
>From cd95b668b34f5d0834b16f441ab131003988266b Mon Sep 17 00:00:00 2001
From: martinboehme
Date: Wed, 26 Jun 2024 15:01:57 +0200
Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that
`typeid()`
@@ -0,0 +1,255 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \
+// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96442
>From f2654a3ebb73c25fe4565fdb8c7b4b0ca6b6bf06 Mon Sep 17 00:00:00 2001
From: martinboehme
Date: Wed, 26 Jun 2024 15:01:57 +0200
Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that
`typeid()`
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96442
>From 803aa8823e1d2a9c4ebf2d0ab4ad7c2e0ad9d7e9 Mon Sep 17 00:00:00 2001
From: martinboehme
Date: Wed, 26 Jun 2024 15:01:57 +0200
Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that
`typeid()`
@@ -14,13 +14,14 @@
#define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H
#include "llvm/CodeGen/MachineBasicBlock.h"
-#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/Pass.h"
#include "llvm/Support/BranchProbability.h"
namespace llvm {
-class MachineBranchProb
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96442
>From a70da4e0b569d3e83d405a0248e9c71635f29f96 Mon Sep 17 00:00:00 2001
From: martinboehme
Date: Wed, 26 Jun 2024 15:01:57 +0200
Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that
`typeid()`
@@ -14,13 +14,14 @@
#define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H
#include "llvm/CodeGen/MachineBasicBlock.h"
-#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/Pass.h"
#include "llvm/Support/BranchProbability.h"
namespace llvm {
-class MachineBranchProb
@@ -2689,6 +2689,12 @@ def int_amdgcn_global_load_tr_b128 :
AMDGPULoadIntrinsic;
def int_amdgcn_wave_id :
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef, IntrNoMem,
IntrSpeculatable]>;
+def int_amdgcn_s_prefetch_data :
+ Intrinsic<[], [llvm_anyptr_ty, llvm_i32_ty],
---
https://github.com/arsenm approved this pull request.
https://github.com/llvm/llvm-project/pull/107075
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) {
*p = (packedfloat3) { 3.2f, 2.3f, 0.1f };
}
// CHECK: @test3(
-// CHECK: store <4 x float> {{.*}}, align 4
+// CHECK: store <3 x float> {{.*}}, align 4
arsenm wrote:
The ideal control would be more specific tha
arsenm wrote:
> ok, you mean, i remove the vector testcase for this patch. and just save the
> scalar testcase?
No, keep the tests. Only keep the scalar behavior change. The previous revision
was essentially correct and minimal
https://github.com/llvm/llvm-project/pull/89051
@@ -1431,9 +1431,13 @@ Value *ScalarExprEmitter::EmitScalarCast(Value *Src,
QualType SrcType,
return Builder.CreateFPToUI(Src, DstTy, "conv");
}
- if (DstElementTy->getTypeID() < SrcElementTy->getTypeID())
+ if ((DstElementTy->is16bitFPTy() && SrcElementTy->is16bitFPT
https://github.com/arsenm commented:
The vector tests should still be added
https://github.com/llvm/llvm-project/pull/89051
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
return SDValue(NewMI, 0);
}
+ case Intrinsic::amdgcn_s_prefetch_data: {
+// For non-global address space preserve the
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
return SDValue(NewMI, 0);
}
+ case Intrinsic::amdgcn_s_prefetch_data: {
+// For non-global address space preserve the
https://github.com/arsenm approved this pull request.
I think the parent needs some revision for global/flat/infer handling
https://github.com/llvm/llvm-project/pull/107293
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
UTC_ARGS: --version 5
+; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck
--check-prefix=GCN %s
+; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1200 < %s | FileC
arsenm wrote:
> > The vector tests should still be added
>
> sorry. if i remove the change of the vector. i have to remove the testcase.
> because, for the current code convert between vector type of half and
> bfloat16, it has a bug. And it will be Assert "Invalid cast!""
>
OK, LGTM with th
Author: Matt Arsenault
Date: 2024-09-06T21:18:41+04:00
New Revision: a291fe5ed44fa37493d038c78ff4d73135fd85a9
URL:
https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9
DIFF:
https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9.diff
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) {
*p = (packedfloat3) { 3.2f, 2.3f, 0.1f };
}
// CHECK: @test3(
-// CHECK: store <4 x float> {{.*}}, align 4
+// CHECK: store <3 x float> {{.*}}, align 4
arsenm wrote:
I'd expect this to be in terms of type, not
arsenm wrote:
I don't understand this. The code is a strict aliasing violation, so why should
clang work around it?
https://github.com/llvm/llvm-project/pull/107793
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bi
https://github.com/arsenm approved this pull request.
lgtm assuming the const_cast goes away in a subsequent change
https://github.com/llvm/llvm-project/pull/107692
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin
https://github.com/arsenm approved this pull request.
https://github.com/llvm/llvm-project/pull/89051
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
arsenm wrote:
> Hi, @paulwalker-arm, ACLE allows users to do instruction-level development,
> but mixing intrinsic and regular C code may break some of the rules set by
> the compiler.
The rules are still there. You can always use a union or copy to avoid
violating the rules. I don't think i
@@ -766,8 +766,19 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr
*Expr, Address Dest,
// LLVM atomic instructions always have synch scope. If clang atomic
// expression has no scope operand, use default LLVM synch scope.
if (!ScopeModel) {
+llvm::SyncSc
@@ -58,7 +58,35 @@ class SPIRVTargetCodeGenInfo : public
CommonSPIRTargetCodeGenInfo {
SPIRVTargetCodeGenInfo(CodeGen::CodeGenTypes &CGT)
: CommonSPIRTargetCodeGenInfo(std::make_unique(CGT)) {}
void setCUDAKernelCallingConvention(const FunctionType *&FT) const overri
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics
getMemSemantics(AtomicOrdering Ord) {
llvm_unreachable(nullptr);
}
+SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) {
+ SmallVector SSNs;
+ Ctx.getSyncScopeNames(SSNs);
+
+ StringRef M
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const
FunctionDecl *FD,
bool ConstWithoutErrnoAndExceptions =
Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
// Restrict to target with errno, for example, MacOS doesn't
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const
FunctionDecl *FD,
bool ConstWithoutErrnoAndExceptions =
Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
// Restrict to target with errno, for example, MacOS doesn't
@@ -1034,6 +1038,169 @@ inline void FPOptions::applyChanges(FPOptionsOverride
FPO) {
*this = FPO.applyOverrides(*this);
}
+/// Atomic control options
+class AtomicOptionsOverride;
+class AtomicOptions {
+public:
+ using storage_type = uint16_t;
+
+ static constexpr unsign
@@ -0,0 +1,19 @@
+//===--- AtomicOptions.def - Atomic Options database -*- C++
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Ap
@@ -238,3 +238,55 @@ LLVM_DUMP_METHOD void FPOptionsOverride::dump() {
#include "clang/Basic/FPOptions.def"
llvm::errs() << "\n";
}
+
+AtomicOptions
+AtomicOptions::defaultWithoutTrailingStorage(const LangOptions &LO) {
+ AtomicOptions result(LO);
+ return result;
+}
+
+Ato
@@ -1,50 +1,48 @@
// RUN: %clang_cc1 -x hip %s -emit-llvm -o - -triple=amdgcn-amd-amdhsa \
// RUN: -fcuda-is-device -target-cpu gfx906 -fnative-half-type \
-// RUN: -fnative-half-arguments-and-returns | FileCheck
-check-prefixes=CHECK,SAFEIR %s
+// RUN: -fnative-half-argu
@@ -2,315 +2,195 @@
// RUN: %clang_cc1 -fnative-half-arguments-and-returns -triple
amdgcn-amd-amdhsa-gnu -target-cpu gfx900 -emit-llvm -o - %s | FileCheck
-check-prefixes=CHECK,SAFE %s
// RUN: %clang_cc1 -fnative-half-arguments-and-returns -triple
amdgcn-amd-amdhsa-gnu -targe
@@ -61,30 +59,28 @@ __global__ void ffp1(float *p) {
}
__global__ void ffp2(double *p) {
- // CHECK-LABEL: @_Z4ffp2Pd
- // SAFEIR: atomicrmw fadd ptr {{.*}} monotonic, align 8{{$}}
- // SAFEIR: atomicrmw fsub ptr {{.*}} monotonic, align 8{{$}}
- // SAFEIR: atomicrmw fmax p
@@ -0,0 +1,80 @@
+// RUN: %clang_cc1 -ast-dump %s | FileCheck %s
+// RUN: %clang_cc1 -ast-dump -fcuda-is-device %s | FileCheck %s
+// RUN: %clang_cc1 -ast-dump -fcuda-is-device %s \
+// RUN:
-fatomic=no_fine_grained_memory:off,no_remote_memory:on,ignore_denormal_mode:on
\
+//
@@ -5881,6 +5881,32 @@ void Clang::ConstructJob(Compilation &C, const JobAction
&JA,
RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs, JA);
+ if (Arg *AtomicArg = Args.getLastArg(options::OPT_fatomic_EQ)) {
+if (!AtomicArg->getNumValues()) {
+ D.Diag
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -fsyntax-only -verify -fcuda-is-device %s
+// RUN: %clang_cc1 -fsyntax-only -verify -fcuda-is-device %s \
+// RUN:
-fatomic=no_fine_grained_memory:off,no_remote_memory:on,ignore_denormal_mode:on
+
@@ -238,3 +238,55 @@ LLVM_DUMP_METHOD void FPOptionsOverride::dump() {
#include "clang/Basic/FPOptions.def"
llvm::errs() << "\n";
}
+
+AtomicOptions
+AtomicOptions::defaultWithoutTrailingStorage(const LangOptions &LO) {
+ AtomicOptions result(LO);
+ return result;
+}
+
+Ato
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M,
return false;
}
+// Verifies if a module has any intrinsics.
+bool coro::declaresIntrinsics(const Module &M,
+ const DenseSet &Identifiers) {
+ for (const Function &F : M.functi
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M,
return false;
}
+// Verifies if a module has any intrinsics.
+bool coro::declaresIntrinsics(const Module &M,
+ const DenseSet &Identifiers) {
+ for (const Function &F : M.functi
arsenm wrote:
> but empirically robust and guaranteed to work as the AMDGPU BE retains
> handling of direct passing for legacy reasons.
I would like to get rid of that someday...
https://github.com/llvm/llvm-project/pull/102776
___
cfe-commits mailin
@@ -12,6 +12,10 @@
#error "This file is for CUDA compilation only."
#endif
+// The __CLANG_GPU_DISABLE_MATH_WRAPPERS macro provides a way to let standard
+// libcalls reach the link step instead of being eagerly replaced.
+#ifndef __CLANG_GPU_DISABLE_MATH_WRAPPERS
@@ -1185,6 +1189,9 @@ Currently, only the following parameter attributes are
defined:
value should be sign-extended to the extent required by the target's
ABI (which is usually 32-bits) by the caller (for a parameter) or
the callee (for a return value).
+``noext``
@@ -0,0 +1,94 @@
+//===- AMDGPUMCResourceInfo.h - MC Resource Info --*- C++
-*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apa
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
SmallString<128> getMCExprStr(const MCExpr *Value);
+ /// Attempts to replace the validation that is missed in getSIProgramInfo due
+ /// to MCExpr being unknown. Invoked during doFinalization such that
@@ -0,0 +1,94 @@
+//===- AMDGPUMCResourceInfo.h - MC Resource Info --*- C++
-*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apa
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
AMDGPUResourceUsageAnalysis *ResourceUsage;
+ std::unique_ptr RI;
arsenm wrote:
Why does this need unique_ptr instead of just a plain member?
https://github.com/llvm/llvm-project/pul
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Ap
@@ -771,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo
&ProgInfo,
return false;
};
- ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
- ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
- ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-
@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr
addrspace(1) %out, i32 %sel
; GPRIDX-NEXT: amd_machine_version_stepping = 0
; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256
; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0
-; GPRIDX-NEX
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Ap
@@ -75,10 +75,10 @@ bb.2:
store volatile i32 0, ptr addrspace(1) undef
ret void
}
-; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 16
+; DEFAULTSIZE: .amdhsa_private_segment_fixed_size
kernel_non_entry_block_static_alloca_uniformly_reached_align4.private_seg_size
; DEFA
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
AMDGPUResourceUsageAnalysis *ResourceUsage;
+ std::unique_ptr RI;
+
SIProgramInfo CurrentProgramInfo;
std::unique_ptr HSAMetadataStream;
MCCodeEmitter *DumpCodeInstEmitter = nullptr;
+ /
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Ap
@@ -7147,7 +7147,9 @@ void Sema::ProcessDeclAttributeList(
// good to have a way to specify "these attributes must appear as a group",
// for these. Additionally, it would be good to have a way to specify "these
// attribute must never appear as a group" for attributes li
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96872
>From 099aec4c343868155104109c66c99d97ae669c4c Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
__builtin_amdgcn_global_ato
arsenm wrote:
### Merge activity
* **Aug 15, 2:58 PM EDT**: @arsenm started a stack merge that includes this
pull request via
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96872).
https://github.com/llvm/llvm-project/pull/96872
__
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/96872
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96873
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96873
>From f03c978a942c9169b987b817bbb5ae68126dc766 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
{global|flat}_atomic_fadd_v2f1
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/96873
>From 5d0d09c00b837186a4d40a38c474f32461bac034 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
{global|flat}_atomic_fadd_v2f1
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96873
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm commented:
ping
https://github.com/llvm/llvm-project/pull/96873
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M,
return false;
}
+// Verifies if a module has any intrinsics.
+bool coro::declaresIntrinsics(const Module &M,
+ const DenseSet &Identifiers) {
+ for (const Function &F : M.functi
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
AMDGPUResourceUsageAnalysis *ResourceUsage;
+ std::unique_ptr RI;
arsenm wrote:
But OutContext is a reference in the parent already, so you can use it?
https://github.com/llvm/llvm-p
@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr
addrspace(1) %out, i32 %sel
; GPRIDX-NEXT: amd_machine_version_stepping = 0
; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256
; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0
-; GPRIDX-NEX
arsenm wrote:
Note this attribute doesn't actually do anything yet. @jwanggit86 are you
working on implementing the propagation and optimizations with this?
https://github.com/llvm/llvm-project/pull/87695
___
cfe-commits mailing list
cfe-commits@lists
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction
&JA,
CmdArgs.push_back("-nogpulib");
if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
-CmdArgs.push_back(
-Args.MakeArgString(Twine("-fcf-protection=") + A->getVal
@@ -0,0 +1,39 @@
+// Check that -fcf-protection does not get passed to the device-side
+// compilation.
+
+// RUN: %clang -### -x cuda --target=x86_64-unknown-linux-gnu -nogpulib \
+// RUN: -nogpuinc --offload-arch=sm_52 -fcf-protection=full -c %s 2>&1 \
+// RUN: | FileCheck %s
@@ -0,0 +1,39 @@
+// Check that -fcf-protection does not get passed to the device-side
+// compilation.
+
+// RUN: %clang -### -x cuda --target=x86_64-unknown-linux-gnu -nogpulib \
+// RUN: -nogpuinc --offload-arch=sm_52 -fcf-protection=full -c %s 2>&1 \
+// RUN: | FileCheck %s
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction
&JA,
CmdArgs.push_back("-nogpulib");
if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
-CmdArgs.push_back(
-Args.MakeArgString(Twine("-fcf-protection=") + A->getVal
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/88293
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
arsenm wrote:
"CDNA version" isn't even a defined, technical concept, much less a useful one.
We should be not be expanding the set of device macros, and strongly
discouraging further use.
https://github.com/llvm/llvm-project/pull/88293
___
cfe-comm
@@ -851,6 +852,16 @@ int main(void) {
// CHECK: call{{.*}} @__kmpc_flush(
#pragma omp atomic seq_cst
rix = dv / rix;
+
+// CHECK: [[LD_CPX:%.+]] = load atomic ptr, ptr @cpx monotonic
+// CHECK: br label %[[CONT:.+]]
+// CHECK: [[CONT]]
+// CHECK: [[PHI:%.+]] = phi ptr
+// CHE
https://github.com/arsenm approved this pull request.
We should really fix using cmpxchg here. Can you open an IR issue for it?
https://github.com/llvm/llvm-project/pull/88215
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.o
arsenm wrote:
> Is there really a good use case for this? Can you use regular stores to
> addrspace(7) instead? @krzysz00
I see these regularly used via inline asm in various ML code. We need to expose
these in some way to stop people from doing that
>
> Also, do you really need a separate
@@ -0,0 +1,264 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm
-o - %s | FileCheck %s --check-prefixes=VERDE
+// RUN: %clang_cc
@@ -0,0 +1,1037 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apach
@@ -0,0 +1,1037 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apach
arsenm wrote:
> @arsenm You're right about passing larger things indirectly. I'm intending to
> land this as-is, with the types inlined, as that unblocks #93362. I'm nervous
> that the extra pointer indirection will hit the same memory error that
> tweaking codegen in that patch hits (it's a s
@@ -0,0 +1,293 @@
+// REQUIRES: amdgpu-registered-target
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
UTC_ARGS: --function-signature
+// RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s
-o - | FileCheck %s
+
+void sink_0
@@ -0,0 +1,264 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm
-o - %s | FileCheck %s --check-prefixes=VERDE
+// RUN: %clang_cc
https://github.com/arsenm approved this pull request.
https://github.com/llvm/llvm-project/pull/94376
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -0,0 +1,19 @@
+; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900
-verify-machineinstrs -o - %s 2>&1 | FileCheck %s
arsenm wrote:
This is not an IR verifier test, it is a codegen test that fails the machine
verifier. A machine veri
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm requested changes to this pull request.
@jayfoad's testcase fails and the same test should be repeated for all 3
intrinsics
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.
@@ -0,0 +1,19 @@
+; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900
-verify-machineinstrs -o - %s 2>&1 | FileCheck %s
arsenm wrote:
This should also be repeated for all 3 intrinsics
https://github.com/llvm/llvm-project/pull/89217
__
arsenm wrote:
> If we do want addrspace(7), we'll need to expose `make.buffer.rsrc` and give
> it a `p7` variant probably.
Yes.
We probably should expose some kind of custom type instead of directly using a
C address_space(7) attribute
https://github.com/llvm/llvm-project/pull/94576
___
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93601
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/arsenm approved this pull request.
lgtm with nit
https://github.com/llvm/llvm-project/pull/93601
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
801 - 900 of 2504 matches
Mail list logo