[libc] [libcxx] [clang-tools-extra] [flang] [lld] [lldb] [llvm] [clang] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1025,6 +1025,26 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(const MachineFunction &MF, OutStreamer->emitInt32(MFI->getNumSpilledVGPRs()); } +// Helper function to add common PAL Metadata 3.0+ +static void EmitPALMetadataCommon(AMDGPUPALMetadata *MD, +

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/80183 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[compiler-rt] [libcxx] [flang] [openmp] [llvm] [clang-tools-extra] [clang] [lldb] [lld] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -862,14 +862,18 @@ static void instrumentOneFunc( auto Name = FuncInfo.FuncNameVar; auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()), FuncInfo.FunctionHash); + // Make sure that pointer to global is passed in with zero

[clang] [llvm] [clang-tools-extra] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[clang] [llvm] [clang][HLSL][SPRI-V] Add convergence intrinsics (PR #80680)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1129,8 +1129,97 @@ struct BitTest { static BitTest decodeBitTestBuiltin(unsigned BuiltinID); }; + +// Returns the first convergence entry/loop/anchor instruction found in |BB|. +// std::nullopt otherwise. +std::optional getConvergenceToken(llvm::BasicBlock *BB) { + for

[llvm] [clang] [clang][HLSL][SPRI-V] Add convergence intrinsics (PR #80680)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1129,8 +1129,97 @@ struct BitTest { static BitTest decodeBitTestBuiltin(unsigned BuiltinID); }; + +// Returns the first convergence entry/loop/anchor instruction found in |BB|. +// std::nullopt otherwise. +std::optional getConvergenceToken(llvm::BasicBlock *BB) { + for

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/68515

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-05 Thread Matt Arsenault via cfe-commits
arsenm wrote: > * Which value allows generating the "fastest" math code -- disregarding > correctness? I'd assume that "dynamic" is least optimizable, "ieee" in the > middle, and "preserve-sign" is likely to generate the "fastest" code? This depends on the target and operations. For some funct

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/68515

[llvm] [clang] [clang-tools-extra] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I may have mentioned a few times that I don't like function attributes > controlling fast-math behaviors. It doesn't control it, it's informative. You just get undefined behavior if you end up calling mismatched mode functions. It does control it in the AMDGPU entry point func

[clang] [compiler-rt] [HIP] support 128 bit int division (PR #71978)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Can we land the infrastructure to allow linking of compiler-rt binaries without the specifics for divide 128? https://github.com/llvm/llvm-project/pull/71978

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) -

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -139,10 +139,10 @@ bool AMDGPURemoveIncompatibleFunctions::checkFunction(Function &F) { const GCNSubtarget *ST = static_cast(TM->getSubtargetImpl(F)); - // Check the GPU isn't generic. Generic is used for testing only - // and we don't want this pass to interfere

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -156,6 +156,12 @@ void AMDGPUAsmPrinter::emitFunctionBodyStart() { const GCNSubtarget &STM = MF->getSubtarget(); const Function &F = MF->getFunction(); + // TODO: We're checking this late, would be nice to check it earlier. + if (STM.requiresCodeObjectV6() && CodeObje

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -520,6 +520,104 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors also exist. ---

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: One attribute, with a range, would be better than two attributes. This is how it is handled in similar cases. I also think this should be in terms of work items, not workgroups. https://github.com/llvm/llvm-project/pull/79035

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. One attribute https://github.com/llvm/llvm-project/pull/79035

[lldb] [clang] [clang-tools-extra] [llvm] [libc] [libcxx] [lld] [flang] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/67104

[clang] [CUDA][HIP] Exclude external variables from constant promotion. (PR #73549)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -104,3 +106,14 @@ void fun() { (void) b; (void) var_host_only; } + +extern __global__ void external_func(); +extern void* const external_dep[] = { arsenm wrote: Sounds broken that the behavior would differ between array and non-array? https://github.c

[compiler-rt] [libcxx] [lldb] [flang] [libc] [lld] [llvm] [clang-tools-extra] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -8770,6 +8781,22 @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands, } } +int VdstInIdx = AMDGPU::getNamedOper

[libc] [flang] [clang] [clang-tools-extra] [lldb] [libcxx] [lld] [compiler-rt] [llvm] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/78414

[flang] [libc] [clang-tools-extra] [libcxx] [llvm] [clang] [compiler-rt] [lldb] [lld] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/78414

[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn : [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; +def int_amdgcn_s_wait_event_export_ready : + ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, + Intrinsic<[], [], [IntrNoMem, IntrHasSideEffec

[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-01-25 Thread Matt Arsenault via cfe-commits
@@ -418,8 +418,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo { // value ~0. uint64_t getNullPointerValue(LangAS AS) const override { // FIXME: Also should handle region. -return (AS == LangAS::opencl_local || AS == LangAS::opencl_pr

[clang-tools-extra] [llvm] [clang] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/76997

[clang-tools-extra] [llvm] [clang] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Matt Arsenault via cfe-commits
arsenm wrote: Not sure if we need additional negative tests for missing disjoints https://github.com/llvm/llvm-project/pull/76997

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/2] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/6] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[llvm] [clang-tools-extra] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/7] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
@@ -2641,8 +2641,8 @@ define float @assume_false_smallest_normal(float %arg) { } define float @clamp_false_nan(float %arg) { -; CHECK-LABEL: define float @clamp_false_nan( -; CHECK-SAME: float returned [[ARG:%.*]]) #[[ATTR2]] { +; CHECK-LABEL: define nofpclass(nan inf nzero su

[llvm] [clang-tools-extra] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/66522

[clang] [llvm] [clang-tools-extra] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" This

[llvm] [clang-tools-extra] [clang] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH 1/2] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass""

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-29 Thread Matt Arsenault via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { arsenm wrote: can you just make this happen as a consequence of

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-29 Thread Matt Arsenault via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.get

[flang] [libc] [libcxx] [clang] [llvm] [lldb] [compiler-rt] [lld] [ASan][AMDGPU] Fix Assertion Failure. (PR #79795)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/79795

[lld] [llvm] [clang] [AMDGPU] Rename COV module flag to amdhsa_code_object_version (PR #79905)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/79905

[llvm] [clang-tools-extra] [clang] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-01-30 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[clang] [llvm] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)

2024-01-11 Thread Matt Arsenault via cfe-commits
@@ -18178,6 +18178,51 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy}); return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1}); } + case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64

[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-11 Thread Matt Arsenault via cfe-commits
@@ -423,6 +423,67 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") +//===---

[llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-11 Thread Matt Arsenault via cfe-commits
@@ -18240,65 +18240,211 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32: case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64: case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32: -

[llvm] [clang-tools-extra] [clang] [AMDGPU][GFX12] Default component broadcast store (PR #76212)

2024-01-11 Thread Matt Arsenault via cfe-commits
@@ -719,6 +719,18 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureDefaultComponentZero : SubtargetFeature<"default-component-zero", + "HasDefaultComponentZero", + "true", + "BUFFER/IMAGE store instructions set unspecified component

[compiler-rt] [lldb] [libc] [lld] [flang] [clang-tools-extra] [libcxx] [clang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Matt Arsenault via cfe-commits
@@ -1183,9 +1228,21 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI, // No need to wait before load from VMEM to LDS. if (TII->mayWriteLDSThroughDMA(MI)) continue; -unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS; +

[lldb] [libc] [lld] [libcxx] [clang] [compiler-rt] [flang] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Matt Arsenault via cfe-commits
@@ -130,6 +130,8 @@ ; GCN-O0-NEXT:MachineDominator Tree Construction ; GCN-O0-NEXT:Machine Natural Loop Construction ; GCN-O0-NEXT:MachinePostDominator Tree Construction +; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl) +; GCN-O0-NEXT:

[libc] [clang] [libcxx] [llvm] [clang-tools-extra] [flang] [compiler-rt] [lldb] [lld] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Matt Arsenault via cfe-commits
@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII, setRegScore(RegNo, T, CurrScore); } } -if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) { - setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, Curr

[llvm] [clang] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-15 Thread Matt Arsenault via cfe-commits
@@ -429,6 +429,7 @@ FUNCTION_PASS("strip-gc-relocates", StripGCRelocates()) FUNCTION_PASS("structurizecfg", StructurizeCFGPass()) FUNCTION_PASS("tailcallelim", TailCallElimPass()) FUNCTION_PASS("typepromotion", TypePromotionPass(TM)) +FUNCTION_PASS("atomicexpand", AtomicExpandP

[llvm] [clang] [clang-tools-extra] [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (PR #75917)

2024-01-15 Thread Matt Arsenault via cfe-commits
@@ -27,34 +27,23 @@ main_body: ret float %out0 } -define amdgpu_ps float @atomic_pk_add_bf16_1d_v2(<8 x i32> inreg %rsrc, <2 x i16> %data, i32 %s) { +define amdgpu_ps float @atomic_pk_add_bf16_1d_v2(<8 x i32> inreg %rsrc, <2 x bfloat> %data, i32 %s) { ; GFX12-LABEL: atomi

[llvm] [clang] [clang-tools-extra] [AMDGPU][GFX12] Add Atomic cond_sub_u32 (PR #76224)

2024-01-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Missing UniformityAnalysis test for these https://github.com/llvm/llvm-project/pull/76224

[llvm] [clang] AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (PR #77892)

2024-01-15 Thread Matt Arsenault via cfe-commits
@@ -2696,6 +2696,25 @@ def int_amdgcn_udot8 : ImmArg>, ImmArg>, ImmArg>] >; +// f32 %r = llvm.amdgcn.dot4.f32.type_a.type_b (v4type_a (as i32) %a, v4type_b (as i32) %b, f32 %c) +// %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] + %c +class AMDGPU

[clang-tools-extra] [llvm] [clang] [AMDGPU][GFX12] Add Atomic cond_sub_u32 (PR #76224)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -2502,10 +2500,9 @@ def int_amdgcn_flat_atomic_fmax_num : AMDGPUAtomicRtn; def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn; def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn; -def int_amdgcn_flat_atomic_cond_sub_u32 : AMDGPUAtomicRtn; -def int_amdgcn_global

[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -18178,6 +18178,51 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy}); return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1}); } + case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64

[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -18178,6 +18178,51 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy}); return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1}); } + case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64

[llvm] [clang] [mlir] [clang-tools-extra] [ASan][AMDGPU] Fix Assertion Failure. (PR #78242)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -1254,9 +1254,11 @@ Value *AddressSanitizer::memToShadow(Value *Shadow, IRBuilder<> &IRB) { void AddressSanitizer::instrumentMemIntrinsic(MemIntrinsic *MI) { InstrumentationIRBuilder IRB(MI); if (isa(MI)) { -IRB.CreateCall(isa(MI) ? AsanMemmove : AsanMemcpy, -

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -162,6 +162,19 @@ class OffloadFile : public OwningBinary { std::unique_ptr Buffer) : OwningBinary(std::move(Binary), std::move(Buffer)) {} + /// Make a deep copy of this offloading file. + OffloadFile copy() const { +std::unique_ptr Buffer = Memor

[libc] [clang] [compiler-rt] [clang-tools-extra] [libcxxabi] [libunwind] [lldb] [mlir] [flang] [llvm] [libcxx] [ASan][AMDGPU] Fix Assertion Failure. (PR #78242)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,48 @@ +;RUN: opt < %s -passes=asan -S | FileCheck %s + +target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-

[libc] [clang] [compiler-rt] [clang-tools-extra] [lld] [lldb] [flang] [llvm] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/74537

[lldb] [llvm] [libcxx] [clang-tools-extra] [compiler-rt] [clang] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -707,7 +723,40 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII, (TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) { // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS // written can be accessed. A load from LDS to VMEM

[clang] [compiler-rt] [lld] [libc] [libcxx] [llvm] [clang-tools-extra] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. lgtm, but can still fix the -O0 thing https://github.com/llvm/llvm-project/pull/74537

[clang] [lldb] [clang-tools-extra] [llvm] [flang] [lld] [compiler-rt] [libcxx] [libc] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -130,6 +130,8 @@ ; GCN-O0-NEXT:MachineDominator Tree Construction ; GCN-O0-NEXT:Machine Natural Loop Construction ; GCN-O0-NEXT:MachinePostDominator Tree Construction +; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl) +; GCN-O0-NEXT:

[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -437,6 +442,16 @@ namespace { MostDerivedArraySize = 2; MostDerivedPathLength = Entries.size(); } +void addVectorUnchecked(QualType EltTy, uint64_t Size, uint64_t Idx) { + Entries.push_back(PathEntry::ArrayIndex(Idx)); + + // This is technically

[clang-tools-extra] [clang] [llvm] [AMDGPU][GFX12] Add Atomic cond_sub_u32 (PR #76224)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -1182,6 +1182,11 @@ The AMDGPU backend implements the following LLVM IR intrinsics. The iglp_opt strategy implementations are subject to change. + llvm.atomic.cond.sub.u32 Provides direct access

[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-16 Thread Matt Arsenault via cfe-commits
@@ -423,6 +423,67 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") +//===---

[clang-tools-extra] [llvm] [clang] DAG: Fix ABI lowering with FP promote in strictfp functions (PR #74405)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74405 >From cdafeff37cd20e8cb8cdcf6ac8561455d5c9a30a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 2 Dec 2023 20:49:51 +0700 Subject: [PATCH] DAG: Fix ABI lowering with FP promote in strictfp functions This

[clang] [llvm] [AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (PR #71556)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/71556

[clang] [llvm] [AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (PR #71556)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/71556

[compiler-rt] [lldb] [flang] [clang-tools-extra] [libc] [libcxxabi] [llvm] [clang] [libunwind] [libcxx] [ASan][AMDGPU] Fix Assertion Failure. (PR #78410)

2024-01-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,23 @@ +;RUN: opt < %s -passes=asan -S | FileCheck %s + +target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-

[clang-tools-extra] [llvm] [clang] DAG: Fix ABI lowering with FP promote in strictfp functions (PR #74405)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74405 >From cdafeff37cd20e8cb8cdcf6ac8561455d5c9a30a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 2 Dec 2023 20:49:51 +0700 Subject: [PATCH 1/2] DAG: Fix ABI lowering with FP promote in strictfp functions

[clang] [clang-tools-extra] [llvm] DAG: Fix ABI lowering with FP promote in strictfp functions (PR #74405)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/74405

[clang] [llvm] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/71220

[clang] [llvm] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. https://github.com/llvm/llvm-project/pull/71220

[llvm] [clang] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-17 Thread Matt Arsenault via cfe-commits
@@ -340,7 +342,33 @@ bool AtomicExpand::runOnFunction(Function &F) { return MadeChange; } -bool AtomicExpand::bracketInstWithFences(Instruction *I, AtomicOrdering Order) { +bool AtomicExpandLegacy::runOnFunction(Function &F) { + if (skipFunction(F)) +return false;

[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)

2024-01-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target + +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -verify -S -emit-llvm -o - %s + +typedef unsigned int uint; +typedef uint uint2 __attribute__((ext_vector_type(2))); +typedef uint uint4 __attribute__(

[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)

2024-01-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target + +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -verify -S -emit-llvm -o - %s + +typedef unsigned int uint; +typedef uint uint2 __attribute__((ext_vector_type(2))); +typedef uint uint4 __attribute__(

[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/77926

[clang] [llvm] [AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (PR #78439)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/78439

[clang] [flang] [compiler-rt] [clang-tools-extra] [llvm] [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (PR #75917)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/75917

[clang] [clang-tools-extra] [llvm] [flang] [compiler-rt] [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (PR #75917)

2024-01-17 Thread Matt Arsenault via cfe-commits
@@ -1,56 +1,244 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -march=amdgcn -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 %s -; RUN: llc -march=amdgcn -global-isel=1 -mcpu=gfx1200 -verify-machineinstrs <

[llvm] [compiler-rt] [clang] [clang-tools-extra] [flang] [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (PR #75917)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/75917

[llvm] [clang-tools-extra] [clang] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/77438

[clang] [libcxx] [clang-tools-extra] [llvm] [flang] [lld] [compiler-rt] [libc] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > lgtm, but can still fix the -O0 thing > > But where do I get TM in the getAnalysisUsage? MF.getTarget() https://github.com/llvm/llvm-project/pull/74537

[compiler-rt] [llvm] [flang] [lldb] [clang] [libcxx] [clang-tools-extra] [libc] [lld] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/74537

[llvm] [clang-tools-extra] [clang] DAG: Fix chain mismanagement in SoftenFloatRes_FP_EXTEND (PR #74558)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74558 >From cdafeff37cd20e8cb8cdcf6ac8561455d5c9a30a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 2 Dec 2023 20:49:51 +0700 Subject: [PATCH 1/2] DAG: Fix ABI lowering with FP promote in strictfp functions

[clang] [llvm] [clang-tools-extra] DAG: Fix chain mismanagement in SoftenFloatRes_FP_EXTEND (PR #74558)

2024-01-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/74558

[clang] [llvm] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-18 Thread Matt Arsenault via cfe-commits
@@ -340,7 +342,31 @@ bool AtomicExpand::runOnFunction(Function &F) { return MadeChange; } -bool AtomicExpand::bracketInstWithFences(Instruction *I, AtomicOrdering Order) { +bool AtomicExpandLegacy::runOnFunction(Function &F) { + i arsenm wrote: Stray char

[llvm] [clang] [CodeGen] Port AtomicExpand to new Pass Manager (PR #71220)

2024-01-18 Thread Matt Arsenault via cfe-commits
@@ -457,7 +457,7 @@ TargetPassConfig *PPCTargetMachine::createPassConfig(PassManagerBase &PM) { void PPCPassConfig::addIRPasses() { if (TM->getOptLevel() != CodeGenOptLevel::None) addPass(createPPCBoolRetToIntPass()); - addPass(createAtomicExpandPass()); + addPass(crea

[clang] [llvm] [docs] Add llvm and clang release notes for the global-var code model attribute (PR #78664)

2024-01-18 Thread Matt Arsenault via cfe-commits
@@ -70,6 +70,8 @@ Changes to the LLVM IR * Added `llvm.exp10` intrinsic. +* Added a code model attribute for the global variable. arsenm wrote: Helpful to include link to the LangRef reference https://github.com/llvm/llvm-project/pull/78664

[llvm] [clang] [AMDGPU] Remove gws feature from GFX12 (PR #78711)

2024-01-19 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/78711

[libclc] libclc: add missing AMD gfx symlinks (PR #78884)

2024-01-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. can probably drop all the versioning handling here https://github.com/llvm/llvm-project/pull/78884

[libclc] libclc: add missing AMD gfx symlinks (PR #78884)

2024-01-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/78884

[libclc] libclc: add missing AMD gfx symlinks (PR #78884)

2024-01-21 Thread Matt Arsenault via cfe-commits
@@ -154,6 +154,46 @@ if( ${LLVM_PACKAGE_VERSION} VERSION_GREATER "6.99.99" ) set( tahiti_aliases ${tahiti_aliases} gfx904 gfx906 ) endif() +# Support for gfx909, gfx1010, gfx1011 and gfx1012 was added

[libclc] libclc: add missing AMD gfx symlinks (PR #78884)

2024-01-21 Thread Matt Arsenault via cfe-commits
@@ -154,6 +154,46 @@ if( ${LLVM_PACKAGE_VERSION} VERSION_GREATER "6.99.99" ) set( tahiti_aliases ${tahiti_aliases} gfx904 gfx906 ) endif() +# Support for gfx909, gfx1010, gfx1011 and gfx1012 was added

[libclc] libclc: add missing AMD gfx symlinks (PR #78884)

2024-01-22 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/78884

[lldb] [libcxx] [libc] [compiler-rt] [flang] [llvm] [clang] [clang-tools-extra] [lld] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Why is there so much special casing in the assembler/disassembler? https://github.com/llvm/llvm-project/pull/78414

[llvm] [mlir] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: Should get a mention in the release notes https://github.com/llvm/llvm-project/pull/79038

[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn : [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; +def int_amdgcn_s_wait_event_export_ready : + ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, + Intrinsic<[], [], [IntrNoMem, IntrHasSideEffec

[clang] [AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (PR #69567)

2023-10-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/69567

[clang] [AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (PR #69567)

2023-10-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. https://github.com/llvm/llvm-project/pull/69567

[clang] [AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (PR #69567)

2023-10-23 Thread Matt Arsenault via cfe-commits
@@ -526,7 +526,9 @@ void test_read_exec_lo(global uint* out) { // CHECK: declare i32 @llvm.amdgcn.ballot.i32(i1) #[[$NOUNWIND_READONLY:[0-9]+]] // CHECK-LABEL: @test_read_exec_hi( -// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 true) +// CHECK: call i64 @llvm.amdgcn.ballot.i64

[clang] [AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (PR #69567)

2023-10-23 Thread Matt Arsenault via cfe-commits
@@ -7997,14 +7997,26 @@ enum SpecialRegisterAccessKind { static Value *EmitAMDGCNBallotForExec(CodeGenFunction &CGF, const CallExpr *E, llvm::Type *RegisterType, - llvm::Type *ValueType) { +
