[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
https://github.com/Pierre-vh commented:

Can you add a testcase in each file with MI flags on the instruction? You have code that preserves flags, which needs to be tested.

https://github.com/llvm/llvm-project/pull/132382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
@@ -237,6 +237,21 @@ void RegBankLegalizeHelper::lowerS_BFE(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst) == V4S16 ? V2S16 : S32;

Pierre-vh wrote:

```suggestion
  LLT Ty = (MRI.getType(Dst) == V4S16 ? V2S16 : S32);
```

Nit for clarity. Are you also expecting the type to be something else specifically if it isn't `V4S16`? If so, I'd add an assert to avoid silent failures.

https://github.com/llvm/llvm-project/pull/132382
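A minimal, standalone sketch of the assert suggested above. It assumes, purely for illustration, that the only other type expected here is a 64-bit scalar; the patch excerpt does not say so. The enum `Ty` and the function `splitPieceType` are hypothetical stand-ins for `LLT` and the ternary in the patch, not names from LLVM.

```cpp
#include <cassert>

// Hypothetical stand-in for the LLT values named in the patch.
enum class Ty { S32, S64, V2S16, V4S16 };

// Picks the type of each 32-bit piece. The assert turns an unexpected
// destination type into a loud failure instead of silently choosing S32.
// Assumption: S64 is the only non-V4S16 type that is expected here.
Ty splitPieceType(Ty DstTy) {
  assert((DstTy == Ty::V4S16 || DstTy == Ty::S64) &&
         "unexpected destination type for 32-bit split");
  return (DstTy == Ty::V4S16 ? Ty::V2S16 : Ty::S32);
}
```

Without the assert, any unlisted type would fall through to the `S32` arm, which is the silent failure the comment warns about.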
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/132382
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
@@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst);
+  Register Src = MI.getOperand(1).getReg();
+  unsigned Opc = MI.getOpcode();
+  if (Ty == S32 || Ty == S16) {
+    auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);

Pierre-vh wrote:

The `Opc == G_SEXT ? -1 : 1` can go in a variable and be reused in both cases to avoid repetition.

https://github.com/llvm/llvm-project/pull/132383
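A small standalone sketch of the hoisting suggested above: the value selected when the condition bit is set is computed once and can then feed every branch. `ExtOpcode` and `trueValueFor` are hypothetical names used only for illustration; in the patch the result would simply be a local variable next to `Opc`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the generic extend opcodes checked in the patch.
enum class ExtOpcode { SExt, ZExt };

// Value selected when the condition bit is set: all ones for a sign
// extension, 1 otherwise. Computing it in one place removes the repeated
// ternary from each branch.
int64_t trueValueFor(ExtOpcode Opc) {
  return Opc == ExtOpcode::SExt ? -1 : 1;
}
```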
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for select (PR #132384)
@@ -485,7 +504,8 @@ LLT RegBankLegalizeHelper::getBTyFromID(RegBankLLTMappingApplyID ID, LLT Ty) {
   case UniInVgprB64:
     if (Ty == LLT::scalar(64) || Ty == LLT::fixed_vector(2, 32) ||
         Ty == LLT::fixed_vector(4, 16) || Ty == LLT::pointer(0, 64) ||
-        Ty == LLT::pointer(1, 64) || Ty == LLT::pointer(4, 64))
+        Ty == LLT::pointer(1, 64) || Ty == LLT::pointer(4, 64) ||
+        (Ty.isPointer() && Ty.getAddressSpace() > AMDGPUAS::MAX_AMDGPU_ADDRESS))

Pierre-vh wrote:

What case is this trying to handle? Is it tested?

https://github.com/llvm/llvm-project/pull/132384
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for select (PR #132384)
@@ -286,6 +287,22 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerSplitTo32Sel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst) == V4S16 ? V2S16 : S32;

Pierre-vh wrote:

```suggestion
  LLT Ty = (MRI.getType(Dst) == V4S16 ? V2S16 : S32);
```

Nit for clarity. Also, if you expect the type to be something else when it isn't `V4S16`, add an assert?

https://github.com/llvm/llvm-project/pull/132384
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
@@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst);
+  Register Src = MI.getOperand(1).getReg();
+  unsigned Opc = MI.getOpcode();
+  if (Ty == S32 || Ty == S16) {
+    auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
+    auto False = B.buildConstant({VgprRB, Ty}, 0);
+    B.buildSelect(Dst, Src, True, False);
+  }
+  if (Ty == S64) {

Pierre-vh wrote:

```suggestion
  } else if (Ty == S64) {
```

You can also add a final `else` with `llvm_unreachable`.

https://github.com/llvm/llvm-project/pull/132383
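A standalone sketch of the control flow suggested above: an `if` / `else if` chain closed by an unreachable-style `else`, so an unhandled type aborts instead of falling through silently. Everything here is a hypothetical stand-in: `unreachable` only imitates what `llvm_unreachable` does in an asserts build, and `Ty`, `Path`, and `classify` are illustration names, not part of the patch.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the LLT values and for the two handled paths.
enum class Ty { S16, S32, S64, Other };
enum class Path { Narrow, Wide };

// Imitation of llvm_unreachable for this sketch: report and abort.
[[noreturn]] inline void unreachable(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  std::abort();
}

// Exactly one branch runs for each handled type; anything else is a hard
// error rather than a silent no-op.
Path classify(Ty T) {
  Path Result;
  if (T == Ty::S32 || T == Ty::S16) {
    Result = Path::Narrow;
  } else if (T == Ty::S64) {
    Result = Path::Wide;
  } else {
    unreachable("unexpected type for extend lowering");
  }
  return Result;
}
```

Compared with two independent `if` statements, the chain makes it explicit that the cases are mutually exclusive and that the list of handled types is meant to be exhaustive.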
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
     SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo,
+                                               SDValue &Hi) {
+  EVT LoVT, HiVT;
+  SDLoc dl(LD);
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0));
+
+  ISD::LoadExtType ExtType = LD->getExtensionType();
+  SDValue Ch = LD->getChain();
+  SDValue Ptr = LD->getBasePtr();
+  EVT MemoryVT = LD->getMemoryVT();
+
+  EVT LoMemVT, HiMemVT;
+  std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT);
+
+  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits());
+  EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits());
+  SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr,
+                                  LD->getMemOperand());

arsenm wrote:

As written this is not going to handle extending loads correctly. I also don't know how you could end up with a vector extending load with the code as it is now. You should either try to handle it, or assert and figure out where it is coming from.

https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for select (PR #132384)
@@ -198,7 +198,8 @@ UniformityLLTOpPredicateID LLTToBId(LLT Ty) {
     return B32;
   if (Ty == LLT::scalar(64) || Ty == LLT::fixed_vector(2, 32) ||
       Ty == LLT::fixed_vector(4, 16) || Ty == LLT::pointer(1, 64) ||
-      Ty == LLT::pointer(4, 64))
+      Ty == LLT::pointer(4, 64) ||
+      (Ty.isPointer() && Ty.getAddressSpace() > AMDGPUAS::MAX_AMDGPU_ADDRESS))

Pierre-vh wrote:

Same question here.

https://github.com/llvm/llvm-project/pull/132384
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716

>From 66207ccdf0f96a5836b3bafeb37c90d9a762d944 Mon Sep 17 00:00:00 2001
From: jofrn
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp         | 15 -
 llvm/test/CodeGen/ARM/atomic-load-store.ll    | 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll    | 30 +
 .../X86/expand-atomic-non-integer.ll          | 65 +++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..70f59eafc6ecb 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
     I->replaceAllUsesWith(V);
   } else if (HasResult) {
     Value *V;
-    if (UseSizedLibcall)
-      V = Builder.CreateBitOrPointerCast(Result, I->getType());
-    else {
+    if (UseSizedLibcall) {
+      // Add bitcasts from Result's scalar type to I's vector type
+      auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+      auto *VTy = dyn_cast<VectorType>(I->getType());
+      if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+        unsigned AS = PtrTy->getAddressSpace();
+        Value *BC = Builder.CreateBitCast(
+            Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+        V = Builder.CreateIntToPtr(BC, I->getType());
+      } else
+        V = Builder.CreateBitOrPointerCast(Result, I->getType());
+    } else {
       V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
                                     AllocaAlignment);
       Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..eaa2ffd9b2731 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:       @ %bb.0:
+; ARM-NEXT:    ldr r0, [r0]
+; ARM-NEXT:    dmb ish
+; ARM-NEXT:    bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:       @ %bb.0:
+; ARMOPTNONE-NEXT:    ldr r0, [r0]
+; ARMOPTNONE-NEXT:    dmb ish
+; ARMOPTNONE-NEXT:    bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:       @ %bb.0:
+; THUMBTWO-NEXT:    ldr r0, [r0]
+; THUMBTWO-NEXT:    dmb ish
+; THUMBTWO-NEXT:    bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:       @ %bb.0:
+; THUMBONE-NEXT:    push {r7, lr}
+; THUMBONE-NEXT:    movs r1, #0
+; THUMBONE-NEXT:    mov r2, r1
+; THUMBONE-NEXT:    bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:    pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:       @ %bb.0:
+; ARMV4-NEXT:    push {r11, lr}
+; ARMV4-NEXT:    mov r1, #2
+; ARMV4-NEXT:    bl __atomic_load_4
+; ARMV4-NEXT:    pop {r11, lr}
+; ARMV4-NEXT:    mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:       @ %bb.0:
+; ARMV6-NEXT:    ldr r0, [r0]
+; ARMV6-NEXT:    mov r1, #0
+; ARMV6-NEXT:    mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:    bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:       @ %bb.0:
+; THUMBM-NEXT:    ldr r0, [r0]
+; THUMBM-NEXT:    dmb sy
+; THUMBM-NEXT:    bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5473ae91ddbf7..053222660e5cd 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -381,6 +381,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    movl $2, %esi
+; CHECK-NEXT:    callq ___atomic_load_16
+; CHECK-NEXT:    movq %rdx, %xmm1
+; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    popq %rax
+; CHECK-NEXT:    retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LABEL: atomic_vec4_i8:
 ; CHECK3:       ## %bb.0:
@@ -404,6 +419,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
   ret <4 x i16> %ret
 }
 
+define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_ptr270:
+; CHECK: ## %b
[llvm-branch-commits] MC: Emit symbols for R_X86_64_PLT32 relocation pointing to symbols with non-zero values. (PR #138795)
MaskRay wrote:

> The alternative fix, which I think I'm now leaning towards, would be to
> change how the branch-to-branch optimization handles relocations to
> STT_SECTION symbols. A relocation pointing to the STT_SECTION for .text with
> addend 1 would be treated as a branch to .text+5 and it would be invalid to
> assemble a relative vtable relocation that points to STT_SECTION. It would
> also be consistent with how we process relocations for string tail merging
> and ICF among other places, e.g.
> [here](https://github.com/llvm/llvm-project/blob/8602a655a8150753542b0237fcca16d9ee1cd981/lld/ELF/ICF.cpp#L304).
> The downside is that it adds another special case for STT_SECTION but I
> guess that's fine.

Thanks for the explanation. I haven't read through the lld branch-to-branch optimization patch. However, I agree that adding a special case for STT_SECTION to the linker might be the right solution. Changing the assembler regarding PLT32 looks fishy.

https://github.com/llvm/llvm-project/pull/138795
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for select (PR #132384)
@@ -1390,29 +927,17 @@ legalized: true
 body: |
   bb.0:
     liveins: $sgpr0, $sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
-    ; FAST-LABEL: name: select_p999_scc_ss
-    ; FAST: liveins: $sgpr0, $sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
-    ; FAST-NEXT: {{ $}}
-    ; FAST-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; FAST-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; FAST-NEXT: [[COPY2:%[0-9]+]]:sgpr(p999) = COPY $sgpr2_sgpr3
-    ; FAST-NEXT: [[COPY3:%[0-9]+]]:sgpr(p999) = COPY $sgpr4_sgpr5
-    ; FAST-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(ne), [[COPY]](s32), [[COPY1]]
-    ; FAST-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; FAST-NEXT: [[ZEXT:%[0-9]+]]:sgpr(s32) = G_ZEXT [[TRUNC]](s1)
-    ; FAST-NEXT: [[SELECT:%[0-9]+]]:sgpr(p999) = G_SELECT [[ZEXT]](s32), [[COPY2]], [[COPY3]]
-    ;
-    ; GREEDY-LABEL: name: select_p999_scc_ss
-    ; GREEDY: liveins: $sgpr0, $sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
-    ; GREEDY-NEXT: {{ $}}
-    ; GREEDY-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; GREEDY-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; GREEDY-NEXT: [[COPY2:%[0-9]+]]:sgpr(p999) = COPY $sgpr2_sgpr3
-    ; GREEDY-NEXT: [[COPY3:%[0-9]+]]:sgpr(p999) = COPY $sgpr4_sgpr5
-    ; GREEDY-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(ne), [[COPY]](s32), [[COPY1]]
-    ; GREEDY-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; GREEDY-NEXT: [[ZEXT:%[0-9]+]]:sgpr(s32) = G_ZEXT [[TRUNC]](s1)
-    ; GREEDY-NEXT: [[SELECT:%[0-9]+]]:sgpr(p999) = G_SELECT [[ZEXT]](s32), [[COPY2]], [[COPY3]]
+    ; CHECK-LABEL: name: select_p999_scc_ss
+    ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
+    ; CHECK-NEXT: {{ $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr(p999) = COPY $sgpr2_sgpr3
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr(p999) = COPY $sgpr4_sgpr5
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(ne), [[COPY]](s32), [[COPY1]]
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:sgpr(s32) = G_AND [[ICMP]], [[C]]
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:sgpr(p999) = G_SELECT [[AND]](s32), [[COPY2]], [[COPY3]]

petar-avramovic wrote:

This is an already existing test. As suggested by Fabian, this is a check for any other address space, not p999 specifically.

https://github.com/llvm/llvm-project/pull/132384
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
@@ -341,9 +328,9 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
     ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[COPY]](s64)
     ; CHECK-NEXT: [[UV2:%[0-9]+]]:vgpr(s32), [[UV3:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
-    ; CHECK-NEXT: %3:vgpr(s32) = disjoint G_OR [[UV]], [[UV2]]
-    ; CHECK-NEXT: %4:vgpr(s32) = disjoint G_OR [[UV1]], [[UV3]]
-    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES %3(s32), %4(s32)
+    ; CHECK-NEXT: [[OR:%[0-9]+]]:vgpr(s32) = disjoint G_OR [[UV]], [[UV2]]
+    ; CHECK-NEXT: [[OR1:%[0-9]+]]:vgpr(s32) = disjoint G_OR [[UV1]], [[UV3]]

petar-avramovic wrote:

Test with flags on MI

https://github.com/llvm/llvm-project/pull/132382
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432

>From b50f6786084321c6c1402b475b07c998157cd506 Mon Sep 17 00:00:00 2001
From: jofrn
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector.

After splitting, all elements are created. The two components must
be found by looking at the upper and lower half of EXTRACT_ELEMENT.
This change extends EltsFromConsecutiveLoads to understand AtomicSDNode
so that unused elements can be removed.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h      | 4 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++-
 .../SelectionDAGAddressAnalysis.cpp           | 30 ++--
 llvm/lib/Target/X86/X86ISelLowering.cpp       | 73 +++--
 llvm/test/CodeGen/X86/atomic-load-store.ll    | 149 ++
 5 files changed, 104 insertions(+), 172 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba11ddbb5b731..d3cd81c146280 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1843,7 +1843,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2281,7 +2281,7 @@ class SelectionDAG {
   /// merged. Check that both are nonvolatile and if LD is loading
   /// 'Bytes' bytes from a location that is 'Dist' units away from the
   /// location that the 'Base' load is loading from.
-  bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+  bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
                                       unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 2a68903c34cef..8e77a542ab029 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
                                                    SDValue NewMemOp) {
   assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
@@ -12911,17 +12911,21 @@ std::pair<SDValue, SDValue> SelectionDAG::UnrollVectorOverflowOp(
                   getBuildVector(NewOvVT, dl, OvScalars));
 }
 
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-                                                  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+                                                  MemSDNode *Base,
                                                   unsigned Bytes,
                                                   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
     return false;
-  // TODO: probably too restrictive for atomics, revisit
-  if (!LD->isSimple())
-    return false;
-  if (LD->isIndexed() || Base->isIndexed())
-    return false;
+  if (auto Ld = dyn_cast<LoadSDNode>(LD)) {
+    if (!Ld->isSimple())
+      return false;
+    if (Ld->isIndexed())
+      return false;
+  }
+  if (auto Ld = dyn_cast<LoadSDNode>(Base))
+    if (Ld->isIndexed())
+      return false;
   if (LD->getChain() != Base->getChain())
     return false;
   EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780..c29cb424c7a4c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize,
 }
 
 /// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
-                                   const SelectionDAG &DAG) {
+template <typename T>
+static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) {
   SDValue Ptr = N->getBasePtr();
 
   // (((B + I*M) + c)) + c ...
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
   bool IsIndexSignExt = false;
 
   // pre-inc/pre-dec ops are components of EA.
-  if (N->getAddressingMode() == ISD::PRE_INC) {
-    if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset(
[llvm-branch-commits] [clang] [libcxx] [lld] [lldb] [llvm] [mlir] [KeyInstr] Add MIR parser support (PR #133494)
https://github.com/OCHyams updated https://github.com/llvm/llvm-project/pull/133494

error: too big or took too long to generate
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
     I->replaceAllUsesWith(V);
   } else if (HasResult) {
     Value *V;
-    if (UseSizedLibcall)
-      V = Builder.CreateBitOrPointerCast(Result, I->getType());
-    else {
+    if (UseSizedLibcall) {
+      // Add bitcasts from Result's scalar type to I's vector type
+      auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+      auto *VTy = dyn_cast<VectorType>(I->getType());
+      if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+        unsigned AS = PtrTy->getAddressSpace();
+        Value *BC = Builder.CreateBitCast(
+            Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+        V = Builder.CreateIntToPtr(BC, I->getType());
+      } else
+        V = Builder.CreateBitOrPointerCast(Result, I->getType());

arsenm wrote:

```suggestion
        V = Builder.CreateBitOrPointerCast(Result, VTy);
```

https://github.com/llvm/llvm-project/pull/120716
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
     I->replaceAllUsesWith(V);
   } else if (HasResult) {
     Value *V;
-    if (UseSizedLibcall)
-      V = Builder.CreateBitOrPointerCast(Result, I->getType());
-    else {
+    if (UseSizedLibcall) {
+      // Add bitcasts from Result's scalar type to I's vector type
+      auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+      auto *VTy = dyn_cast<VectorType>(I->getType());
+      if (VTy && PtrTy && !Result->getType()->isVectorTy()) {

arsenm wrote:

There should probably be a utility for this somewhere.

https://github.com/llvm/llvm-project/pull/120716
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + auto *PtrTy = dyn_cast(I->getType()->getScalarType()); + auto *VTy = dyn_cast(I->getType()); + if (VTy && PtrTy && !Result->getType()->isVectorTy()) { +unsigned AS = PtrTy->getAddressSpace(); +Value *BC = Builder.CreateBitCast( +Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS))); +V = Builder.CreateIntToPtr(BC, I->getType()); arsenm wrote: ```suggestion V = Builder.CreateIntToPtr(BC, VTy); ``` https://github.com/llvm/llvm-project/pull/120716 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/120716 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
@@ -151,3 +151,68 @@ define void @pointer_cmpxchg_expand6(ptr addrspace(1) %ptr, ret void } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: define <2 x ptr> @atomic_vec2_ptr_align( +; CHECK-SAME: ptr [[X:%.*]]) #[[ATTR0:[0-9]+]] { +; CHECK-NEXT:[[TMP1:%.*]] = call i128 @__atomic_load_16(ptr [[X]], i32 2) +; CHECK-NEXT:[[TMP6:%.*]] = bitcast i128 [[TMP1]] to <2 x i64> +; CHECK-NEXT:[[TMP7:%.*]] = inttoptr <2 x i64> [[TMP6]] to <2 x ptr> +; CHECK-NEXT:ret <2 x ptr> [[TMP7]] +; + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + +define <4 x ptr addrspace(270)> @atomic_vec4_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: define <4 x ptr addrspace(270)> @atomic_vec4_ptr_align( +; CHECK-SAME: ptr [[X:%.*]]) #[[ATTR0]] { +; CHECK-NEXT:[[TMP1:%.*]] = call i128 @__atomic_load_16(ptr [[X]], i32 2) +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i128 [[TMP1]] to <4 x i32> +; CHECK-NEXT:[[TMP3:%.*]] = inttoptr <4 x i32> [[TMP2]] to <4 x ptr addrspace(270)> +; CHECK-NEXT:ret <4 x ptr addrspace(270)> [[TMP3]] +; + %ret = load atomic <4 x ptr addrspace(270)>, ptr %x acquire, align 16 + ret <4 x ptr addrspace(270)> %ret +} + +define <2 x i16> @atomic_vec2_i16(ptr %x) nounwind { +; CHECK-LABEL: define <2 x i16> @atomic_vec2_i16( +; CHECK-SAME: ptr [[X:%.*]]) #[[ATTR0]] { +; CHECK-NEXT:[[RET:%.*]] = load atomic <2 x i16>, ptr [[X]] acquire, align 8 +; CHECK-NEXT:ret <2 x i16> [[RET]] +; + %ret = load atomic <2 x i16>, ptr %x acquire, align 8 + ret <2 x i16> %ret +} + +define <2 x half> @atomic_vec2_half(ptr %x) nounwind { +; CHECK-LABEL: define <2 x half> @atomic_vec2_half( +; CHECK-SAME: ptr [[X:%.*]]) #[[ATTR0]] { +; CHECK-NEXT:[[RET:%.*]] = load atomic <2 x half>, ptr [[X]] acquire, align 8 +; CHECK-NEXT:ret <2 x half> [[RET]] +; + %ret = load atomic <2 x half>, ptr %x acquire, align 8 arsenm wrote: The 2 x i16 and 2 x half cases aren't expanded, I guess the default expansion rule missed these for some reason? 
https://github.com/llvm/llvm-project/pull/120716 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
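The CHECK lines in the test above show the shape of the expansion: the sized libcall returns one wide scalar, which is bitcast to a vector of pointer-sized integers and then converted lane by lane with inttoptr. A minimal standalone C++ sketch of that two-step conversion follows. It does not use any LLVM API; `Raw128`, `bitcastToLanes`, `lanesToPointers` and `roundTrips` are hypothetical names chosen for illustration, and 64-bit pointers are assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 16-byte scalar returned by a sized libcall
// such as __atomic_load_16 (assumption: 64-bit pointers, two lanes).
using Raw128 = std::array<unsigned char, 16>;

// Step 1 ("bitcast"): reinterpret the 16 raw bytes as two pointer-sized
// integer lanes without changing any bits.
inline std::array<std::uint64_t, 2> bitcastToLanes(const Raw128 &raw) {
  std::array<std::uint64_t, 2> lanes{};
  static_assert(sizeof(lanes) == sizeof(raw), "bitcast must preserve size");
  std::memcpy(lanes.data(), raw.data(), sizeof(raw));
  return lanes;
}

// Step 2 ("inttoptr"): convert each integer lane to a pointer.
inline std::array<void *, 2>
lanesToPointers(const std::array<std::uint64_t, 2> &lanes) {
  return {reinterpret_cast<void *>(static_cast<std::uintptr_t>(lanes[0])),
          reinterpret_cast<void *>(static_cast<std::uintptr_t>(lanes[1]))};
}

// Illustration only: pack two pointers into 16 bytes, run both steps, and
// report whether the original pointers come back unchanged.
inline bool roundTrips(void *p0, void *p1) {
  const std::array<std::uint64_t, 2> in{
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p0)),
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p1))};
  Raw128 raw{};
  std::memcpy(raw.data(), in.data(), sizeof(raw));
  const auto out = lanesToPointers(bitcastToLanes(raw));
  return out[0] == p0 && out[1] == p1;
}
```

The split into two steps mirrors the reason the patch cannot use a single cast: a scalar integer cannot be converted directly to a vector of pointers, so the bits are first regrouped into integer lanes and only then reinterpreted as pointers.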
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997 >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH 1/3] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNeg0; } - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + /// Return the non negative flag of the second
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/arsenm commented: Title seems to not match the implementation anymore https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg (PR #132385)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132385 >From 183f6cc9a037bed5f472be13e32f39002520 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Mon, 14 Apr 2025 16:35:19 +0200 Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg Uniform S16 shifts have to be extended to S32 using appropriate Extend before lowering to S32 instruction. Uniform packed V2S16 are lowered to SGPR S32 instructions, other option is to use VALU packed V2S16 and ReadAnyLane. For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are instructions available. --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 2 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 105 ++ .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 5 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 43 +++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 11 ++ llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 10 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 187 +- .../AMDGPU/GlobalISel/regbankselect-ashr.mir | 6 +- .../AMDGPU/GlobalISel/regbankselect-lshr.mir | 17 +- .../GlobalISel/regbankselect-sext-inreg.mir | 24 +-- .../AMDGPU/GlobalISel/regbankselect-shl.mir | 6 +- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 34 ++-- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 10 +- 13 files changed, 309 insertions(+), 151 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index 9544c9f43eeaf..15584f16a0638 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -310,7 +310,7 @@ bool AMDGPURegBankLegalize::runOnMachineFunction(MachineFunction &MF) { // Opcodes that support pretty much all combinations of reg banks and LLTs // (except S1). There is no point in writing rules for them. 
if (Opc == AMDGPU::G_BUILD_VECTOR || Opc == AMDGPU::G_UNMERGE_VALUES || -Opc == AMDGPU::G_MERGE_VALUES) { +Opc == AMDGPU::G_MERGE_VALUES || Opc == G_BITCAST) { RBLHelper.applyMappingTrivial(*MI); continue; } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index f4e99b7d3872e..f69e90f602b74 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -171,6 +171,59 @@ void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) { MI.eraseFromParent(); } +std::pair RegBankLegalizeHelper::unpackZExt(Register Reg) { + auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg); + auto Mask = B.buildConstant(SgprRB_S32, 0x); + auto Lo = B.buildAnd(SgprRB_S32, PackedS32, Mask); + auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16)); + return {Lo.getReg(0), Hi.getReg(0)}; +} + +std::pair RegBankLegalizeHelper::unpackSExt(Register Reg) { + auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg); + auto Lo = B.buildSExtInReg(SgprRB_S32, PackedS32, 16); + auto Hi = B.buildAShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16)); + return {Lo.getReg(0), Hi.getReg(0)}; +} + +std::pair RegBankLegalizeHelper::unpackAExt(Register Reg) { + auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg); + auto Lo = PackedS32; + auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16)); + return {Lo.getReg(0), Hi.getReg(0)}; +} + +void RegBankLegalizeHelper::lowerUnpack(MachineInstr &MI) { + Register Lo, Hi; + switch (MI.getOpcode()) { + case AMDGPU::G_SHL: { +auto [Val0, Val1] = unpackAExt(MI.getOperand(1).getReg()); +auto [Amt0, Amt1] = unpackAExt(MI.getOperand(2).getReg()); +Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0); +Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0); +break; + } + case AMDGPU::G_LSHR: { +auto [Val0, Val1] = unpackZExt(MI.getOperand(1).getReg()); 
+auto [Amt0, Amt1] = unpackZExt(MI.getOperand(2).getReg()); +Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0); +Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0); +break; + } + case AMDGPU::G_ASHR: { +auto [Val0, Val1] = unpackSExt(MI.getOperand(1).getReg()); +auto [Amt0, Amt1] = unpackSExt(MI.getOperand(2).getReg()); +Lo = B.buildAShr(SgprRB_S32, Val0, Amt0).getReg(0); +Hi = B.buildAShr(SgprRB_S32, Val1, Amt1).getReg(0); +break; + } + default: +llvm_unreachable("Unpack lowering not implemented"); + } + B.buildBuildVectorTrunc(MI.getOperand(0).getReg(), {Lo, Hi}); + MI.eraseFromParent(); +} + static bool isSignedBFE(MachineInstr &MI) { if (GIntrinsic *GI = dyn_cast(&MI)) { if (GI->is(Intrinsic::amdgcn_sbfe)) @@ -313,6 +366,33 @@ void RegBankLegalizeHelper::lowerSplitTo32Sel(MachineInstr &MI) { MI.eraseFromParent(); }
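The commit message above says uniform packed V2S16 shifts are lowered to SGPR S32 instructions by unpacking both halves, shifting each as a 32-bit value, and repacking with a truncating build-vector. A rough standalone C++ sketch of that idea on plain integers follows. It is not the LLVM code: the function names are hypothetical, the low-half mask `0xFFFF` is an assumption (the constant is cut off in the diff as archived), shift amounts are assumed to be below 16, and the amounts are zero-extended here even for the arithmetic shift to keep the C++ well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Zero-extending unpack: low half is masked, high half is shifted down.
inline std::pair<std::uint32_t, std::uint32_t> unpackZExt(std::uint32_t packed) {
  return {packed & 0xFFFFu, packed >> 16};
}

// Sign-extending unpack: each 16-bit half is widened with its sign bit.
inline std::pair<std::int32_t, std::int32_t> unpackSExt(std::uint32_t packed) {
  const auto lo =
      static_cast<std::int32_t>(static_cast<std::int16_t>(packed & 0xFFFFu));
  const auto hi =
      static_cast<std::int32_t>(static_cast<std::int16_t>(packed >> 16));
  return {lo, hi};
}

// Repack two 32-bit results, keeping only the low 16 bits of each
// (the role a truncating build-vector plays in the lowering).
inline std::uint32_t pack(std::uint32_t lo, std::uint32_t hi) {
  return (lo & 0xFFFFu) | (hi << 16);
}

// Packed logical shift right done as two independent 32-bit shifts.
inline std::uint32_t lshrV2S16(std::uint32_t val, std::uint32_t amt) {
  const auto [v0, v1] = unpackZExt(val);
  const auto [a0, a1] = unpackZExt(amt);
  return pack(v0 >> a0, v1 >> a1);
}

// Packed arithmetic shift right: values are sign-extended first so the
// 32-bit shift replicates the 16-bit sign bit.
inline std::uint32_t ashrV2S16(std::uint32_t val, std::uint32_t amt) {
  const auto [v0, v1] = unpackSExt(val);
  const auto [a0, a1] = unpackZExt(amt);
  return pack(static_cast<std::uint32_t>(v0 >> a0),
              static_cast<std::uint32_t>(v1 >> a1));
}
```

The choice of extension matters only for right shifts: a logical shift needs zeros above bit 15 and an arithmetic shift needs copies of the sign bit, while a left shift can use any extension because the upper bits are discarded on repacking.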
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132383 >From 915bee3729c66b327885838c6384d14f92c2af2d Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 8 May 2025 12:03:28 +0200 Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc Uniform S1: Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant to be combined away. Uniform S1 ZExt and SExt are lowered using select. Divergent S1: Trunc of VGPR to VCC is lowered as compare. Extends of VCC are lowered using select. For remaining types: S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc are again left as is to be combined away. Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is for instruction select to deal with them. This is because there are patterns that check for S16 type. --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 7 ++ .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 110 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 47 +++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 3 + .../GlobalISel/regbankselect-and-s1.mir | 105 + .../GlobalISel/regbankselect-anyext.mir | 59 +- .../AMDGPU/GlobalISel/regbankselect-sext.mir | 100 ++-- .../AMDGPU/GlobalISel/regbankselect-trunc.mir | 22 +++- .../AMDGPU/GlobalISel/regbankselect-zext.mir | 89 +- 10 files changed, 360 insertions(+), 183 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ad6a0772fe8b6..9544c9f43eeaf 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner { return; } +if (DstTy == S64 && TruncSrcTy == S32) { + B.buildMergeLikeInstr(MI.getOperand(0).getReg(), +{TruncSrc, B.buildUndef({SgprRB, S32})}); + cleanUpAfterCombine(MI, Trunc); + return; +} + if (DstTy == S32 && TruncSrcTy == 
S16) { B.buildAnyExt(Dst, TruncSrc); cleanUpAfterCombine(MI, Trunc); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index aa4584bb0f6a1..77f37f3d19875 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -133,6 +133,43 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy, MI.eraseFromParent(); } +void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) { + Register Dst = MI.getOperand(0).getReg(); + LLT Ty = MRI.getType(Dst); + Register Src = MI.getOperand(1).getReg(); + unsigned Opc = MI.getOpcode(); + int TrueExtCst = (Opc == G_SEXT ? -1 : 1); + if (Ty == S32 || Ty == S16) { +auto True = B.buildConstant({VgprRB, Ty}, TrueExtCst); +auto False = B.buildConstant({VgprRB, Ty}, 0); +B.buildSelect(Dst, Src, True, False); + } else if (Ty == S64) { +auto True = B.buildConstant({VgprRB_S32}, TrueExtCst); +auto False = B.buildConstant({VgprRB_S32}, 0); +auto Lo = B.buildSelect({VgprRB_S32}, Src, True, False); +MachineInstrBuilder Hi; +switch (Opc) { +case G_SEXT: + Hi = Lo; + break; +case G_ZEXT: + Hi = False; + break; +case G_ANYEXT: + Hi = B.buildUndef({VgprRB_S32}); + break; +default: + llvm_unreachable("Opcode not supported"); +} + +B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)}); + } else { +llvm_unreachable("Type not supported"); + } + + MI.eraseFromParent(); +} + static bool isSignedBFE(MachineInstr &MI) { if (GIntrinsic *GI = dyn_cast(&MI)) { if (GI->is(Intrinsic::amdgcn_sbfe)) @@ -263,26 +300,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, switch (Mapping.LoweringMethod) { case DoNotLower: return; - case VccExtToSel: { -LLT Ty = MRI.getType(MI.getOperand(0).getReg()); -Register Src = MI.getOperand(1).getReg(); -unsigned Opc = MI.getOpcode(); -if (Ty == S32 || Ty == S16) { - auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? 
-1 : 1); - auto False = B.buildConstant({VgprRB, Ty}, 0); - B.buildSelect(MI.getOperand(0).getReg(), Src, True, False); -} -if (Ty == S64) { - auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1); - auto False = B.buildConstant({VgprRB, S32}, 0); - auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False); - B.buildMergeValues( - MI.getOperand(0).getReg(), - {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)}); -} -MI.eraseFromParent(); -return; - } + case VccExtToSel: +return lowerVccExtToSel(MI); case UniExtToSel: { LLT Ty = MRI.getType(MI.getOperand(0).ge
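The commit message above states that extends of a divergent S1 (VCC) value are lowered using select, and the diff shows the 64-bit case being built from a 32-bit select plus a high half that depends on the extend kind. A small standalone C++ sketch of that selection logic follows. It models only the value computation, not the MachineIR builder calls; `ExtKind`, `extBoolTo32` and `extBoolTo64` are hypothetical names, and the any-extend case is left out because its high half is undefined.

```cpp
#include <cassert>
#include <cstdint>

enum class ExtKind { SExt, ZExt };

// 32-bit case: select(cond, TrueCst, 0), where TrueCst is -1 for a sign
// extend and 1 for a zero extend.
inline std::int32_t extBoolTo32(bool cond, ExtKind kind) {
  const std::int32_t trueCst = (kind == ExtKind::SExt) ? -1 : 1;
  return cond ? trueCst : 0;
}

// 64-bit case: the low half is the same 32-bit select; the high half is a
// copy of the low half for a sign extend and zero for a zero extend. The two
// halves are then merged into one 64-bit value.
inline std::uint64_t extBoolTo64(bool cond, ExtKind kind) {
  const auto lo = static_cast<std::uint32_t>(extBoolTo32(cond, kind));
  const std::uint32_t hi = (kind == ExtKind::SExt) ? lo : 0u;
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Reusing the low half as the high half works for a sign extend because the select can only produce all-ones or all-zeros, so both halves of the 64-bit result are identical.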
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From b50f6786084321c6c1402b475b07c998157cd506 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- llvm/lib/Target/X86/X86ISelLowering.cpp | 73 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 149 ++ 5 files changed, 104 insertions(+), 172 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::PRE_INC) { -if (auto *C = dyn_cast(N->getOffset(
[llvm-branch-commits] [llvm] [ConstraintElim] Simplify `usub_with_overflow` when A uge B (PR #135785)
https://github.com/el-ev updated https://github.com/llvm/llvm-project/pull/135785 >From d3a9e6efc11df2ef910d7a4d67b6d03204422ec5 Mon Sep 17 00:00:00 2001 From: Iris Shi <0...@owo.li> Date: Tue, 15 Apr 2025 20:20:45 +0800 Subject: [PATCH] [ConstraintElim] Simplify `usub_with_overflow` when A uge B --- .../Scalar/ConstraintElimination.cpp | 11 .../usub-with-overflow.ll | 54 +++ 2 files changed, 44 insertions(+), 21 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp index 0227a959b895d..bc1bbec08f25f 100644 --- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp +++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp @@ -1123,6 +1123,7 @@ void State::addInfoFor(BasicBlock &BB) { // Enqueue overflow intrinsics for simplification. case Intrinsic::sadd_with_overflow: case Intrinsic::ssub_with_overflow: +case Intrinsic::usub_with_overflow: case Intrinsic::ucmp: case Intrinsic::scmp: WorkList.push_back( @@ -1742,6 +1743,16 @@ tryToSimplifyOverflowMath(IntrinsicInst *II, ConstraintInfo &Info, Changed = true; break; } + case Intrinsic::usub_with_overflow: { +// usub overflows iff A < B +// TODO: If the operation is guaranteed to overflow, we could +// also apply some simplifications. 
+if (DoesConditionHold(CmpInst::ICMP_UGE, A, B, Info)) { + replaceAddOrSubOverflowUses(II, A, B, ToRemove); + Changed = true; +} +break; + } } return Changed; diff --git a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll index 06bfd8269d97d..722116cc6ebd0 100644 --- a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll +++ b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll @@ -9,12 +9,14 @@ define i8 @usub_no_overflow_due_to_cmp_condition(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp uge i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[MATH:%.*]], label [[EXIT_FAIL:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:[[TMP1:%.*]] = insertvalue { i8, i1 } poison, i8 [[RES]], 0 +; CHECK-NEXT:[[TMP2:%.*]] = insertvalue { i8, i1 } [[TMP1]], i1 false, 1 +; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[TMP2]], 1 ; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 -; CHECK-NEXT:ret i8 [[RES]] +; CHECK-NEXT:[[RES1:%.*]] = extractvalue { i8, i1 } [[TMP2]], 0 +; CHECK-NEXT:ret i8 [[RES1]] ; CHECK: exit.fail: ; CHECK-NEXT:ret i8 0 ; @@ -41,12 +43,14 @@ define i8 @usub_no_overflow_due_to_cmp_condition2(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:[[TMP1:%.*]] = insertvalue { i8, i1 } poison, i8 [[RES]], 0 +; 
CHECK-NEXT:[[TMP2:%.*]] = insertvalue { i8, i1 } [[TMP1]], i1 false, 1 +; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[TMP2]], 1 ; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 -; CHECK-NEXT:ret i8 [[RES]] +; CHECK-NEXT:[[RES1:%.*]] = extractvalue { i8, i1 } [[TMP2]], 0 +; CHECK-NEXT:ret i8 [[RES1]] ; CHECK: exit.fail: ; CHECK-NEXT:ret i8 0 ; @@ -75,13 +79,15 @@ define i8 @sub_no_overflow_due_to_cmp_condition_result_used(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:[[TMP1:%.*]] = insertvalue { i8, i1 } poison, i8 [[RES]], 0 +; CHECK-NEXT:[[OP:%.*]] = insertvalue { i8, i1 } [[TMP1]], i1 false, 1 ; CHECK-NEXT:call void @use_res({ i8, i1 } [[OP]]) ; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 ; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 -; CHECK-NEXT:ret i8 [[RES]] +; CHECK-NEXT:[[RES1:%.*]] = extractvalue { i8, i1 }
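The patch comment "usub overflows iff A < B" can be checked exhaustively for i8. The following is only a sketch in plain C++, not LLVM code; `usubWithOverflow`, `usubKnownUGE` and `foldIsSoundForI8` are hypothetical helper names that model the intrinsic and the folded form (plain `sub` plus a constant `false` flag), as seen in the updated CHECK lines.

```cpp
#include <cstdint>
#include <utility>

// Reference model of llvm.usub.with.overflow.i8: the wrapped difference and a
// flag that is set exactly when the unsigned subtraction wraps around.
std::pair<uint8_t, bool> usubWithOverflow(uint8_t A, uint8_t B) {
  int Wide = static_cast<int>(A) - static_cast<int>(B);
  return {static_cast<uint8_t>(Wide), Wide < 0};
}

// Model of the folded form used once A >=u B is known to hold:
// a plain subtraction and an overflow flag that is constant false.
std::pair<uint8_t, bool> usubKnownUGE(uint8_t A, uint8_t B) {
  return {static_cast<uint8_t>(A - B), false};
}

// Checks, for every pair of i8 values, that overflow happens iff A < B and
// that the folded form matches the reference whenever A >= B.
bool foldIsSoundForI8() {
  for (int A = 0; A < 256; ++A) {
    for (int B = 0; B < 256; ++B) {
      auto Ref = usubWithOverflow(static_cast<uint8_t>(A),
                                  static_cast<uint8_t>(B));
      if (Ref.second != (A < B))
        return false;
      if (A >= B && Ref != usubKnownUGE(static_cast<uint8_t>(A),
                                        static_cast<uint8_t>(B)))
        return false;
    }
  }
  return true;
}
```

Calling `foldIsSoundForI8()` from a small driver should return `true`; the case where overflow is guaranteed (A < B) is the one the TODO in the patch leaves open.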
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) { } } break; + case ISD::EXTRACT_ELEMENT: +if (auto *IdxC = dyn_cast(Elt.getOperand(1))) { arsenm wrote: Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [NFC][ubsan_minimal] Clang-format a file (PR #139000)
https://github.com/vitalybuka updated https://github.com/llvm/llvm-project/pull/139000 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
https://github.com/inbelic updated https://github.com/llvm/llvm-project/pull/138002 >From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 17:53:11 + Subject: [PATCH 1/3] [HLSL][RootSignature] Add mandatory parameters for RootConstants - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing --- .../clang/Parse/ParseHLSLRootSignature.h | 10 ++- clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++- .../Parse/ParseHLSLRootSignatureTest.cpp | 14 +++- .../llvm/Frontend/HLSL/HLSLRootSignature.h| 5 +- 4 files changed, 89 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. 
The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order and only exactly once. Parse through as many arguments as possible - // reporting an error if a duplicate is seen. ParsedClauseParams Params; do { // ( `b` | `t` | `u` | `s`) POS_INT
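The "any order, exactly once, then check the mandatory ones" scheme described in the patch can be sketched outside the parser. This is a simplified illustration, not the Clang implementation: it assumes the arguments are already tokenized into name/value pairs, and `ConstantParams`, `collectParams` and `hasMandatoryParams` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Tracks which parameters have been seen so far; an empty optional means
// "not specified yet".
struct ConstantParams {
  std::optional<uint32_t> Reg;
  std::optional<uint32_t> Num32BitConstants;
};

// Accepts parameters in any order. Returns std::nullopt on an unknown name or
// on a repeated parameter (the real parser reports a diagnostic instead).
std::optional<ConstantParams>
collectParams(const std::vector<std::pair<std::string, uint32_t>> &Args) {
  ConstantParams Params;
  for (const auto &[Name, Value] : Args) {
    std::optional<uint32_t> *Slot = nullptr;
    if (Name == "bReg")
      Slot = &Params.Reg;
    else if (Name == "num32BitConstants")
      Slot = &Params.Num32BitConstants;
    else
      return std::nullopt; // unknown parameter name
    if (Slot->has_value())
      return std::nullopt; // parameter specified more than once
    *Slot = Value;
  }
  return Params;
}

// Mandatory-parameter check performed after collection, mirroring the two
// `has_value()` checks in parseRootConstants().
bool hasMandatoryParams(const std::optional<ConstantParams> &Params) {
  return Params.has_value() && Params->Reg.has_value() &&
         Params->Num32BitConstants.has_value();
}
```

Keeping the duplicate check in the collection step and the mandatory check in the caller matches the split in the patch between `parseRootConstantParams` and `parseRootConstants`.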
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)
https://github.com/inbelic updated https://github.com/llvm/llvm-project/pull/138007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) { } } break; + case ISD::EXTRACT_ELEMENT: +if (auto *IdxC = dyn_cast(Elt.getOperand(1))) { arsenm wrote: > If it isn't a constant, shall we assert false or abort the transform? Ignore it as a possibility and just let it crash. No efforts should be made to support invalid constructs. > Why do we want to avoid EXTRACT_ELEMENT? It seems to work here. Of course it works, but it seems poorly supported by optimizations. In particular, the comment on it says "This is only for use before legalization, for values that will be broken into multiple registers.". This is used during legalization (I don't really know why we have this, or why this fake restriction is documented). You could get the same with just shifts. https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
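The remark that the same result is available "with just shifts" can be illustrated on plain integers. This is a sketch under the assumption that EXTRACT_ELEMENT with index 0 yields the low half and index 1 the high half of a value; `extractElement` and `extractViaShift` are hypothetical stand-ins, not SelectionDAG APIs.

```cpp
#include <cstdint>

// Stand-in for EXTRACT_ELEMENT on an i64 that is split into two i32 halves:
// index 0 selects the low half, index 1 the high half (assumed convention).
uint32_t extractElement(uint64_t Value, unsigned Index) {
  return Index == 0 ? static_cast<uint32_t>(Value & 0xFFFFFFFFu)
                    : static_cast<uint32_t>(Value >> 32);
}

// The same half obtained with a logical shift right followed by a truncate.
uint32_t extractViaShift(uint64_t Value, unsigned Index) {
  return static_cast<uint32_t>(Value >> (32 * Index));
}
```

With the shift form, the byte offset of the selected half is simply the shift amount divided by 8, which is the quantity `findEltLoadSrc` needs.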
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132382 >From e6b6d99ef184920dbc418b7fd8b545b5c6ed59c7 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 8 May 2025 12:02:27 +0200 Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR Uniform S1 is lowered to S32. Divergent S1 is selected as VCC(S1) instruction select will select SALU instruction based on wavesize (S32 or S64). S16 are selected as is. There are register classes for vgpr S16. Since some isel patterns check for sgpr S16 we don't lower to S32. For 32 and 64 bit types we use B32/B64 rules that cover scalar vector and pointers types. SALU B32 and B64 and VALU B32 instructions are available. Divergent B64 is lowered to B32. --- .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 34 +--- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 10 ++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 2 + .../AMDGPU/GlobalISel/regbankselect-and.mir | 33 --- .../AMDGPU/GlobalISel/regbankselect-or.mir| 85 +-- .../AMDGPU/GlobalISel/regbankselect-xor.mir | 84 +- 7 files changed, 136 insertions(+), 113 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index d49072c65629b..aa4584bb0f6a1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -14,6 +14,7 @@ #include "AMDGPURegBankLegalizeHelper.h" #include "AMDGPUGlobalISelUtils.h" #include "AMDGPUInstrInfo.h" +#include "AMDGPURegBankLegalizeRules.h" #include "AMDGPURegisterBankInfo.h" #include "GCNSubtarget.h" #include "MCTargetDesc/AMDGPUMCTargetDesc.h" @@ -238,6 +239,23 @@ void RegBankLegalizeHelper::lowerS_BFE(MachineInstr &MI) { MI.eraseFromParent(); } +void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) { + Register Dst = MI.getOperand(0).getReg(); + LLT DstTy = MRI.getType(Dst); + assert(DstTy == V4S16 
|| DstTy == V2S32 || DstTy == S64); + LLT Ty = (DstTy == V4S16 ? V2S16 : S32); + auto Op1 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(1).getReg()); + auto Op2 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(2).getReg()); + unsigned Opc = MI.getOpcode(); + auto Flags = MI.getFlags(); + auto Lo = + B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(0), Op2.getReg(0)}, Flags); + auto Hi = + B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(1), Op2.getReg(1)}, Flags); + B.buildMergeLikeInstr(Dst, {Lo, Hi}); + MI.eraseFromParent(); +} + void RegBankLegalizeHelper::lower(MachineInstr &MI, const RegBankLLTMapping &Mapping, SmallSet &WaterfallSgprs) { @@ -326,20 +344,12 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, MI.eraseFromParent(); return; } - case SplitTo32: { -auto Op1 = B.buildUnmerge(VgprRB_S32, MI.getOperand(1).getReg()); -auto Op2 = B.buildUnmerge(VgprRB_S32, MI.getOperand(2).getReg()); -unsigned Opc = MI.getOpcode(); -auto Lo = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(0), Op2.getReg(0)}); -auto Hi = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(1), Op2.getReg(1)}); -B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi}); -MI.eraseFromParent(); -break; - } case V_BFE: return lowerV_BFE(MI); case S_BFE: return lowerS_BFE(MI); + case SplitTo32: +return lowerSplitTo32(MI); case SplitLoad: { LLT DstTy = MRI.getType(MI.getOperand(0).getReg()); unsigned Size = DstTy.getSizeInBits(); @@ -399,6 +409,7 @@ LLT RegBankLegalizeHelper::getTyFromID(RegBankLLTMappingApplyID ID) { case UniInVcc: return LLT::scalar(1); case Sgpr16: + case Vgpr16: return LLT::scalar(16); case Sgpr32: case Sgpr32Trunc: @@ -518,6 +529,7 @@ RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) { case Sgpr32AExtBoolInReg: case Sgpr32SExt: return SgprRB; + case Vgpr16: case Vgpr32: case Vgpr64: case VgprP0: @@ -561,6 +573,7 @@ void RegBankLegalizeHelper::applyMappingDst( case SgprP4: case SgprP5: case SgprV4S32: +case Vgpr16: case Vgpr32: case Vgpr64: case VgprP0: @@ -692,6 +705,7 @@ 
void RegBankLegalizeHelper::applyMappingSrc( break; } // vgpr scalars, pointers and vectors +case Vgpr16: case Vgpr32: case Vgpr64: case VgprP0: diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h index 2d4da4cc90ea7..bbfa7b3986fd2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h @@ -112,6 +112,7 @@ class RegBankLegalizeHelper { void lowerV_BFE(MachineInstr &MI); void lowerS_BFE(MachineInstr &MI); + void lowerSplitTo32(MachineInstr &MI); }; } // end
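The idea behind `lowerSplitTo32` (unmerge both operands, apply the bitwise operation to each half, merge the results) can be sketched on ordinary 64-bit integers. This is an illustration only; `splitTo32` here is a hypothetical free function and does not use MachineIRBuilder.

```cpp
#include <cstdint>
#include <functional>

// Applies a bitwise operation to a 64-bit value by splitting both operands
// into 32-bit halves, operating on each half, and merging the two results.
template <typename Op>
uint64_t splitTo32(uint64_t LHS, uint64_t RHS, Op BitOp) {
  // "Unmerge": low and high 32-bit parts of each operand.
  const uint32_t LHSLo = static_cast<uint32_t>(LHS);
  const uint32_t LHSHi = static_cast<uint32_t>(LHS >> 32);
  const uint32_t RHSLo = static_cast<uint32_t>(RHS);
  const uint32_t RHSHi = static_cast<uint32_t>(RHS >> 32);

  // The same 32-bit operation is applied to each half independently.
  const uint32_t Lo = BitOp(LHSLo, RHSLo);
  const uint32_t Hi = BitOp(LHSHi, RHSHi);

  // "Merge": reassemble the 64-bit result from the two halves.
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

For example, `splitTo32(A, B, std::bit_xor<uint32_t>{})` should equal `A ^ B`. The split is valid for AND, OR and XOR because each result bit depends only on the operand bits at the same position.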
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing for RootFlags (PR #138055)
https://github.com/inbelic updated https://github.com/llvm/llvm-project/pull/138055 >From 85e92c438ca39c24fdc7840afebd88ebaf025a3c Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 23:14:07 + Subject: [PATCH 1/6] pre-req: define missing lexer tokens for flags --- .../clang/Lex/HLSLRootSignatureTokenKinds.def | 19 +++ .../Lex/LexHLSLRootSignatureTest.cpp | 15 ++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def b/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def index ecb8cfc7afa16..eac6ebda84965 100644 --- a/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def +++ b/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def @@ -27,6 +27,9 @@ #endif // Defines the various types of enum +#ifndef ROOT_FLAG_ENUM +#define ROOT_FLAG_ENUM(NAME, LIT) ENUM(NAME, LIT) +#endif #ifndef UNBOUNDED_ENUM #define UNBOUNDED_ENUM(NAME, LIT) ENUM(NAME, LIT) #endif @@ -73,6 +76,7 @@ PUNCTUATOR(minus, '-') // RootElement Keywords: KEYWORD(RootSignature) // used only for diagnostic messaging +KEYWORD(RootFlags) KEYWORD(DescriptorTable) KEYWORD(RootConstants) @@ -100,6 +104,20 @@ UNBOUNDED_ENUM(unbounded, "unbounded") // Descriptor Range Offset Enum: DESCRIPTOR_RANGE_OFFSET_ENUM(DescriptorRangeOffsetAppend, "DESCRIPTOR_RANGE_OFFSET_APPEND") +// Root Flag Enums: +ROOT_FLAG_ENUM(AllowInputAssemblerInputLayout, "ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT") +ROOT_FLAG_ENUM(DenyVertexShaderRootAccess, "DENY_VERTEX_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyHullShaderRootAccess, "DENY_HULL_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyDomainShaderRootAccess, "DENY_DOMAIN_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyGeometryShaderRootAccess, "DENY_GEOMETRY_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyPixelShaderRootAccess, "DENY_PIXEL_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyAmplificationShaderRootAccess, "DENY_AMPLIFICATION_SHADER_ROOT_ACCESS") +ROOT_FLAG_ENUM(DenyMeshShaderRootAccess, "DENY_MESH_SHADER_ROOT_ACCESS") 
+ROOT_FLAG_ENUM(AllowStreamOutput, "ALLOW_STREAM_OUTPUT") +ROOT_FLAG_ENUM(LocalRootSignature, "LOCAL_ROOT_SIGNATURE") +ROOT_FLAG_ENUM(CBVSRVUAVHeapDirectlyIndexed, "CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED") +ROOT_FLAG_ENUM(SamplerHeapDirectlyIndexed , "SAMPLER_HEAP_DIRECTLY_INDEXED") + // Root Descriptor Flag Enums: ROOT_DESCRIPTOR_FLAG_ENUM(DataVolatile, "DATA_VOLATILE") ROOT_DESCRIPTOR_FLAG_ENUM(DataStaticWhileSetAtExecute, "DATA_STATIC_WHILE_SET_AT_EXECUTE") @@ -127,6 +145,7 @@ SHADER_VISIBILITY_ENUM(Mesh, "SHADER_VISIBILITY_MESH") #undef DESCRIPTOR_RANGE_FLAG_ENUM_OFF #undef DESCRIPTOR_RANGE_FLAG_ENUM_ON #undef ROOT_DESCRIPTOR_FLAG_ENUM +#undef ROOT_FLAG_ENUM #undef DESCRIPTOR_RANGE_OFFSET_ENUM #undef UNBOUNDED_ENUM #undef ENUM diff --git a/clang/unittests/Lex/LexHLSLRootSignatureTest.cpp b/clang/unittests/Lex/LexHLSLRootSignatureTest.cpp index 89e9a3183ad03..21a1f1f08ae05 100644 --- a/clang/unittests/Lex/LexHLSLRootSignatureTest.cpp +++ b/clang/unittests/Lex/LexHLSLRootSignatureTest.cpp @@ -87,7 +87,7 @@ TEST_F(LexHLSLRootSignatureTest, ValidLexAllTokensTest) { RootSignature -DescriptorTable RootConstants +RootFlags DescriptorTable RootConstants num32BitConstants @@ -98,6 +98,19 @@ TEST_F(LexHLSLRootSignatureTest, ValidLexAllTokensTest) { unbounded DESCRIPTOR_RANGE_OFFSET_APPEND +allow_input_assembler_input_layout +deny_vertex_shader_root_access +deny_hull_shader_root_access +deny_domain_shader_root_access +deny_geometry_shader_root_access +deny_pixel_shader_root_access +deny_amplification_shader_root_access +deny_mesh_shader_root_access +allow_stream_output +local_root_signature +cbv_srv_uav_heap_directly_indexed +sampler_heap_directly_indexed + DATA_VOLATILE DATA_STATIC_WHILE_SET_AT_EXECUTE DATA_STATIC >From 2f8222ec73805bd8d4a6f3634e7702c7e925dd50 Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 23:21:06 + Subject: [PATCH 2/6] [HLSL][RootSignature] Add parsing for empty RootFlags - defines the `RootFlags` in-memory enum - defines a template of 
`parseRootFlags` that will allow handling of parsing root flags --- .../clang/Parse/ParseHLSLRootSignature.h | 1 + clang/lib/Parse/ParseHLSLRootSignature.cpp| 25 + .../Parse/ParseHLSLRootSignatureTest.cpp | 27 +++ .../llvm/Frontend/HLSL/HLSLRootSignature.h| 19 - 4 files changed, 71 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index 2ac2083983741..915266f8a36ae 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -71,6 +71,7 @@ class RootSignatureParser { // expected, or, there is a lexing error /// Root Element parse methods: + std::optional parseRootFlags()
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) { } } break; + case ISD::EXTRACT_ELEMENT: +if (auto *IdxC = dyn_cast(Elt.getOperand(1))) { jofrn wrote: If it isn't a constant, shall we assert false or abort the transform? Right now it aborts the transform. If we know it always is and will be constant, I guess asserting is better. Separately as in another PR? There would be no associated test change as this is required for the optimization. Yes, if we use the ones already implemented here, then we will be able to discover the `ByteOffset`. Why do we want to avoid EXTRACT_ELEMENT? It seems to work here. https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/125432 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)
anutosh491 wrote: > If this patch is needed to run your emscripten tests, how did they pass prior > to the llvm 20 upgrade? Hey @sbc100, not sure you remember, but we did discuss a good chunk of this in the issue I had raised: https://github.com/llvm/llvm-project/issues/134809 This is exactly what changed from 19.1.7 to 20.x: https://github.com/llvm/llvm-project/issues/134809#issuecomment-2786937154 https://github.com/llvm/llvm-project/pull/137620 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] IR: Remove redundant UseList check in addUse (PR #138676)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/138676 >From dedd3209ffec0129ff862ff8808eacbf13c15688 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 6 May 2025 12:06:17 +0200 Subject: [PATCH] IR: Remove redundant UseList check in addUse Not sure if this did anything for compile time or not. --- llvm/include/llvm/IR/Value.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h index 241b9e2860c4c..c276899e673a3 100644 --- a/llvm/include/llvm/IR/Value.h +++ b/llvm/include/llvm/IR/Value.h @@ -509,7 +509,7 @@ class Value { /// This method should only be used by the Use class. void addUse(Use &U) { -if (UseList || hasUseList()) +if (hasUseList()) U.addToList(&UseList); } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order storage of root parameters (PR #137284)
@@ -6,21 +6,87 @@ // //===--===// +#include "llvm/ADT/STLForwardCompat.h" #include "llvm/BinaryFormat/DXContainer.h" +#include inbelic wrote: nit: to check that all these headers are required https://github.com/llvm/llvm-project/pull/137284 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order storage of root parameters (PR #137284)
@@ -274,27 +274,37 @@ void DXContainerWriter::writeParts(raw_ostream &OS) { RS.StaticSamplersOffset = P.RootSignature->StaticSamplersOffset; for (const auto &Param : P.RootSignature->Parameters) { -mcdxbc::RootParameter NewParam; -NewParam.Header = dxbc::RootParameterHeader{ -Param.Type, Param.Visibility, Param.Offset}; +auto Header = dxbc::RootParameterHeader{Param.Type, Param.Visibility, +Param.Offset}; switch (Param.Type) { case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): - NewParam.Constants.Num32BitValues = Param.Constants.Num32BitValues; - NewParam.Constants.RegisterSpace = Param.Constants.RegisterSpace; - NewParam.Constants.ShaderRegister = Param.Constants.ShaderRegister; + dxbc::RootConstants Constants; + Constants.Num32BitValues = Param.Constants.Num32BitValues; + Constants.RegisterSpace = Param.Constants.RegisterSpace; + Constants.ShaderRegister = Param.Constants.ShaderRegister; + RS.ParametersContainer.addParameter(Header, Constants); break; case llvm::to_underlying(dxbc::RootParameterType::SRV): case llvm::to_underlying(dxbc::RootParameterType::UAV): case llvm::to_underlying(dxbc::RootParameterType::CBV): - NewParam.Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; - NewParam.Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; - if (P.RootSignature->Version > 1) -NewParam.Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + if (RS.Version == 1) { +dxbc::RST0::v0::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +RS.ParametersContainer.addParameter(Header, Descriptor); + } else { +dxbc::RST0::v1::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + RS.ParametersContainer.addParameter(Header, Descriptor); + } break; +default: + // Handling invalid parameter type edge 
case inbelic wrote: Should this not raise an error of sorts? Seems like we silently ignore an invalid parameter? If it can't happen then an assert or unreachable might be better https://github.com/llvm/llvm-project/pull/137284 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order storage of root parameters (PR #137284)
@@ -274,27 +274,37 @@ void DXContainerWriter::writeParts(raw_ostream &OS) { RS.StaticSamplersOffset = P.RootSignature->StaticSamplersOffset; for (const auto &Param : P.RootSignature->Parameters) { -mcdxbc::RootParameter NewParam; -NewParam.Header = dxbc::RootParameterHeader{ -Param.Type, Param.Visibility, Param.Offset}; +auto Header = dxbc::RootParameterHeader{Param.Type, Param.Visibility, +Param.Offset}; switch (Param.Type) { case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): - NewParam.Constants.Num32BitValues = Param.Constants.Num32BitValues; - NewParam.Constants.RegisterSpace = Param.Constants.RegisterSpace; - NewParam.Constants.ShaderRegister = Param.Constants.ShaderRegister; + dxbc::RootConstants Constants; + Constants.Num32BitValues = Param.Constants.Num32BitValues; + Constants.RegisterSpace = Param.Constants.RegisterSpace; + Constants.ShaderRegister = Param.Constants.ShaderRegister; + RS.ParametersContainer.addParameter(Header, Constants); break; case llvm::to_underlying(dxbc::RootParameterType::SRV): case llvm::to_underlying(dxbc::RootParameterType::UAV): case llvm::to_underlying(dxbc::RootParameterType::CBV): - NewParam.Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; - NewParam.Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; - if (P.RootSignature->Version > 1) -NewParam.Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + if (RS.Version == 1) { +dxbc::RST0::v0::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +RS.ParametersContainer.addParameter(Header, Descriptor); + } else { +dxbc::RST0::v1::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + RS.ParametersContainer.addParameter(Header, Descriptor); + } break; +default: + // Handling invalid parameter type edge 
case joaosaffran wrote: We intentionally let obj2yaml/yaml2obj parse and emit invalid dxcontainer data, in order for that to be used as a testing tool more effectively. https://github.com/llvm/llvm-project/pull/137284 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order storage of root parameters (PR #137284)
@@ -274,27 +274,37 @@ void DXContainerWriter::writeParts(raw_ostream &OS) { RS.StaticSamplersOffset = P.RootSignature->StaticSamplersOffset; for (const auto &Param : P.RootSignature->Parameters) { -mcdxbc::RootParameter NewParam; -NewParam.Header = dxbc::RootParameterHeader{ -Param.Type, Param.Visibility, Param.Offset}; +auto Header = dxbc::RootParameterHeader{Param.Type, Param.Visibility, +Param.Offset}; switch (Param.Type) { case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): - NewParam.Constants.Num32BitValues = Param.Constants.Num32BitValues; - NewParam.Constants.RegisterSpace = Param.Constants.RegisterSpace; - NewParam.Constants.ShaderRegister = Param.Constants.ShaderRegister; + dxbc::RootConstants Constants; + Constants.Num32BitValues = Param.Constants.Num32BitValues; + Constants.RegisterSpace = Param.Constants.RegisterSpace; + Constants.ShaderRegister = Param.Constants.ShaderRegister; + RS.ParametersContainer.addParameter(Header, Constants); break; case llvm::to_underlying(dxbc::RootParameterType::SRV): case llvm::to_underlying(dxbc::RootParameterType::UAV): case llvm::to_underlying(dxbc::RootParameterType::CBV): - NewParam.Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; - NewParam.Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; - if (P.RootSignature->Version > 1) -NewParam.Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + if (RS.Version == 1) { +dxbc::RST0::v0::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +RS.ParametersContainer.addParameter(Header, Descriptor); + } else { +dxbc::RST0::v1::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + RS.ParametersContainer.addParameter(Header, Descriptor); + } break; +default: + // Handling invalid parameter type edge 
case inbelic wrote: Can we add that as a comment to the file to provide the context for the future person to read and know why https://github.com/llvm/llvm-project/pull/137284 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
https://github.com/redstar updated https://github.com/llvm/llvm-project/pull/137235 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] clang: Remove dest LangAS argument from performAddrSpaceCast (PR #138866)
https://github.com/yxsamliu approved this pull request. The intention was to make the interface more flexible in cases that a target may want to do some arithmetic directly based on target address space instead of an addrspacecast inst. However, so many years have passed and no target was doing that. I think we could remove it. https://github.com/llvm/llvm-project/pull/138866 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ObjC] Support objc_claimAutoreleasedReturnValue (PR #138696)
citymarina wrote: Thanks for the feedback, I'm working on an update to address the comments, but currently juggling too many tasks. https://github.com/llvm/llvm-project/pull/138696 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132383 >From 915bee3729c66b327885838c6384d14f92c2af2d Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 8 May 2025 12:03:28 +0200 Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc Uniform S1: Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant to be combined away. Uniform S1 ZExt and SExt are lowered using select. Divergent S1: Trunc of VGPR to VCC is lowered as compare. Extends of VCC are lowered using select. For remaining types: S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc are again left as is to be combined away. Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is for instruction select to deal with them. This is because there are patterns that check for S16 type. --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 7 ++ .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 110 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 47 +++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 3 + .../GlobalISel/regbankselect-and-s1.mir | 105 + .../GlobalISel/regbankselect-anyext.mir | 59 +- .../AMDGPU/GlobalISel/regbankselect-sext.mir | 100 ++-- .../AMDGPU/GlobalISel/regbankselect-trunc.mir | 22 +++- .../AMDGPU/GlobalISel/regbankselect-zext.mir | 89 +- 10 files changed, 360 insertions(+), 183 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ad6a0772fe8b6..9544c9f43eeaf 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner { return; } +if (DstTy == S64 && TruncSrcTy == S32) { + B.buildMergeLikeInstr(MI.getOperand(0).getReg(), +{TruncSrc, B.buildUndef({SgprRB, S32})}); + cleanUpAfterCombine(MI, Trunc); + return; +} + if (DstTy == S32 && TruncSrcTy == 
S16) { B.buildAnyExt(Dst, TruncSrc); cleanUpAfterCombine(MI, Trunc); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index aa4584bb0f6a1..77f37f3d19875 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -133,6 +133,43 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy, MI.eraseFromParent(); } +void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) { + Register Dst = MI.getOperand(0).getReg(); + LLT Ty = MRI.getType(Dst); + Register Src = MI.getOperand(1).getReg(); + unsigned Opc = MI.getOpcode(); + int TrueExtCst = (Opc == G_SEXT ? -1 : 1); + if (Ty == S32 || Ty == S16) { +auto True = B.buildConstant({VgprRB, Ty}, TrueExtCst); +auto False = B.buildConstant({VgprRB, Ty}, 0); +B.buildSelect(Dst, Src, True, False); + } else if (Ty == S64) { +auto True = B.buildConstant({VgprRB_S32}, TrueExtCst); +auto False = B.buildConstant({VgprRB_S32}, 0); +auto Lo = B.buildSelect({VgprRB_S32}, Src, True, False); +MachineInstrBuilder Hi; +switch (Opc) { +case G_SEXT: + Hi = Lo; + break; +case G_ZEXT: + Hi = False; + break; +case G_ANYEXT: + Hi = B.buildUndef({VgprRB_S32}); + break; +default: + llvm_unreachable("Opcode not supported"); +} + +B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)}); + } else { +llvm_unreachable("Type not supported"); + } + + MI.eraseFromParent(); +} + static bool isSignedBFE(MachineInstr &MI) { if (GIntrinsic *GI = dyn_cast(&MI)) { if (GI->is(Intrinsic::amdgcn_sbfe)) @@ -263,26 +300,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, switch (Mapping.LoweringMethod) { case DoNotLower: return; - case VccExtToSel: { -LLT Ty = MRI.getType(MI.getOperand(0).getReg()); -Register Src = MI.getOperand(1).getReg(); -unsigned Opc = MI.getOpcode(); -if (Ty == S32 || Ty == S16) { - auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? 
-1 : 1); - auto False = B.buildConstant({VgprRB, Ty}, 0); - B.buildSelect(MI.getOperand(0).getReg(), Src, True, False); -} -if (Ty == S64) { - auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1); - auto False = B.buildConstant({VgprRB, S32}, 0); - auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False); - B.buildMergeValues( - MI.getOperand(0).getReg(), - {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)}); -} -MI.eraseFromParent(); -return; - } + case VccExtToSel: +return lowerVccExtToSel(MI); case UniExtToSel: { LLT Ty = MRI.getType(MI.getOperand(0).ge
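The value-level effect of the `lowerVccExtToSel` lowering in the patch above can be sketched in plain C++, outside of LLVM. This is only an illustrative model: `ExtOpc`, `trueExtCst`, `extendBitTo32` and `extendBitTo64` are hypothetical names, not part of the patch, and the `G_ANYEXT` case is left out because its high half is undefined and cannot be checked against a fixed value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the generic opcodes handled by the patch.
enum class ExtOpc { SExt, ZExt };

// Constant fed to the select as the "true" value: all-ones for a sign
// extension, 1 for a zero extension (mirrors TrueExtCst in the patch).
constexpr int32_t trueExtCst(ExtOpc Opc) {
  return Opc == ExtOpc::SExt ? -1 : 1;
}

// 32-bit case: a single select(Cond, True, False) with False == 0.
constexpr uint32_t extendBitTo32(bool Cond, ExtOpc Opc) {
  return Cond ? static_cast<uint32_t>(trueExtCst(Opc)) : 0u;
}

// 64-bit case: compute the low half with a 32-bit select, pick the high
// half by opcode (Lo for sext, 0 for zext), then merge the two halves.
constexpr uint64_t extendBitTo64(bool Cond, ExtOpc Opc) {
  const uint32_t Lo = extendBitTo32(Cond, Opc);
  const uint32_t Hi = (Opc == ExtOpc::SExt) ? Lo : 0u;
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}

// Usage example: spot-check both opcodes on a set condition bit.
inline void extendSketchUsage() {
  assert(extendBitTo64(true, ExtOpc::SExt) == UINT64_MAX);
  assert(extendBitTo64(true, ExtOpc::ZExt) == 1u);
}
```

Reusing the low half as the high half for the sign-extended case works because the select only ever produces 0 or all-ones there, so both halves are identical.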
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/erichkeane commented: This seems to be reasonable as far as I can tell. I DO wonder if `getOpenMPDirectiveName` should lose the non-version overload though, so we make sure we get this right everywhere. https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
@@ -94,6 +94,10 @@ struct PrintingPolicy { /// The number of spaces to use to indent each line. unsigned Indentation : 8; + /// Version of the effective OpenMP spec (used to select directive name + /// spelling). + unsigned OpenMP : 8; + alexey-bataev wrote: Maybe instead store openmp version in OMPClausePrinter? https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
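As a rough illustration of the hunk under discussion, the following standalone sketch shows a policy object that copies the OpenMP version out of the language options into an 8-bit field and then uses it to pick a spelling. Everything here is a hypothetical stand-in: `ClangLangOptsSketch`, `PrintingPolicySketch`, `directiveNameSketch`, the two spellings and the version threshold of 60 are assumptions for illustration, not clang's actual types or directive names. The only thing taken from clang is the version encoding as a small integer such as 51 or 60, which fits in 8 bits.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for clang::LangOptions.
struct ClangLangOptsSketch {
  explicit ClangLangOptsSketch(unsigned Version) : OpenMP(Version) {}
  unsigned OpenMP; // e.g. 51 for OpenMP 5.1, 60 for OpenMP 6.0
};

// Hypothetical stand-in for PrintingPolicy with the new 8-bit field.
struct PrintingPolicySketch {
  explicit PrintingPolicySketch(const ClangLangOptsSketch &LO)
      : Indentation(2), OpenMP(LO.OpenMP) {
    // The version must fit in the 8 bits reserved for it.
    assert(LO.OpenMP <= 255 && "OpenMP version does not fit in 8 bits");
  }

  unsigned Indentation : 8;
  unsigned OpenMP : 8;
};

// Hypothetical version-aware lookup; the spellings and the threshold are
// placeholders, not real OpenMP directive names.
inline std::string directiveNameSketch(unsigned Version) {
  return Version >= 60 ? "example_directive" : "example directive";
}
```

A printer that only holds a policy can then call `directiveNameSketch(Policy.OpenMP)` without needing access to Sema.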
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
@@ -965,13 +965,13 @@ void StmtPrinter::VisitOMPTeamsDirective(OMPTeamsDirective *Node) { void StmtPrinter::VisitOMPCancellationPointDirective( OMPCancellationPointDirective *Node) { Indent() << "#pragma omp cancellation point " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); alexey-bataev wrote: The version can be obtained via Context->getLangOpts().OpenMP, if Context is non-nullptr. https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
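The suggestion above, reading the version from the context when one is available, might look roughly like this in isolation. The types `LangOptsView` and `ASTContextSketch`, the helper `effectiveOpenMPVersion`, and the constant `FallbackVersionSketch` with its value of 52 are all assumptions made for this sketch; in particular, what to return when the context is null is a design choice the comment does not settle.

```cpp
#include <cassert>

// Hypothetical stand-in for the part of LangOptions that matters here.
struct LangOptsView {
  unsigned OpenMP;
};

// Hypothetical stand-in for ASTContext, exposing only getLangOpts().
struct ASTContextSketch {
  LangOptsView Opts;
  const LangOptsView &getLangOpts() const { return Opts; }
};

// Assumed fallback used when no context is available; the value is a
// placeholder and may not match llvm::omp::FallbackVersion.
constexpr unsigned FallbackVersionSketch = 52;

// Prefer the version recorded in the context, if there is one.
inline unsigned effectiveOpenMPVersion(const ASTContextSketch *Context) {
  return Context ? Context->getLangOpts().OpenMP : FallbackVersionSketch;
}

// Usage example: with and without a context.
inline void effectiveVersionUsage() {
  const ASTContextSketch Ctx{LangOptsView{60}};
  assert(effectiveOpenMPVersion(&Ctx) == 60);
  assert(effectiveOpenMPVersion(nullptr) == FallbackVersionSketch);
}
```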
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139115 >From fa6e19481f448db273f84d270891e737ff021749 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 7 May 2025 15:32:08 -0500 Subject: [PATCH 1/2] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName The OpenMP version is stored in language options in Sema. For use in objects that do not have access to Sema, add OpenMP version field to PrintingPolicy, giving it 8 bits. RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- clang/include/clang/AST/PrettyPrinter.h | 6 +- clang/lib/AST/OpenMPClause.cpp | 7 +- clang/lib/AST/StmtPrinter.cpp | 4 +- clang/lib/Basic/OpenMPKinds.cpp | 3 +- clang/lib/Parse/ParseOpenMP.cpp | 128 +++--- clang/lib/Sema/SemaOpenMP.cpp | 167 +++- clang/lib/Sema/TreeTransform.h | 3 +- 7 files changed, 202 insertions(+), 116 deletions(-) diff --git a/clang/include/clang/AST/PrettyPrinter.h b/clang/include/clang/AST/PrettyPrinter.h index 5a98ae1987b16..1f64466956b66 100644 --- a/clang/include/clang/AST/PrettyPrinter.h +++ b/clang/include/clang/AST/PrettyPrinter.h @@ -59,7 +59,7 @@ struct PrintingPolicy { /// Create a default printing policy for the specified language. PrintingPolicy(const LangOptions &LO) - : Indentation(2), SuppressSpecifiers(false), + : Indentation(2), OpenMP(LO.OpenMP), SuppressSpecifiers(false), SuppressTagKeyword(LO.CPlusPlus), IncludeTagDefinition(false), SuppressScope(false), SuppressUnwrittenScope(false), SuppressInlineNamespace(SuppressInlineNamespaceMode::Redundant), @@ -94,6 +94,10 @@ struct PrintingPolicy { /// The number of spaces to use to indent each line. unsigned Indentation : 8; + /// Version of the effective OpenMP spec (used to select directive name + /// spelling). + unsigned OpenMP : 8; + /// Whether we should suppress printing of the actual specifiers for /// the given type or declaration. 
/// diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp index 2226791a70b6e..5ed04d250f5a1 100644 --- a/clang/lib/AST/OpenMPClause.cpp +++ b/clang/lib/AST/OpenMPClause.cpp @@ -1821,7 +1821,8 @@ OMPThreadLimitClause *OMPThreadLimitClause::CreateEmpty(const ASTContext &C, void OMPClausePrinter::VisitOMPIfClause(OMPIfClause *Node) { OS << "if("; if (Node->getNameModifier() != OMPD_unknown) -OS << getOpenMPDirectiveName(Node->getNameModifier()) << ": "; +OS << getOpenMPDirectiveName(Node->getNameModifier(), Policy.OpenMP) + << ": "; Node->getCondition()->printPretty(OS, nullptr, Policy, 0); OS << ")"; } @@ -2049,7 +2050,7 @@ void OMPClausePrinter::VisitOMPAbsentClause(OMPAbsentClause *Node) { for (auto &D : Node->getDirectiveKinds()) { if (!First) OS << ", "; -OS << getOpenMPDirectiveName(D); +OS << getOpenMPDirectiveName(D, Policy.OpenMP); First = false; } OS << ")"; @@ -2067,7 +2068,7 @@ void OMPClausePrinter::VisitOMPContainsClause(OMPContainsClause *Node) { for (auto &D : Node->getDirectiveKinds()) { if (!First) OS << ", "; -OS << getOpenMPDirectiveName(D); +OS << getOpenMPDirectiveName(D, Policy.OpenMP); First = false; } OS << ")"; diff --git a/clang/lib/AST/StmtPrinter.cpp b/clang/lib/AST/StmtPrinter.cpp index c6c49c6c1ba4d..53f7bb2f64249 100644 --- a/clang/lib/AST/StmtPrinter.cpp +++ b/clang/lib/AST/StmtPrinter.cpp @@ -965,13 +965,13 @@ void StmtPrinter::VisitOMPTeamsDirective(OMPTeamsDirective *Node) { void StmtPrinter::VisitOMPCancellationPointDirective( OMPCancellationPointDirective *Node) { Indent() << "#pragma omp cancellation point " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); PrintOMPExecutableDirective(Node); } void StmtPrinter::VisitOMPCancelDirective(OMPCancelDirective *Node) { Indent() << "#pragma omp cancel " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); 
PrintOMPExecutableDirective(Node); } diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp index 7b90861c78de0..a451fc7c01841 100644 --- a/clang/lib/Basic/OpenMPKinds.cpp +++ b/clang/lib/Basic/OpenMPKinds.cpp @@ -850,7 +850,8 @@ void clang::getOpenMPCaptureRegions( case OMPD_master: return false; default: - llvm::errs() << getOpenMPDirectiveName(LKind) << '\n'; + llvm::errs() << getOpenMPDirectiveName(LKind, llvm::omp::FallbackVersion) + << '\n'; llvm_unreachable("Unexpected directive"); } return false; diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp index 8d8698e61216f..4f87d6c74b251 100644 --- a/clang/lib/Parse/ParseOpenMP.cpp +++ b
[llvm-branch-commits] [clang] clang: Remove dest LangAS argument from performAddrSpaceCast (PR #138866)
rjmccall wrote: I would like to continue to traffic in LangAS, if for no other reason than that it promotes better habits within CodeGen. Otherwise, I think we will end up with a lot of code that immediately lowers address spaces, and that will make it very difficult to treat address spaces as a frontend abstraction if we ever do have a need for that, which I continue to believe is a likely future development. https://github.com/llvm/llvm-project/pull/138866 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)
sbc100 wrote:
> > > Do you not build llvm from source in your project? Can't you therefore build from tip-of-tree?
> >
> > Hi, yes I think the latest changes on the release/latest_version.x branch is being used for https://github.com/compiler-research/CppInterOp (@vgvassilev or @mcbarton can confirm)
> > So probably we don't exactly need a release every time I think but yeah need these changes to go into the release branch
>
> CppInterOp is capable of building against the head of the release branches, and this is what we do in our ci. Without this patch we are unable to run our Emscripten tests in our PR to upgrade to llvm 20 here [compiler-research/CppInterOp#491]

If this patch is needed to run your emscripten tests, how did they pass prior to the llvm 20 upgrade?
https://github.com/llvm/llvm-project/pull/137620 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
@@ -965,13 +965,13 @@ void StmtPrinter::VisitOMPTeamsDirective(OMPTeamsDirective *Node) { void StmtPrinter::VisitOMPCancellationPointDirective( OMPCancellationPointDirective *Node) { Indent() << "#pragma omp cancellation point " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); kparzysz wrote: Done https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139115 >From fa6e19481f448db273f84d270891e737ff021749 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 7 May 2025 15:32:08 -0500 Subject: [PATCH 1/6] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName The OpenMP version is stored in language options in Sema. For use in objects that do not have access to Sema, add OpenMP version field to PrintingPolicy, giving it 8 bits. RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- clang/include/clang/AST/PrettyPrinter.h | 6 +- clang/lib/AST/OpenMPClause.cpp | 7 +- clang/lib/AST/StmtPrinter.cpp | 4 +- clang/lib/Basic/OpenMPKinds.cpp | 3 +- clang/lib/Parse/ParseOpenMP.cpp | 128 +++--- clang/lib/Sema/SemaOpenMP.cpp | 167 +++- clang/lib/Sema/TreeTransform.h | 3 +- 7 files changed, 202 insertions(+), 116 deletions(-) diff --git a/clang/include/clang/AST/PrettyPrinter.h b/clang/include/clang/AST/PrettyPrinter.h index 5a98ae1987b16..1f64466956b66 100644 --- a/clang/include/clang/AST/PrettyPrinter.h +++ b/clang/include/clang/AST/PrettyPrinter.h @@ -59,7 +59,7 @@ struct PrintingPolicy { /// Create a default printing policy for the specified language. PrintingPolicy(const LangOptions &LO) - : Indentation(2), SuppressSpecifiers(false), + : Indentation(2), OpenMP(LO.OpenMP), SuppressSpecifiers(false), SuppressTagKeyword(LO.CPlusPlus), IncludeTagDefinition(false), SuppressScope(false), SuppressUnwrittenScope(false), SuppressInlineNamespace(SuppressInlineNamespaceMode::Redundant), @@ -94,6 +94,10 @@ struct PrintingPolicy { /// The number of spaces to use to indent each line. unsigned Indentation : 8; + /// Version of the effective OpenMP spec (used to select directive name + /// spelling). + unsigned OpenMP : 8; + /// Whether we should suppress printing of the actual specifiers for /// the given type or declaration. 
/// diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp index 2226791a70b6e..5ed04d250f5a1 100644 --- a/clang/lib/AST/OpenMPClause.cpp +++ b/clang/lib/AST/OpenMPClause.cpp @@ -1821,7 +1821,8 @@ OMPThreadLimitClause *OMPThreadLimitClause::CreateEmpty(const ASTContext &C, void OMPClausePrinter::VisitOMPIfClause(OMPIfClause *Node) { OS << "if("; if (Node->getNameModifier() != OMPD_unknown) -OS << getOpenMPDirectiveName(Node->getNameModifier()) << ": "; +OS << getOpenMPDirectiveName(Node->getNameModifier(), Policy.OpenMP) + << ": "; Node->getCondition()->printPretty(OS, nullptr, Policy, 0); OS << ")"; } @@ -2049,7 +2050,7 @@ void OMPClausePrinter::VisitOMPAbsentClause(OMPAbsentClause *Node) { for (auto &D : Node->getDirectiveKinds()) { if (!First) OS << ", "; -OS << getOpenMPDirectiveName(D); +OS << getOpenMPDirectiveName(D, Policy.OpenMP); First = false; } OS << ")"; @@ -2067,7 +2068,7 @@ void OMPClausePrinter::VisitOMPContainsClause(OMPContainsClause *Node) { for (auto &D : Node->getDirectiveKinds()) { if (!First) OS << ", "; -OS << getOpenMPDirectiveName(D); +OS << getOpenMPDirectiveName(D, Policy.OpenMP); First = false; } OS << ")"; diff --git a/clang/lib/AST/StmtPrinter.cpp b/clang/lib/AST/StmtPrinter.cpp index c6c49c6c1ba4d..53f7bb2f64249 100644 --- a/clang/lib/AST/StmtPrinter.cpp +++ b/clang/lib/AST/StmtPrinter.cpp @@ -965,13 +965,13 @@ void StmtPrinter::VisitOMPTeamsDirective(OMPTeamsDirective *Node) { void StmtPrinter::VisitOMPCancellationPointDirective( OMPCancellationPointDirective *Node) { Indent() << "#pragma omp cancellation point " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); PrintOMPExecutableDirective(Node); } void StmtPrinter::VisitOMPCancelDirective(OMPCancelDirective *Node) { Indent() << "#pragma omp cancel " - << getOpenMPDirectiveName(Node->getCancelRegion()); + << getOpenMPDirectiveName(Node->getCancelRegion(), Policy.OpenMP); 
PrintOMPExecutableDirective(Node); } diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp index 7b90861c78de0..a451fc7c01841 100644 --- a/clang/lib/Basic/OpenMPKinds.cpp +++ b/clang/lib/Basic/OpenMPKinds.cpp @@ -850,7 +850,8 @@ void clang::getOpenMPCaptureRegions( case OMPD_master: return false; default: - llvm::errs() << getOpenMPDirectiveName(LKind) << '\n'; + llvm::errs() << getOpenMPDirectiveName(LKind, llvm::omp::FallbackVersion) + << '\n'; llvm_unreachable("Unexpected directive"); } return false; diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp index 8d8698e61216f..4f87d6c74b251 100644 --- a/clang/lib/Parse/ParseOpenMP.cpp +++ b
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
@@ -73,13 +73,16 @@ namespace { PrintingPolicy Policy; std::string NL; const ASTContext *Context; +unsigned Version; kparzysz wrote: Oops, you're right. Removed. https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/alexey-bataev approved this pull request. https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
https://github.com/kparzysz edited https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
llvmbot wrote: @llvm/pr-subscribers-flang-semantics Author: Krzysztof Parzyszek (kparzysz) Changes The OpenMP version is stored in LangOptions in SemanticsContext. Use the fallback version where SemanticsContext is unavailable (mostly in case of debug dumps). RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- Patch is 24.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139131.diff 15 Files Affected: - (modified) flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp (+3-2) - (modified) flang/include/flang/Parser/dump-parse-tree.h (+3-2) - (modified) flang/include/flang/Parser/unparse.h (+7) - (modified) flang/include/flang/Semantics/unparse-with-symbols.h (+5) - (modified) flang/lib/Frontend/ParserActions.cpp (+2-1) - (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (+3-1) - (modified) flang/lib/Lower/OpenMP/Decomposer.cpp (+2-1) - (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+4-2) - (modified) flang/lib/Parser/openmp-parsers.cpp (+2-1) - (modified) flang/lib/Parser/parse-tree.cpp (+4-1) - (modified) flang/lib/Parser/unparse.cpp (+24-15) - (modified) flang/lib/Semantics/check-omp-structure.cpp (+9-9) - (modified) flang/lib/Semantics/mod-file.cpp (+6-5) - (modified) flang/lib/Semantics/resolve-directives.cpp (+16-9) - (modified) flang/lib/Semantics/unparse-with-symbols.cpp (+3-3) ``diff diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp index dbbf86a6c6151..bf66151d59950 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp @@ -267,8 +267,9 @@ void OpenMPCounterVisitor::Post(const OmpScheduleClause::Kind &c) { "type=" + std::string{OmpScheduleClause::EnumToString(c)} + ";"; } void OpenMPCounterVisitor::Post(const OmpDirectiveNameModifier &c) { - clauseDetails += - "name_modifier=" + llvm::omp::getOpenMPDirectiveName(c.v).str() 
+ ";"; + clauseDetails += "name_modifier=" + + llvm::omp::getOpenMPDirectiveName(c.v, llvm::omp::FallbackVersion).str() + + ";"; } void OpenMPCounterVisitor::Post(const OmpClause &c) { PostClauseCommon(normalize_clause_name(c.source.ToString())); diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h index a3721bc8410ba..df9278697346f 100644 --- a/flang/include/flang/Parser/dump-parse-tree.h +++ b/flang/include/flang/Parser/dump-parse-tree.h @@ -17,6 +17,7 @@ #include "flang/Common/idioms.h" #include "flang/Common/indirection.h" #include "flang/Support/Fortran.h" +#include "llvm/Frontend/OpenMP/OMP.h" #include "llvm/Support/raw_ostream.h" #include #include @@ -545,8 +546,8 @@ class ParseTreeDumper { NODE(parser, OmpBeginSectionsDirective) NODE(parser, OmpBlockDirective) static std::string GetNodeName(const llvm::omp::Directive &x) { -return llvm::Twine( -"llvm::omp::Directive = ", llvm::omp::getOpenMPDirectiveName(x)) +return llvm::Twine("llvm::omp::Directive = ", +llvm::omp::getOpenMPDirectiveName(x, llvm::omp::FallbackVersion)) .str(); } NODE(parser, OmpClause) diff --git a/flang/include/flang/Parser/unparse.h b/flang/include/flang/Parser/unparse.h index 40094ecbc85e5..349597213d904 100644 --- a/flang/include/flang/Parser/unparse.h +++ b/flang/include/flang/Parser/unparse.h @@ -18,6 +18,10 @@ namespace llvm { class raw_ostream; } +namespace Fortran::common { +class LangOptions; +} + namespace Fortran::evaluate { struct GenericExprWrapper; struct GenericAssignmentWrapper; @@ -47,14 +51,17 @@ struct AnalyzedObjectsAsFortran { // Converts parsed program (or fragment) to out as Fortran. 
template void Unparse(llvm::raw_ostream &out, const A &root, +const common::LangOptions &langOpts, Encoding encoding = Encoding::UTF_8, bool capitalizeKeywords = true, bool backslashEscapes = true, preStatementType *preStatement = nullptr, AnalyzedObjectsAsFortran * = nullptr); extern template void Unparse(llvm::raw_ostream &out, const Program &program, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); extern template void Unparse(llvm::raw_ostream &out, const Expr &expr, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); } // namespace Fortran::parser diff --git a/flang/include/flang/Semantics/unparse-with-symbols.h b/flang/include/flang/Semantics/unparse-with-symbols.h index 5e18b3fc3063d..702911bbab627 100644 --- a/flang/include/flang/Semantics/unparse-with-symbols.h +++ b/flang/include/flang/Semantics/unparse-with-symbols.h @@ -16,6 +16,10 @@ namespace llvm { class raw_ostream; } +names
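The `unparse.h` hunk above places the new `langOpts` parameter between `root` and the defaulted `encoding` argument. A small standalone sketch of that signature shape follows; one plausible reading (an assumption, not stated in the patch) is that a non-defaulted parameter ahead of the defaulted ones forces each existing call site to be updated rather than silently picking up a default version. `FlangLangOptsSketch`, `unparseSketch` and `unparseToString` are hypothetical names, and the output format is invented for the example.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for Fortran::common::LangOptions.
struct FlangLangOptsSketch {
  unsigned OpenMPVersion;
};

// The options argument has no default and sits before the defaulted
// parameter, so every caller has to pass it explicitly.
inline void unparseSketch(std::ostream &out, const std::string &keyword,
                          const FlangLangOptsSketch &langOpts,
                          bool capitalizeKeywords = true) {
  for (char c : keyword) {
    const auto uc = static_cast<unsigned char>(c);
    out << static_cast<char>(capitalizeKeywords ? std::toupper(uc) : uc);
  }
  // Invented trailer, only to make the version's influence visible.
  out << " ! unparsed for OpenMP version " << langOpts.OpenMPVersion;
}

// Convenience wrapper returning the text instead of streaming it.
inline std::string unparseToString(const std::string &keyword,
                                   const FlangLangOptsSketch &langOpts,
                                   bool capitalizeKeywords = true) {
  std::ostringstream os;
  unparseSketch(os, keyword, langOpts, capitalizeKeywords);
  return os.str();
}

// Usage example: the options must be supplied, the flag may be omitted.
inline void unparseSketchUsage() {
  const FlangLangOptsSketch opts{52};
  assert(unparseToString("parallel", opts) ==
         "PARALLEL ! unparsed for OpenMP version 52");
}
```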
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139131 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
@@ -571,7 +571,6 @@ void MCObjectFileInfo::initGOFFMCObjectFileInfo(const Triple &T) { GOFF::ESD_LB_Initial, GOFF::ESD_RQ_0, GOFF::ESD_ALIGN_Doubleword, 0}, RootSDSection); - redstar wrote: Fixed. https://github.com/llvm/llvm-project/pull/137235 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139131 >From 9566bf6fd60d2b4f1dac86f6646002b2541e6736 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 7 May 2025 15:32:28 -0500 Subject: [PATCH 1/4] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName The OpenMP version is stored in LangOptions in SemanticsContext. Use the fallback version where SemanticsContext is unavailable (mostly in case of debug dumps). RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../FlangOmpReport/FlangOmpReportVisitor.cpp | 5 ++- flang/include/flang/Parser/dump-parse-tree.h | 5 ++- flang/include/flang/Parser/unparse.h | 7 .../flang/Semantics/unparse-with-symbols.h| 5 +++ flang/lib/Frontend/ParserActions.cpp | 3 +- flang/lib/Lower/OpenMP/ClauseProcessor.h | 4 +- flang/lib/Lower/OpenMP/Decomposer.cpp | 3 +- flang/lib/Lower/OpenMP/OpenMP.cpp | 6 ++- flang/lib/Parser/openmp-parsers.cpp | 3 +- flang/lib/Parser/parse-tree.cpp | 5 ++- flang/lib/Parser/unparse.cpp | 39 --- flang/lib/Semantics/check-omp-structure.cpp | 18 - flang/lib/Semantics/mod-file.cpp | 11 +++--- flang/lib/Semantics/resolve-directives.cpp| 25 +++- flang/lib/Semantics/unparse-with-symbols.cpp | 6 +-- 15 files changed, 93 insertions(+), 52 deletions(-) diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp index dbbf86a6c6151..bf66151d59950 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp @@ -267,8 +267,9 @@ void OpenMPCounterVisitor::Post(const OmpScheduleClause::Kind &c) { "type=" + std::string{OmpScheduleClause::EnumToString(c)} + ";"; } void OpenMPCounterVisitor::Post(const OmpDirectiveNameModifier &c) { - clauseDetails += - "name_modifier=" + llvm::omp::getOpenMPDirectiveName(c.v).str() + ";"; + clauseDetails += "name_modifier=" + + llvm::omp::getOpenMPDirectiveName(c.v, 
llvm::omp::FallbackVersion).str() + + ";"; } void OpenMPCounterVisitor::Post(const OmpClause &c) { PostClauseCommon(normalize_clause_name(c.source.ToString())); diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h index a3721bc8410ba..df9278697346f 100644 --- a/flang/include/flang/Parser/dump-parse-tree.h +++ b/flang/include/flang/Parser/dump-parse-tree.h @@ -17,6 +17,7 @@ #include "flang/Common/idioms.h" #include "flang/Common/indirection.h" #include "flang/Support/Fortran.h" +#include "llvm/Frontend/OpenMP/OMP.h" #include "llvm/Support/raw_ostream.h" #include #include @@ -545,8 +546,8 @@ class ParseTreeDumper { NODE(parser, OmpBeginSectionsDirective) NODE(parser, OmpBlockDirective) static std::string GetNodeName(const llvm::omp::Directive &x) { -return llvm::Twine( -"llvm::omp::Directive = ", llvm::omp::getOpenMPDirectiveName(x)) +return llvm::Twine("llvm::omp::Directive = ", +llvm::omp::getOpenMPDirectiveName(x, llvm::omp::FallbackVersion)) .str(); } NODE(parser, OmpClause) diff --git a/flang/include/flang/Parser/unparse.h b/flang/include/flang/Parser/unparse.h index 40094ecbc85e5..349597213d904 100644 --- a/flang/include/flang/Parser/unparse.h +++ b/flang/include/flang/Parser/unparse.h @@ -18,6 +18,10 @@ namespace llvm { class raw_ostream; } +namespace Fortran::common { +class LangOptions; +} + namespace Fortran::evaluate { struct GenericExprWrapper; struct GenericAssignmentWrapper; @@ -47,14 +51,17 @@ struct AnalyzedObjectsAsFortran { // Converts parsed program (or fragment) to out as Fortran. 
template void Unparse(llvm::raw_ostream &out, const A &root, +const common::LangOptions &langOpts, Encoding encoding = Encoding::UTF_8, bool capitalizeKeywords = true, bool backslashEscapes = true, preStatementType *preStatement = nullptr, AnalyzedObjectsAsFortran * = nullptr); extern template void Unparse(llvm::raw_ostream &out, const Program &program, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); extern template void Unparse(llvm::raw_ostream &out, const Expr &expr, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); } // namespace Fortran::parser diff --git a/flang/include/flang/Semantics/unparse-with-symbols.h b/flang/include/flang/Semantics/unparse-with-symbols.h index 5e18b3fc3063d..702911bbab627 100644 --- a/flang/include/flang/Semantics/unparse-with-symbols.h +++ b/flang/include/flang/Semantics/unparse-with-symbols.h @@ -16,6 +16,10 @@ n
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ``bash git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp flang/include/flang/Parser/dump-parse-tree.h flang/include/flang/Parser/unparse.h flang/include/flang/Semantics/unparse-with-symbols.h flang/lib/Frontend/ParserActions.cpp flang/lib/Lower/OpenMP/ClauseProcessor.h flang/lib/Lower/OpenMP/Decomposer.cpp flang/lib/Lower/OpenMP/OpenMP.cpp flang/lib/Parser/openmp-parsers.cpp flang/lib/Parser/parse-tree.cpp flang/lib/Parser/unparse.cpp flang/lib/Semantics/check-omp-structure.cpp flang/lib/Semantics/mod-file.cpp flang/lib/Semantics/resolve-directives.cpp flang/lib/Semantics/unparse-with-symbols.cpp `` View the diff from clang-format here. ``diff diff --git a/flang/include/flang/Parser/unparse.h b/flang/include/flang/Parser/unparse.h index 349597213..d796109ca 100644 --- a/flang/include/flang/Parser/unparse.h +++ b/flang/include/flang/Parser/unparse.h @@ -51,18 +51,18 @@ struct AnalyzedObjectsAsFortran { // Converts parsed program (or fragment) to out as Fortran. 
template void Unparse(llvm::raw_ostream &out, const A &root, -const common::LangOptions &langOpts, -Encoding encoding = Encoding::UTF_8, bool capitalizeKeywords = true, -bool backslashEscapes = true, preStatementType *preStatement = nullptr, +const common::LangOptions &langOpts, Encoding encoding = Encoding::UTF_8, +bool capitalizeKeywords = true, bool backslashEscapes = true, +preStatementType *preStatement = nullptr, AnalyzedObjectsAsFortran * = nullptr); extern template void Unparse(llvm::raw_ostream &out, const Program &program, -const common::LangOptions &langOpts, -Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, +const common::LangOptions &langOpts, Encoding encoding, +bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); extern template void Unparse(llvm::raw_ostream &out, const Expr &expr, -const common::LangOptions &langOpts, -Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, +const common::LangOptions &langOpts, Encoding encoding, +bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); } // namespace Fortran::parser `` https://github.com/llvm/llvm-project/pull/139131 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/139131 >From 9566bf6fd60d2b4f1dac86f6646002b2541e6736 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 7 May 2025 15:32:28 -0500 Subject: [PATCH 1/2] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName The OpenMP version is stored in LangOptions in SemanticsContext. Use the fallback version where SemanticsContext is unavailable (mostly in case of debug dumps). RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../FlangOmpReport/FlangOmpReportVisitor.cpp | 5 ++- flang/include/flang/Parser/dump-parse-tree.h | 5 ++- flang/include/flang/Parser/unparse.h | 7 .../flang/Semantics/unparse-with-symbols.h| 5 +++ flang/lib/Frontend/ParserActions.cpp | 3 +- flang/lib/Lower/OpenMP/ClauseProcessor.h | 4 +- flang/lib/Lower/OpenMP/Decomposer.cpp | 3 +- flang/lib/Lower/OpenMP/OpenMP.cpp | 6 ++- flang/lib/Parser/openmp-parsers.cpp | 3 +- flang/lib/Parser/parse-tree.cpp | 5 ++- flang/lib/Parser/unparse.cpp | 39 --- flang/lib/Semantics/check-omp-structure.cpp | 18 - flang/lib/Semantics/mod-file.cpp | 11 +++--- flang/lib/Semantics/resolve-directives.cpp| 25 +++- flang/lib/Semantics/unparse-with-symbols.cpp | 6 +-- 15 files changed, 93 insertions(+), 52 deletions(-) diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp index dbbf86a6c6151..bf66151d59950 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp @@ -267,8 +267,9 @@ void OpenMPCounterVisitor::Post(const OmpScheduleClause::Kind &c) { "type=" + std::string{OmpScheduleClause::EnumToString(c)} + ";"; } void OpenMPCounterVisitor::Post(const OmpDirectiveNameModifier &c) { - clauseDetails += - "name_modifier=" + llvm::omp::getOpenMPDirectiveName(c.v).str() + ";"; + clauseDetails += "name_modifier=" + + llvm::omp::getOpenMPDirectiveName(c.v, 
llvm::omp::FallbackVersion).str() + + ";"; } void OpenMPCounterVisitor::Post(const OmpClause &c) { PostClauseCommon(normalize_clause_name(c.source.ToString())); diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h index a3721bc8410ba..df9278697346f 100644 --- a/flang/include/flang/Parser/dump-parse-tree.h +++ b/flang/include/flang/Parser/dump-parse-tree.h @@ -17,6 +17,7 @@ #include "flang/Common/idioms.h" #include "flang/Common/indirection.h" #include "flang/Support/Fortran.h" +#include "llvm/Frontend/OpenMP/OMP.h" #include "llvm/Support/raw_ostream.h" #include #include @@ -545,8 +546,8 @@ class ParseTreeDumper { NODE(parser, OmpBeginSectionsDirective) NODE(parser, OmpBlockDirective) static std::string GetNodeName(const llvm::omp::Directive &x) { -return llvm::Twine( -"llvm::omp::Directive = ", llvm::omp::getOpenMPDirectiveName(x)) +return llvm::Twine("llvm::omp::Directive = ", +llvm::omp::getOpenMPDirectiveName(x, llvm::omp::FallbackVersion)) .str(); } NODE(parser, OmpClause) diff --git a/flang/include/flang/Parser/unparse.h b/flang/include/flang/Parser/unparse.h index 40094ecbc85e5..349597213d904 100644 --- a/flang/include/flang/Parser/unparse.h +++ b/flang/include/flang/Parser/unparse.h @@ -18,6 +18,10 @@ namespace llvm { class raw_ostream; } +namespace Fortran::common { +class LangOptions; +} + namespace Fortran::evaluate { struct GenericExprWrapper; struct GenericAssignmentWrapper; @@ -47,14 +51,17 @@ struct AnalyzedObjectsAsFortran { // Converts parsed program (or fragment) to out as Fortran. 
template void Unparse(llvm::raw_ostream &out, const A &root, +const common::LangOptions &langOpts, Encoding encoding = Encoding::UTF_8, bool capitalizeKeywords = true, bool backslashEscapes = true, preStatementType *preStatement = nullptr, AnalyzedObjectsAsFortran * = nullptr); extern template void Unparse(llvm::raw_ostream &out, const Program &program, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); extern template void Unparse(llvm::raw_ostream &out, const Expr &expr, +const common::LangOptions &langOpts, Encoding encoding, bool capitalizeKeywords, bool backslashEscapes, preStatementType *preStatement, AnalyzedObjectsAsFortran *); } // namespace Fortran::parser diff --git a/flang/include/flang/Semantics/unparse-with-symbols.h b/flang/include/flang/Semantics/unparse-with-symbols.h index 5e18b3fc3063d..702911bbab627 100644 --- a/flang/include/flang/Semantics/unparse-with-symbols.h +++ b/flang/include/flang/Semantics/unparse-with-symbols.h @@ -16,6 +16,10 @@ n
[llvm-branch-commits] [BOLT] Compute section utilization in heatmap (PR #139193)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) Changes Heatmap collects samples grouped by buckets. The size is configurable via `--block-size`, with 64 bytes as the default (X86 cache line size). Define section utilization as the number of buckets mapped to the section with non-zero samples divided by the total number of buckets covering the section. Note that for buckets that cross section boundaries, we will attribute the utilization to the first overlapping section. Test Plan: updated heatmap-preagg.test --- Full diff: https://github.com/llvm/llvm-project/pull/139193.diff 4 Files Affected: - (modified) bolt/include/bolt/Profile/Heatmap.h (+20-2) - (modified) bolt/lib/Profile/DataAggregator.cpp (+4-2) - (modified) bolt/lib/Profile/Heatmap.cpp (+46-21) - (modified) bolt/test/X86/heatmap-preagg.test (+11-9) ``diff diff --git a/bolt/include/bolt/Profile/Heatmap.h b/bolt/include/bolt/Profile/Heatmap.h index fc1e2cd30011e..c7b3d45fa5cc2 100644 --- a/bolt/include/bolt/Profile/Heatmap.h +++ b/bolt/include/bolt/Profile/Heatmap.h @@ -9,6 +9,7 @@ #ifndef BOLT_PROFILE_HEATMAP_H #define BOLT_PROFILE_HEATMAP_H +#include "llvm/ADT/StringMap.h" #include "llvm/ADT/StringRef.h" #include #include @@ -45,6 +46,10 @@ class Heatmap { /// Map section names to their address range. const std::vector TextSections; + uint64_t getNumBuckets(uint64_t Begin, uint64_t End) const { +return End / BucketSize + !!(End % BucketSize) - Begin / BucketSize; + }; + public: explicit Heatmap(uint64_t BucketSize = 4096, uint64_t MinAddress = 0, uint64_t MaxAddress = std::numeric_limits::max(), @@ -77,9 +82,22 @@ class Heatmap { void printCDF(raw_ostream &OS) const; - void printSectionHotness(StringRef Filename) const; + /// Struct describing individual section hotness. + struct SectionStats { +uint64_t Samples{0}; +uint64_t Buckets{0}; + }; + + /// Mapping from section name to associated \p SectionStats. 
Special entries: + /// - [total] for total stats, + /// - [unmapped] for samples outside any section, if non-zero. + using SectionStatsMap = StringMap; + + SectionStatsMap computeSectionStats() const; + + void printSectionHotness(const SectionStatsMap &, StringRef Filename) const; - void printSectionHotness(raw_ostream &OS) const; + void printSectionHotness(const SectionStatsMap &, raw_ostream &OS) const; size_t size() const { return Map.size(); } }; diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp index a5ac87ee781b2..11850fab28bb8 100644 --- a/bolt/lib/Profile/DataAggregator.cpp +++ b/bolt/lib/Profile/DataAggregator.cpp @@ -1357,10 +1357,12 @@ std::error_code DataAggregator::printLBRHeatMap() { HM.printCDF(opts::OutputFilename); else HM.printCDF(opts::OutputFilename + ".csv"); + Heatmap::SectionStatsMap Stats = HM.computeSectionStats(); if (opts::OutputFilename == "-") -HM.printSectionHotness(opts::OutputFilename); +HM.printSectionHotness(Stats, opts::OutputFilename); else -HM.printSectionHotness(opts::OutputFilename + "-section-hotness.csv"); +HM.printSectionHotness(Stats, + opts::OutputFilename + "-section-hotness.csv"); return std::error_code(); } diff --git a/bolt/lib/Profile/Heatmap.cpp b/bolt/lib/Profile/Heatmap.cpp index c7821b3a1a15a..d3ff74f664046 100644 --- a/bolt/lib/Profile/Heatmap.cpp +++ b/bolt/lib/Profile/Heatmap.cpp @@ -284,23 +284,24 @@ void Heatmap::printCDF(raw_ostream &OS) const { Counts.clear(); } -void Heatmap::printSectionHotness(StringRef FileName) const { +void Heatmap::printSectionHotness(const Heatmap::SectionStatsMap &Stats, + StringRef FileName) const { std::error_code EC; raw_fd_ostream OS(FileName, EC, sys::fs::OpenFlags::OF_None); if (EC) { errs() << "error opening output file: " << EC.message() << '\n'; exit(1); } - printSectionHotness(OS); + printSectionHotness(Stats, OS); } -void Heatmap::printSectionHotness(raw_ostream &OS) const { +StringMap Heatmap::computeSectionStats() const { uint64_t 
NumTotalCounts = 0; - StringMap SectionHotness; + StringMap Stat; unsigned TextSectionIndex = 0; if (TextSections.empty()) -return; +return Stat; uint64_t UnmappedHotness = 0; auto RecordUnmappedBucket = [&](uint64_t Address, uint64_t Frequency) { @@ -312,37 +313,61 @@ void Heatmap::printSectionHotness(raw_ostream &OS) const { UnmappedHotness += Frequency; }; - for (const std::pair &KV : Map) { -NumTotalCounts += KV.second; + for (const auto [Bucket, Count] : Map) { +NumTotalCounts += Count; // We map an address bucket to the first section (lowest address) // overlapping with that bucket. -auto Address = KV.first * BucketSize; +auto Address = Bucket * BucketSize; while (TextSectionIndex < TextSections.size() && Address >= TextSections[
[llvm-branch-commits] [BOLT] Print .text scores in perf2bolt (PR #139194)
https://github.com/aaupov created https://github.com/llvm/llvm-project/pull/139194 Expose heatmap functionality of profile score computation for text section under a new option `--print-heatmap-stats`. This option collects and prints the following stats: - hotness is the percentage of samples attributed to the section, - utilization: percentage of executed buckets, - partition score: hotness times utilization, higher is better. Test Plan: updated per2bolt tests - pre-aggregated-perf.test: pre-aggregated data - bolt-address-translation-yaml.test: pre-aggregated + BOLTed input - perf_test.test: no-LBR perf data ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
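The three stats listed in the PR description can be sketched as below. This is only an illustration of the stated definitions, not the perf2bolt code: `TextStats` and its fields are hypothetical names, and the values are kept as fractions in [0, 1] (multiply by 100 for the percentages the description mentions).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-section counters, for illustration only.
struct TextStats {
  uint64_t SectionSamples; // samples attributed to the section
  uint64_t TotalSamples;   // samples in the whole profile
  uint64_t HotBuckets;     // section buckets with at least one sample
  uint64_t TotalBuckets;   // all buckets covering the section
};

// Fraction of all samples that landed in the section.
double hotness(const TextStats &S) {
  return S.TotalSamples
             ? static_cast<double>(S.SectionSamples) / S.TotalSamples
             : 0.0;
}

// Fraction of the section's buckets that were executed.
double utilization(const TextStats &S) {
  return S.TotalBuckets
             ? static_cast<double>(S.HotBuckets) / S.TotalBuckets
             : 0.0;
}

// Hotness times utilization; higher is better.
double partitionScore(const TextStats &S) {
  return hotness(S) * utilization(S);
}
```

Under these assumptions, a section that receives half of all samples while executing a quarter of its buckets scores 0.5 * 0.25 = 0.125.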
[llvm-branch-commits] [BOLT] Compute section utilization in heatmap (PR #139193)
https://github.com/aaupov created https://github.com/llvm/llvm-project/pull/139193 Heatmap collects samples grouped by buckets. The size is configurable via `--block-size`, with 64 bytes as the default (X86 cache line size). Define section utilization as the number of buckets mapped to the section with non-zero samples divided by the total number of buckets covering the section. Note that for buckets that cross section boundaries, we will attribute the utilization to the first overlapping section. Test Plan: updated heatmap-preagg.test ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
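The utilization definition above can be sketched for a single section as follows. This is a minimal illustration, not BOLT's implementation: `Section`, `numBuckets` and `utilization` are hypothetical names, the bucket-count formula mirrors the `getNumBuckets` helper shown in the diff for this PR, and the attribution of boundary-crossing buckets to the first overlapping section is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a text section's address range [Begin, End).
struct Section {
  uint64_t Begin;
  uint64_t End;
};

// Number of buckets covering [Begin, End): the end is rounded up to a
// bucket boundary, the start is rounded down.
uint64_t numBuckets(uint64_t Begin, uint64_t End, uint64_t BucketSize) {
  assert(BucketSize != 0 && Begin <= End);
  return End / BucketSize + !!(End % BucketSize) - Begin / BucketSize;
}

// Fraction of the section's buckets with non-zero samples.
// Samples maps a bucket index (address / BucketSize) to its sample count.
double utilization(const Section &S,
                   const std::map<uint64_t, uint64_t> &Samples,
                   uint64_t BucketSize) {
  const uint64_t Total = numBuckets(S.Begin, S.End, BucketSize);
  if (Total == 0)
    return 0.0;
  const uint64_t First = S.Begin / BucketSize;
  uint64_t Hot = 0;
  for (auto It = Samples.lower_bound(First);
       It != Samples.end() && It->first < First + Total; ++It)
    if (It->second != 0)
      ++Hot;
  return static_cast<double>(Hot) / Total;
}
```

For a section spanning [0x1000, 0x1100) with 64-byte buckets, `numBuckets` gives 4; if two of those buckets have samples, the utilization is 0.5.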
[llvm-branch-commits] [BOLT] Print .text scores in perf2bolt (PR #139194)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) Changes Expose heatmap functionality of profile score computation for text section under a new option `--print-heatmap-stats`. This option collects and prints the following stats: - hotness is the percentage of samples attributed to the section, - utilization: percentage of executed buckets, - partition score: hotness times utilization, higher is better. Test Plan: updated per2bolt tests - pre-aggregated-perf.test: pre-aggregated data - bolt-address-translation-yaml.test: pre-aggregated + BOLTed input - perf_test.test: no-LBR perf data --- Full diff: https://github.com/llvm/llvm-project/pull/139194.diff 9 Files Affected: - (modified) bolt/include/bolt/Profile/DataAggregator.h (+6-1) - (modified) bolt/include/bolt/Profile/Heatmap.h (+2) - (modified) bolt/include/bolt/Utils/CommandLineOpts.h (+1) - (modified) bolt/lib/Profile/DataAggregator.cpp (+50-27) - (modified) bolt/lib/Profile/Heatmap.cpp (+9) - (modified) bolt/lib/Utils/CommandLineOpts.cpp (+5) - (modified) bolt/test/X86/bolt-address-translation-yaml.test (+2-1) - (modified) bolt/test/X86/pre-aggregated-perf.test (+2-1) - (modified) bolt/test/perf2bolt/perf_test.test (+5-2) ``diff diff --git a/bolt/include/bolt/Profile/DataAggregator.h b/bolt/include/bolt/Profile/DataAggregator.h index d66d198e37d61..ac036fe167eed 100644 --- a/bolt/include/bolt/Profile/DataAggregator.h +++ b/bolt/include/bolt/Profile/DataAggregator.h @@ -15,6 +15,7 @@ #define BOLT_PROFILE_DATA_AGGREGATOR_H #include "bolt/Profile/DataReader.h" +#include "bolt/Profile/Heatmap.h" #include "bolt/Profile/YAMLProfileWriter.h" #include "llvm/ADT/StringRef.h" #include "llvm/Support/Error.h" @@ -270,8 +271,10 @@ class DataAggregator : public DataReader { /// everything bool hasData() const { return !ParsingBuf.empty(); } + /// Build heat map based on LBR samples. + Expected buildHeatMap(); /// Print heat map based on LBR samples. 
- std::error_code printLBRHeatMap(); + void printHeatMap(const Heatmap::SectionStatsMap &, const Heatmap &) const; /// Parse a single perf sample containing a PID associated with a sequence of /// LBR entries. If the PID does not correspond to the binary we are looking @@ -473,6 +476,8 @@ class DataAggregator : public DataReader { void printBranchSamplesDiagnostics() const; void printBasicSamplesDiagnostics(uint64_t OutOfRangeSamples) const; void printBranchStacksDiagnostics(uint64_t IgnoredSamples) const; + void printHeatmapTextStats(const Heatmap &, + const Heatmap::SectionStatsMap &) const; public: /// If perf.data was collected without build ids, the buildid-list may contain diff --git a/bolt/include/bolt/Profile/Heatmap.h b/bolt/include/bolt/Profile/Heatmap.h index c7b3d45fa5cc2..bb073833ec9f7 100644 --- a/bolt/include/bolt/Profile/Heatmap.h +++ b/bolt/include/bolt/Profile/Heatmap.h @@ -88,6 +88,8 @@ class Heatmap { uint64_t Buckets{0}; }; + uint64_t getNumBuckets(StringRef Name) const; + /// Mapping from section name to associated \p SectionStats. Special entries: /// - [total] for total stats, /// - [unmapped] for samples outside any section, if non-zero. 
diff --git a/bolt/include/bolt/Utils/CommandLineOpts.h b/bolt/include/bolt/Utils/CommandLineOpts.h index 3de945f6a1507..b5a7be53e4189 100644 --- a/bolt/include/bolt/Utils/CommandLineOpts.h +++ b/bolt/include/bolt/Utils/CommandLineOpts.h @@ -44,6 +44,7 @@ extern llvm::cl::opt HeatmapBlock; extern llvm::cl::opt HeatmapMaxAddress; extern llvm::cl::opt HeatmapMinAddress; extern llvm::cl::opt HeatmapPrintMappings; +extern llvm::cl::opt HeatmapStats; extern llvm::cl::opt HotData; extern llvm::cl::opt HotFunctionsAtEnd; extern llvm::cl::opt HotText; diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp index 11850fab28bb8..b0ad4c69e2334 100644 --- a/bolt/lib/Profile/DataAggregator.cpp +++ b/bolt/lib/Profile/DataAggregator.cpp @@ -508,21 +508,27 @@ Error DataAggregator::preprocessProfile(BinaryContext &BC) { errs() << "PERF2BOLT: failed to parse samples\n"; // Special handling for memory events - if (prepareToParse("mem events", MemEventsPPI, MemEventsErrorCallback)) -return Error::success(); - - if (const std::error_code EC = parseMemEvents()) -errs() << "PERF2BOLT: failed to parse memory events: " << EC.message() - << '\n'; + if (!prepareToParse("mem events", MemEventsPPI, MemEventsErrorCallback)) +if (const std::error_code EC = parseMemEvents()) + errs() << "PERF2BOLT: failed to parse memory events: " << EC.message() + << '\n'; deleteTempFiles(); heatmap: + if (!opts::HeatmapMode && !opts::HeatmapStats) +return Error::success(); + + Expected HM = buildHeatMap(); + if (!HM) +return HM.takeError(); + Heatmap::SectionStatsMap Stats = HM->computeSectionStats(); if (opts::HeatmapMode) { -if (std::error_
[llvm-branch-commits] MC: Emit symbols for R_X86_64_PLT32 relocation pointing to symbols with non-zero values. (PR #138795)
pcc wrote: Abandoning this and will address in the linker change (#138366) instead. https://github.com/llvm/llvm-project/pull/138795 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)
@@ -82,6 +82,8 @@ class RootSignatureParser { struct ParsedConstantParams { std::optional Reg; std::optional Num32BitConstants; +std::optional Space; V-FEXrt wrote: Yeah I think I agree with Finn here. Seems to map more directly to the source to treat space as a param https://github.com/llvm/llvm-project/pull/138007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)
@@ -78,6 +78,13 @@ std::optional RootSignatureParser::parseRootConstants() { Constants.Reg = Params->Reg.value(); + // Fill in optional parameters + if (Params->Visibility.has_value()) V-FEXrt wrote: nit: ```suggestion if (Params->Visibility) ``` https://github.com/llvm/llvm-project/pull/138007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
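The nit relies on `std::optional` being contextually convertible to `bool`, which tests whether a value is present, exactly like `has_value()`. A small sketch follows; `ParsedParams` and `valueOrDefault` are hypothetical, and `unsigned` is chosen for illustration only since the element types are not visible in the quoted diff.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a struct of optional parsed parameters.
struct ParsedParams {
  std::optional<unsigned> Space;
  std::optional<unsigned> Visibility;
};

// Returns the parameter if it was supplied, otherwise the given default.
unsigned valueOrDefault(const std::optional<unsigned> &Param,
                        unsigned Default) {
  if (Param) // equivalent to Param.has_value()
    return *Param;
  return Default;
}
```

Note that the check is about presence, not the contained value: an optional holding `0` still converts to `true`.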
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)
https://github.com/V-FEXrt approved this pull request. https://github.com/llvm/llvm-project/pull/138007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Form min3/max3 from minimumnum/maximumnum (PR #139137)
arsenm wrote: ### Merge activity * **May 9, 2:00 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/139137). https://github.com/llvm/llvm-project/pull/139137 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
arsenm wrote: ### Merge activity * **May 9, 2:00 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/139145). https://github.com/llvm/llvm-project/pull/139145 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for min3/max3 from minimumnum/maximumnum (PR #139136)
arsenm wrote: ### Merge activity * **May 9, 2:00 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/139136). https://github.com/llvm/llvm-project/pull/139136 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
arsenm wrote: ### Merge activity * **May 9, 2:00 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/139144). https://github.com/llvm/llvm-project/pull/139144 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle minimumnum/maximumnum in fneg combines (PR #139133)
https://github.com/rampitec approved this pull request. LGTM, although I do not see practical improvements in the tests. https://github.com/llvm/llvm-project/pull/139133 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle minimumnum/maximumnum in fneg combines (PR #139133)
@@ -2513,9 +2513,9 @@ define { float, float } @v_fneg_maximumnum_multi_use_maximumnum_f32_ieee(float % ; GCN-LABEL: v_fneg_maximumnum_multi_use_maximumnum_f32_ieee: ; GCN: ; %bb.0: ; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GCN-NEXT:v_mul_f32_e32 v1, 1.0, v1 -; GCN-NEXT:v_mul_f32_e32 v0, 1.0, v0 -; GCN-NEXT:v_min_f32_e64 v0, -v0, -v1 +; GCN-NEXT:v_mul_f32_e32 v1, -1.0, v1 +; GCN-NEXT:v_mul_f32_e32 v0, -1.0, v0 +; GCN-NEXT:v_min_f32_e32 v0, v0, v1 arsenm wrote: This is an encoding size improvement https://github.com/llvm/llvm-project/pull/139133 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
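The before and after sequences in the quoted check lines both compute a negated maximum as a minimum of negated operands. The newer one folds the negation into the `v_mul_f32` by `-1.0`, so the `v_min_f32` no longer needs source modifiers and can use the shorter `_e32` form instead of `_e64`. The underlying identity can be sketched as below, with `std::fmax` and `std::fmin` used as rough stand-ins for `maximumnum` and `minimumnum` (an assumption: signaling-NaN and signed-zero behavior is not modelled).

```cpp
#include <cassert>
#include <cmath>

// -max(a, b), computed directly.
float negMax(float A, float B) { return -std::fmax(A, B); }

// The same value with the negation pushed into the operands: min(-a, -b).
float minOfNegs(float A, float B) { return std::fmin(-A, -B); }
```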
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order storage of root parameters (PR #137284)
@@ -274,27 +274,37 @@ void DXContainerWriter::writeParts(raw_ostream &OS) { RS.StaticSamplersOffset = P.RootSignature->StaticSamplersOffset; for (const auto &Param : P.RootSignature->Parameters) { -mcdxbc::RootParameter NewParam; -NewParam.Header = dxbc::RootParameterHeader{ -Param.Type, Param.Visibility, Param.Offset}; +auto Header = dxbc::RootParameterHeader{Param.Type, Param.Visibility, +Param.Offset}; switch (Param.Type) { case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): - NewParam.Constants.Num32BitValues = Param.Constants.Num32BitValues; - NewParam.Constants.RegisterSpace = Param.Constants.RegisterSpace; - NewParam.Constants.ShaderRegister = Param.Constants.ShaderRegister; + dxbc::RootConstants Constants; + Constants.Num32BitValues = Param.Constants.Num32BitValues; + Constants.RegisterSpace = Param.Constants.RegisterSpace; + Constants.ShaderRegister = Param.Constants.ShaderRegister; + RS.ParametersContainer.addParameter(Header, Constants); break; case llvm::to_underlying(dxbc::RootParameterType::SRV): case llvm::to_underlying(dxbc::RootParameterType::UAV): case llvm::to_underlying(dxbc::RootParameterType::CBV): - NewParam.Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; - NewParam.Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; - if (P.RootSignature->Version > 1) -NewParam.Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + if (RS.Version == 1) { +dxbc::RST0::v0::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +RS.ParametersContainer.addParameter(Header, Descriptor); + } else { +dxbc::RST0::v1::RootDescriptor Descriptor; +Descriptor.RegisterSpace = Param.Descriptor.RegisterSpace; +Descriptor.ShaderRegister = Param.Descriptor.ShaderRegister; +Descriptor.Flags = Param.Descriptor.getEncodedFlags(); + RS.ParametersContainer.addParameter(Header, Descriptor); inbelic wrote: nit: ```suggestion 
RS.ParametersContainer.addParameter(Header, Descriptor); ``` https://github.com/llvm/llvm-project/pull/137284 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139115)
@@ -94,6 +94,10 @@ struct PrintingPolicy { /// The number of spaces to use to indent each line. unsigned Indentation : 8; + /// Version of the effective OpenMP spec (used to select directive name + /// spelling). + unsigned OpenMP : 8; + kparzysz wrote: Done https://github.com/llvm/llvm-project/pull/139115 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
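How an 8-bit version field might drive the choice of directive spelling can be sketched as follows. Everything here is hypothetical: `MiniPolicy`, `Directive` and `directiveName` are illustrative names rather than the clang or LLVM API, and the version encoding (52, 60), the threshold and the two spellings are assumptions based on the "alternative spellings" RFC.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of a printing policy using 8-bit bit-fields.
struct MiniPolicy {
  unsigned Indentation : 8;
  unsigned OpenMP : 8; // spec version, assumed encoded as e.g. 52 or 60
};

enum class Directive { TargetEnterData };

// Hypothetical version-dependent spelling; threshold and names are assumed.
std::string directiveName(Directive D, unsigned Version) {
  switch (D) {
  case Directive::TargetEnterData:
    return Version >= 60 ? "target_enter_data" : "target enter data";
  }
  return "";
}
```

An 8-bit field holds values up to 255, which is enough for two-digit version encodings of this kind.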
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/139144). > Learn more: https://graphite.dev/docs/merge-pull-requests * **#139145** * **#139144** (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/139144) * **#139137** * **#139136** * **#139133** * **#139132** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/139144 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. > Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/139145). > Learn more: https://graphite.dev/docs/merge-pull-requests * **#139145** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/139145) * **#139144** * **#139137** * **#139136** * **#139133** * **#139132** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/139145 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/139144 None ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/139145 With the IEEE bit disabled, the hardware instructions have the same behavior as these operations. >From 207f1fdab531781f5bf3bf7393dd2f8011227321 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 8 May 2025 22:09:16 +0200 Subject: [PATCH] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 With the IEEE bit disabled, the hardware instructions have the same behavior as these operations. --- llvm/test/CodeGen/AMDGPU/maximumnum.ll | 483 + llvm/test/CodeGen/AMDGPU/minimumnum.ll | 483 + 2 files changed, 966 insertions(+) diff --git a/llvm/test/CodeGen/AMDGPU/maximumnum.ll b/llvm/test/CodeGen/AMDGPU/maximumnum.ll index df79534a0844e..f3ed13a737748 100644 --- a/llvm/test/CodeGen/AMDGPU/maximumnum.ll +++ b/llvm/test/CodeGen/AMDGPU/maximumnum.ll @@ -4202,3 +4202,486 @@ define <4 x double> @v_maximumnum_v4f64_nnan(<4 x double> %x, <4 x double> %y) { %result = call nnan <4 x double> @llvm.maximumnum.v4f64(<4 x double> %x, <4 x double> %y) ret <4 x double> %result } + +define half @v_maximumnum_f16_no_ieee(half %x, half %y) #0 { +; GFX7-LABEL: v_maximumnum_f16_no_ieee: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; +; GFX8-LABEL: v_maximumnum_f16_no_ieee: +; GFX8: ; %bb.0: +; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX8-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX8-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX8-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX8-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: v_maximumnum_f16_no_ieee: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX9-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX9-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX9-NEXT:s_setpc_b64 s[30:31] +; +; 
GFX10-LABEL: v_maximumnum_f16_no_ieee: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX10-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX10-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX11-TRUE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX11-TRUE16: ; %bb.0: +; GFX11-TRUE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.h, v1.l, v1.l +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.l, v0.l, v0.l +; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.l, v0.l, v0.h +; GFX11-TRUE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX11-FAKE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX11-FAKE16: ; %bb.0: +; GFX11-FAKE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-FAKE16-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX11-FAKE16-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX11-FAKE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX11-FAKE16-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX11-FAKE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX12-TRUE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX12-TRUE16: ; %bb.0: +; GFX12-TRUE16-NEXT:s_wait_loadcnt_dscnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_expcnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_samplecnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_bvhcnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_kmcnt 0x0 +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.h, v1.l, v1.l +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.l, v0.l, v0.l +; GFX12-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.l, v0.l, v0.h +; GFX12-TRUE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX12-FAKE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX12-FAKE16: ; %bb.0: +; GFX12-FAKE16-NEXT:s_wait_loadcnt_dscnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_expcnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_samplecnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_bvhcnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_kmcnt 0x0 +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v1, v1, v1 +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v0, v0, v0 +; GFX12-FAKE16-NEXT:s_delay_alu 
instid0(VALU_DEP_1) +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v0, v0, v1 +; GFX12-FAKE16-NEXT:s_setpc_b64 s[30:31] + %result = call half @llvm.maximumnum.f16(half %x, half %y) + ret half %result +} + +define half @v_maximumnum_f16_nan_no_ieee(half %x, half %y) #0 { +; GFX7-LABEL: v_maximumnum_f16_nan_no_ieee: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; +; GFX8-LABEL: v_maximumnum_f16_nan_no_iee
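For readers skimming the truncated patch above, here is a minimal sketch of what a `_no_ieee` test function of this shape might look like as a standalone file. The test functions in the diff reference attribute group `#0`, but its definition falls outside the quoted portion; the `attributes #0` line below is therefore an assumption based on the PR title (`amdgpu-ieee=0`), and the function name is hypothetical rather than copied from the patch.

```llvm
; Sketch only: a reduced, standalone version of one of the new tests.
; The function name is hypothetical and not taken from the patch.
declare half @llvm.maximumnum.f16(half, half)

define half @maximumnum_f16_no_ieee_sketch(half %x, half %y) #0 {
  ; Same call pattern as the v_maximumnum_f16_no_ieee test in the diff.
  %result = call half @llvm.maximumnum.f16(half %x, half %y)
  ret half %result
}

; Assumed definition of #0 (not visible in the truncated diff):
; request that the function run with the IEEE mode bit disabled.
attributes #0 = { nounwind "amdgpu-ieee"="false" }
```

Such a file could be fed to `llc` with the same options as the RUN lines in these tests, for example `llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < sketch.ll`, to compare the emitted instructions against the CHECK lines quoted in the patch.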
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes With the IEEE bit disabled, the hardware instructions have the same behavior as these operations. --- Patch is 37.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139145.diff 2 Files Affected: - (modified) llvm/test/CodeGen/AMDGPU/maximumnum.ll (+483) - (modified) llvm/test/CodeGen/AMDGPU/minimumnum.ll (+483) ``diff diff --git a/llvm/test/CodeGen/AMDGPU/maximumnum.ll b/llvm/test/CodeGen/AMDGPU/maximumnum.ll index df79534a0844e..f3ed13a737748 100644 --- a/llvm/test/CodeGen/AMDGPU/maximumnum.ll +++ b/llvm/test/CodeGen/AMDGPU/maximumnum.ll @@ -4202,3 +4202,486 @@ define <4 x double> @v_maximumnum_v4f64_nnan(<4 x double> %x, <4 x double> %y) { %result = call nnan <4 x double> @llvm.maximumnum.v4f64(<4 x double> %x, <4 x double> %y) ret <4 x double> %result } + +define half @v_maximumnum_f16_no_ieee(half %x, half %y) #0 { +; GFX7-LABEL: v_maximumnum_f16_no_ieee: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; +; GFX8-LABEL: v_maximumnum_f16_no_ieee: +; GFX8: ; %bb.0: +; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX8-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX8-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX8-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX8-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: v_maximumnum_f16_no_ieee: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX9-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX9-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX9-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: v_maximumnum_f16_no_ieee: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:v_max_f16_e32 v1, v1, v1 +; 
GFX10-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX10-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX11-TRUE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX11-TRUE16: ; %bb.0: +; GFX11-TRUE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.h, v1.l, v1.l +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.l, v0.l, v0.l +; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX11-TRUE16-NEXT:v_max_f16_e32 v0.l, v0.l, v0.h +; GFX11-TRUE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX11-FAKE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX11-FAKE16: ; %bb.0: +; GFX11-FAKE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-FAKE16-NEXT:v_max_f16_e32 v1, v1, v1 +; GFX11-FAKE16-NEXT:v_max_f16_e32 v0, v0, v0 +; GFX11-FAKE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX11-FAKE16-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX11-FAKE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX12-TRUE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX12-TRUE16: ; %bb.0: +; GFX12-TRUE16-NEXT:s_wait_loadcnt_dscnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_expcnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_samplecnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_bvhcnt 0x0 +; GFX12-TRUE16-NEXT:s_wait_kmcnt 0x0 +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.h, v1.l, v1.l +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.l, v0.l, v0.l +; GFX12-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX12-TRUE16-NEXT:v_max_num_f16_e32 v0.l, v0.l, v0.h +; GFX12-TRUE16-NEXT:s_setpc_b64 s[30:31] +; +; GFX12-FAKE16-LABEL: v_maximumnum_f16_no_ieee: +; GFX12-FAKE16: ; %bb.0: +; GFX12-FAKE16-NEXT:s_wait_loadcnt_dscnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_expcnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_samplecnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_bvhcnt 0x0 +; GFX12-FAKE16-NEXT:s_wait_kmcnt 0x0 +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v1, v1, v1 +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v0, v0, v0 +; GFX12-FAKE16-NEXT:s_delay_alu instid0(VALU_DEP_1) +; GFX12-FAKE16-NEXT:v_max_num_f16_e32 v0, v0, v1 +; GFX12-FAKE16-NEXT:s_setpc_b64 s[30:31] + %result = call half @llvm.maximumnum.f16(half %x, 
half %y) + ret half %result +} + +define half @v_maximumnum_f16_nan_no_ieee(half %x, half %y) #0 { +; GFX7-LABEL: v_maximumnum_f16_nan_no_ieee: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; +; GFX8-LABEL: v_maximumnum_f16_nan_no_ieee: +; GFX8: ; %bb.0: +; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX8-NEXT:v_max_f16_e32 v0, v0, v1 +; GFX8-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: v_maximumnum_
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes --- Patch is 137.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139144.diff 2 Files Affected: - (modified) llvm/test/CodeGen/AMDGPU/maximumnum.ll (+1043-204) - (modified) llvm/test/CodeGen/AMDGPU/minimumnum.ll (+1022-199) ``diff diff --git a/llvm/test/CodeGen/AMDGPU/maximumnum.ll b/llvm/test/CodeGen/AMDGPU/maximumnum.ll index 718a266f49f5d..df79534a0844e 100644 --- a/llvm/test/CodeGen/AMDGPU/maximumnum.ll +++ b/llvm/test/CodeGen/AMDGPU/maximumnum.ll @@ -1,6 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck -check-prefix=GFX7 %s ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 < %s | FileCheck -check-prefix=GFX8 %s -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 < %s | FileCheck -check-prefixes=GFX9,GFX950 %s ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-TRUE16 %s ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-FAKE16 %s @@ -8,6 +10,16 @@ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -mattr=-real-true16 < %s | FileCheck -check-prefixes=GFX12,GFX12-FAKE16 %s define half @v_maximumnum_f16(half %x, half %y) { +; GFX7-LABEL: v_maximumnum_f16: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; 
GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; ; GFX8-LABEL: v_maximumnum_f16: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) @@ -80,6 +92,16 @@ define half @v_maximumnum_f16(half %x, half %y) { } define half @v_maximumnum_f16_nnan(half %x, half %y) { +; GFX7-LABEL: v_maximumnum_f16_nnan: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v1, v1 +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v1, v1 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; ; GFX8-LABEL: v_maximumnum_f16_nnan: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) @@ -134,6 +156,14 @@ define half @v_maximumnum_f16_nnan(half %x, half %y) { } define half @v_maximumnum_f16_1.0(half %x) { +; GFX7-LABEL: v_maximumnum_f16_1.0: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_cvt_f16_f32_e32 v0, v0 +; GFX7-NEXT:v_cvt_f32_f16_e32 v0, v0 +; GFX7-NEXT:v_max_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; ; GFX8-LABEL: v_maximumnum_f16_1.0: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) @@ -199,6 +229,17 @@ define half @v_maximumnum_f16_1.0(half %x) { } define bfloat @v_maximumnum_bf16(bfloat %x, bfloat %y) { +; GFX7-LABEL: v_maximumnum_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_mul_f32_e32 v1, 1.0, v1 +; GFX7-NEXT:v_and_b32_e32 v1, 0x, v1 +; GFX7-NEXT:v_and_b32_e32 v0, 0x, v0 +; GFX7-NEXT:v_max_f32_e32 v0, v0, v1 +; GFX7-NEXT:v_and_b32_e32 v0, 0x, v0 +; GFX7-NEXT:s_setpc_b64 s[30:31] +; ; GFX8-LABEL: v_maximumnum_bf16: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) @@ -231,36 +272,67 @@ define bfloat @v_maximumnum_bf16(bfloat %x, bfloat %y) { ; GFX8-NEXT:v_cndmask_b32_e32 v0, v3, v0, vcc ; GFX8-NEXT:s_setpc_b64 s[30:31] ; -; 
GFX9-LABEL: v_maximumnum_bf16: -; GFX9: ; %bb.0: -; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_lshlrev_b32_e32 v2, 16, v0 -; GFX9-NEXT:v_cmp_u_f32_e32 vcc, v2, v2 -; GFX9-NEXT:v_lshlrev_b32_e32 v3, 16, v1 -; GFX9-NEXT:v_cndmask_b32_e32 v0, v0, v1, vcc -; GFX9-NEXT:v_cmp_u_f32_e32 vcc, v3, v3 -; GFX9-NEXT:v_cndmask_b32_e32 v1, v1, v0, vcc -; GFX9-NEXT:v_lshlrev_b32_e32 v2, 16, v0 -; GFX9-NEXT:v_lshlrev_b32_e32 v3, 16, v1 -; GFX9-NEXT:v_cmp_gt_f32_e32 vcc, v2, v3 -; GFX9-NEXT:v_cndmask_b32_e32 v2, v1, v0, vcc -; GFX9-NEXT:v_lshlrev_b32_e32 v2, 16, v2 -; GFX9-NEXT:v_max_f32_e32 v2, v2, v2 -; GFX9-NEXT:v_bfe_u32 v3, v2, 16, 1 -; GFX9-NEXT:s_movk_i32 s4, 0x7
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/139144 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for min3/max3 from minimumnum/maximumnum (PR #139136)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/139136 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/139145 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Form min3/max3 from minimumnum/maximumnum (PR #139137)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/139137 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Test more subtargets in minimumnum/maximumnum tests (PR #139144)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/139144 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add minimumnum/maximumnum tests with amdgpu-ieee=0 (PR #139145)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/139145 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Pass OpenMP version to getOpenMPDirectiveName (PR #139131)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/139131 The OpenMP version is stored in LangOptions in SemanticsContext. Use the fallback version where SemanticsContext is unavailable (mostly in case of debug dumps). RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits