[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)
@@ -1669,13 +1670,16 @@ defm : FlatSignedAtomicPatWithAddrSpace <"FLAT_ATOMIC_ADD_F32", "int_amdgcn_flat } let OtherPredicates = [HasAtomicFlatPkAdd16Insts] in { +// FIXME: These do not have signed offsets rampitec wrote: Can you just use FlatAtomicPat? https://github.com/llvm/llvm-project/pull/95394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)
@@ -15931,6 +15931,26 @@ static OptimizationRemark emitAtomicRMWLegalRemark(const AtomicRMWInst *RMW) { << " operation at memory scope " << MemScope; } +static bool isHalf2OrBFloat2(Type *Ty) { rampitec wrote: Does the underlying type really matter? Is 2 x 16-bit type sufficient? https://github.com/llvm/llvm-project/pull/95394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
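For context, a minimal sketch of the generalization being asked about, assuming the check really only needs the element count and element width. The helper name isV2x16Type is hypothetical (not part of the patch) and it needs llvm/IR/DerivedTypes.h:

static bool isV2x16Type(Type *Ty) {
  // Accept any <2 x 16-bit> vector, so <2 x half>, <2 x bfloat> and <2 x i16>
  // would all pass; the question is whether the FP element type actually
  // needs to be distinguished here.
  auto *VT = dyn_cast<FixedVectorType>(Ty);
  return VT && VT->getNumElements() == 2 && VT->getScalarSizeInBits() == 16;
}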
[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)
https://github.com/rampitec approved this pull request. LGTM contingent on the plan to produce atomicrmw. https://github.com/llvm/llvm-project/pull/95396 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/95394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (PR #95930)
@@ -1735,8 +1737,11 @@ defm : SIBufferAtomicPat<"SIbuffer_atomic_dec", i64, "BUFFER_ATOMIC_DEC_X2">; let OtherPredicates = [HasAtomicCSubNoRtnInsts] in defm : SIBufferAtomicPat<"SIbuffer_atomic_csub", i32, "BUFFER_ATOMIC_CSUB", ["noret"]>; -let SubtargetPredicate = isGFX12Plus in { +let SubtargetPredicate = HasAtomicBufferPkAddBF16Inst in { defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd", v2bf16, "BUFFER_ATOMIC_PK_ADD_BF16_VBUFFER">; rampitec wrote: Should it use OtherPredicates = [HasAtomicBufferPkAddBF16Inst] and SubtargetPredicate = isGFX12Plus because VBUFFER opcode is used? https://github.com/llvm/llvm-project/pull/95930 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (PR #95930)
@@ -743,6 +743,12 @@ def FeatureAtomicGlobalPkAddBF16Inst : SubtargetFeature<"atomic-global-pk-add-bf [FeatureFlatGlobalInsts] >; +def FeatureAtomicBufferPkAddBF16Inst : SubtargetFeature<"atomic-buffer-pk-add-bf16-inst", rampitec wrote: I believe it is above FeatureAtomicGlobalPkAddBF16Inst in the downstream branch. Can you fix the order here or there? https://github.com/llvm/llvm-project/pull/95930 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)
@@ -886,26 +977,17 @@ multiclass SMRD_Pattern { def : GCNPat < (smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)), (vt (!cast(Instr#"_SGPR") $sbase, $soffset, 0))> { -let OtherPredicates = [isNotGFX9Plus]; - } - def : GCNPat < -(smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)), -(vt (!cast(Instr#"_SGPR_IMM") $sbase, $soffset, 0, 0))> { -let OtherPredicates = [isGFX9Plus]; +let OtherPredicates = [isGFX6GFX7]; } - // 4. SGPR+IMM offset + // 4. No offset def : GCNPat < -(smrd_load (SMRDSgprImm i64:$sbase, i32:$soffset, i32:$offset)), -(vt (!cast(Instr#"_SGPR_IMM") $sbase, $soffset, $offset, 0))> { -let OtherPredicates = [isGFX9Plus]; +(vt (smrd_load (i64 SReg_64:$sbase))), +(vt (!cast(Instr#"_IMM") i64:$sbase, 0, 0))> { +let OtherPredicates = [isGFX6GFX7]; } - // 5. No offset - def : GCNPat < -(vt (smrd_load (i64 SReg_64:$sbase))), -(vt (!cast(Instr#"_IMM") i64:$sbase, 0, 0)) - >; + defm : SMRD_Align_Pattern; rampitec wrote: You can avoid duplicating patterns for aligned case, you just need to check if xnack is on (and it is off before gfx8). I also do not see xnack checked anywhere. https://github.com/llvm/llvm-project/pull/96163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI, return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM; } case S_LOAD_IMM: -switch (Width) { -default: - return 0; -case 2: - return AMDGPU::S_LOAD_DWORDX2_IMM; -case 3: - return AMDGPU::S_LOAD_DWORDX3_IMM; -case 4: - return AMDGPU::S_LOAD_DWORDX4_IMM; -case 8: - return AMDGPU::S_LOAD_DWORDX8_IMM; +// For targets that support XNACK replay, use the constrained load opcode. +if (STI && STI->hasXnackReplay()) { + switch (Width) { rampitec wrote: You can check alignment on the first load if MMO is available and avoid producing _ec version if it is sufficient. https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
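A rough sketch of the suggested check inside the S_LOAD_IMM case of getNewOpcode(); CI.I and Width come from the surrounding patch, and treating Width * 4 bytes as the alignment needed by the unconstrained opcode is an assumption here, not a statement of the hardware rule:

// Only fall back to the constrained (_ec) opcodes when the known alignment
// of the first load is below the natural alignment of the merged access
// (Width is in dwords here).
bool NeedConstrained = true;
if (CI.I->hasOneMemOperand()) {
  const MachineMemOperand *MMO = *CI.I->memoperands_begin();
  NeedConstrained = MMO->getAlign().value() < Width * 4;
}

With something along these lines the merge could keep emitting the plain S_LOAD_DWORDX*_IMM forms whenever the deduced alignment is already sufficient, which matters for the kernarg case discussed below.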
[llvm-branch-commits] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)
rampitec wrote: > We do statically know for some of the targets (mostly gfx12 and gfx940) that > it's supposed to work. This is the "scope downgrade" vs. "nop" cases in the > atomic support table Actually not, we do not know the bus. Moreover, we know the opposite. https://github.com/llvm/llvm-project/pull/96442 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI, return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM; } case S_LOAD_IMM: -switch (Width) { -default: - return 0; -case 2: - return AMDGPU::S_LOAD_DWORDX2_IMM; -case 3: - return AMDGPU::S_LOAD_DWORDX3_IMM; -case 4: - return AMDGPU::S_LOAD_DWORDX4_IMM; -case 8: - return AMDGPU::S_LOAD_DWORDX8_IMM; +// For targets that support XNACK replay, use the constrained load opcode. +if (STI && STI->hasXnackReplay()) { + switch (Width) { rampitec wrote: > > currently the alignment is picked from the first MMO and that'd definitely > > be smaller than the natural align requirement for the new load > > You don't know that - the alignment in the first MMO will be whatever > alignment the compiler could deduce, which could be large, e.g. if the > pointer used for the first load was known to have a large alignment. Moreover, it can easily be as large as a page. In a case of scalar load and kernarg. https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)
rampitec wrote: Use it in a predicate when defining pseudos? https://github.com/llvm/llvm-project/pull/96444 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
rampitec wrote: It is worse than that. It behaves differently depending on where the atomic is executed. There is no single answer as to whether this instruction supports denorms or not. https://github.com/llvm/llvm-project/pull/96443 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/96443 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/96444 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove ds_fmin/ds_fmax intrinsics (PR #96739)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/96739 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Enable vectorization of v2f16 copysign (PR #100799)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/100799 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Correct costs of saturating add/sub intrinsics (PR #100808)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/100808 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)
@@ -809,7 +826,59 @@ int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI, return getScratchInstrOffset(MI); } +static bool isFIPlusImmOrVGPR(const SIRegisterInfo &TRI, + const MachineInstr &MI) { + const MachineOperand &Src0 = MI.getOperand(1); rampitec wrote: Assert that this is an add, or move the function inside needsFrameBaseReg? https://github.com/llvm/llvm-project/pull/101692 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
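A sketch of the assert option at the top of isFIPlusImmOrVGPR(); the opcode list is an assumption based on the VOP2 adds this patch appears to handle:

// Document the precondition: operand 1 is only src0 for the VOP2 add forms.
assert((MI.getOpcode() == AMDGPU::V_ADD_U32_e32 ||
        MI.getOpcode() == AMDGPU::V_ADD_CO_U32_e32) &&
       "isFIPlusImmOrVGPR expects a VALU add");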
[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)
@@ -797,6 +797,23 @@ int64_t SIRegisterInfo::getScratchInstrOffset(const MachineInstr *MI) const { int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI, int Idx) const { + switch (MI->getOpcode()) { rampitec wrote: Bail if any modifiers are set? https://github.com/llvm/llvm-project/pull/101692 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
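A hedged sketch of the bail-out in the new add cases of getFrameIndexInstrOffset(); it assumes SIInstrInfo::hasAnyModifiersSet() is the right query and that reporting no offset is an acceptable fallback for this function:

// An add carrying source modifiers, clamp or omod is not a plain
// "frame index + offset", so do not report a foldable offset for it.
if (ST.getInstrInfo()->hasAnyModifiersSet(*MI))
  return 0;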
[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)
@@ -877,6 +948,86 @@ Register SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB, void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg, int64_t Offset) const { const SIInstrInfo *TII = ST.getInstrInfo(); + + switch (MI.getOpcode()) { + case AMDGPU::V_ADD_U32_e32: + case AMDGPU::V_ADD_CO_U32_e32: { +MachineOperand *FIOp = &MI.getOperand(2); +MachineOperand *ImmOp = &MI.getOperand(1); +if (!FIOp->isFI()) + std::swap(FIOp, ImmOp); + +if (!ImmOp->isImm()) { + assert(Offset == 0); + FIOp->ChangeToRegister(BaseReg, false); + TII->legalizeOperandsVOP2(MI.getMF()->getRegInfo(), MI); + return; +} + +int64_t TotalOffset = ImmOp->getImm() + Offset; +if (TotalOffset == 0) { + MI.setDesc(TII->get(AMDGPU::COPY)); + for (unsigned I = MI.getNumOperands() - 1; I != 1; --I) +MI.removeOperand(I); + + MI.getOperand(1).ChangeToRegister(BaseReg, false); + return; +} + +ImmOp->setImm(TotalOffset); + +MachineBasicBlock *MBB = MI.getParent(); +MachineFunction *MF = MBB->getParent(); +MachineRegisterInfo &MRI = MF->getRegInfo(); + +// FIXME: materializeFrameBaseRegister does not know the register class of +// the uses of the frame index, and assumes SGPR for enableFlatScratch. Emit +// a copy so we have a legal operand and hope the register coalescer can +// clean it up. +if (isSGPRReg(MRI, BaseReg)) { + Register BaseRegVGPR = + MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass); + BuildMI(*MBB, MI, MI.getDebugLoc(), TII->get(AMDGPU::COPY), BaseRegVGPR) + .addReg(BaseReg); + MI.getOperand(2).ChangeToRegister(BaseRegVGPR, false); +} else { + MI.getOperand(2).ChangeToRegister(BaseReg, false); +} +return; + } + case AMDGPU::V_ADD_U32_e64: + case AMDGPU::V_ADD_CO_U32_e64: { +int Src0Idx = MI.getNumExplicitDefs(); rampitec wrote: Check that modifiers are clear? https://github.com/llvm/llvm-project/pull/101692 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
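For the _e64 cases of resolveFrameIndex(), one possible shape of the check, reusing the TII already defined in that function; whether an assert or a graceful bail-out is preferable depends on what the earlier offset query already guarantees, so this is a sketch only:

// The _e64 add forms carry explicit clamp (and possibly source-modifier)
// operands; refuse to rewrite the add in place unless they are all clear.
assert(!TII->hasAnyModifiersSet(MI) &&
       "cannot fold a frame index into an add with modifiers");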
[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)
@@ -797,6 +797,23 @@ int64_t SIRegisterInfo::getScratchInstrOffset(const MachineInstr *MI) const { int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI, int Idx) const { + switch (MI->getOpcode()) { rampitec wrote: Ack https://github.com/llvm/llvm-project/pull/101692 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)
https://github.com/rampitec commented: Add some tests where the argument is not a pointer? https://github.com/llvm/llvm-project/pull/102010 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle masked load and store intrinsics (PR #102007)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/102007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)
https://github.com/rampitec approved this pull request. LGTM modulo the braces comment. https://github.com/llvm/llvm-project/pull/102010 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fold frame indexes into s_or_b32 and s_and_b32 (PR #102345)
@@ -190,31 +186,31 @@ body: | ; MUBUFW64-LABEL: name: s_and_b32__sgpr__fi_literal_offset ; MUBUFW64: liveins: $sgpr8 ; MUBUFW64-NEXT: {{ $}} -; MUBUFW64-NEXT: $sgpr4 = S_LSHR_B32 $sgpr32, 6, implicit-def $scc -; MUBUFW64-NEXT: $sgpr4 = S_ADD_I32 killed $sgpr4, 80, implicit-def $scc -; MUBUFW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, implicit-def $scc +; MUBUFW64-NEXT: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc +; MUBUFW64-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr4, $sgpr8, implicit-def $scc +; MUBUFW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def $scc ; MUBUFW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc ; ; MUBUFW32-LABEL: name: s_and_b32__sgpr__fi_literal_offset ; MUBUFW32: liveins: $sgpr8 ; MUBUFW32-NEXT: {{ $}} -; MUBUFW32-NEXT: $sgpr4 = S_LSHR_B32 $sgpr32, 5, implicit-def $scc -; MUBUFW32-NEXT: $sgpr4 = S_ADD_I32 killed $sgpr4, 80, implicit-def $scc -; MUBUFW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, implicit-def $scc +; MUBUFW32-NEXT: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 5, implicit-def dead $scc +; MUBUFW32-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr4, $sgpr8, implicit-def $scc +; MUBUFW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def $scc ; MUBUFW32-NEXT: SI_RETURN implicit $sgpr7, implicit $scc ; ; FLATSCRW64-LABEL: name: s_and_b32__sgpr__fi_literal_offset ; FLATSCRW64: liveins: $sgpr8 ; FLATSCRW64-NEXT: {{ $}} -; FLATSCRW64-NEXT: $sgpr4 = S_ADD_I32 $sgpr32, 80, implicit-def $scc -; FLATSCRW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, implicit-def $scc +; FLATSCRW64-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr32, $sgpr8, implicit-def $scc +; FLATSCRW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def $scc ; FLATSCRW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc ; ; FLATSCRW32-LABEL: name: s_and_b32__sgpr__fi_literal_offset ; FLATSCRW32: liveins: $sgpr8 ; FLATSCRW32-NEXT: {{ $}} -; FLATSCRW32-NEXT: $sgpr4 = S_ADD_I32 $sgpr32, 80, implicit-def $scc -; FLATSCRW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, implicit-def $scc +; FLATSCRW32-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr32, $sgpr8, implicit-def $scc rampitec wrote: I do not understand this. The transformation is `(s8 & (sp + 80)) ->((s8 + sp) & 80)` does not look immediately obvious. https://github.com/llvm/llvm-project/pull/102345 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Preserve atomicrmw name when specializing address space (PR #102470)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102470 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add noalias.addrspace metadata when autoupgrading atomic intrinsics (PR #102599)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102599 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)
@@ -22,6 +22,7 @@ MODULE_PASS("amdgpu-lower-buffer-fat-pointers", AMDGPULowerBufferFatPointersPass(*this)) MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass()) MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this)) +MODULE_PASS("amdgpu-perf-hint", AMDGPUPerfHintAnalysisPass(*static_cast(this))) rampitec wrote: Exceeds 80 chars per line. https://github.com/llvm/llvm-project/pull/102645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)
@@ -413,18 +439,57 @@ bool AMDGPUPerfHintAnalysis::runOnSCC(CallGraphSCC &SCC) { return Changed; } -bool AMDGPUPerfHintAnalysis::isMemoryBound(const Function *F) const { - auto FI = FIM.find(F); - if (FI == FIM.end()) -return false; +bool AMDGPUPerfHintAnalysis::run(const GCNTargetMachine &TM, + LazyCallGraph &CG) { - return AMDGPUPerfHint::isMemBound(FI->second); + SmallVector Worklist; + CG.buildRefSCCs(); + for (LazyCallGraph::RefSCC &RC : CG.postorder_ref_sccs()) { +for (LazyCallGraph::SCC &SCC : RC) { + if (SCC.size() != 1) +continue; + Function &F = SCC.begin()->getFunction(); + if (!F.isDeclaration() && !F.doesNotRecurse() && F.hasInternalLinkage()) rampitec wrote: Why is it limited to internal linkage? https://github.com/llvm/llvm-project/pull/102645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (PR #102654)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102654 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (PR #102663)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102663 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes (PR #102815)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102815 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start implementing addCodeGenPrepare (PR #102816)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102816 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)
rampitec wrote: > I don't really like needing to expose these globally like this; maybe it > would be better to just move TargetPassConfig and the CodeGenPassBuilder into > one common file? Yep, I also do not like extern cl::opt. https://github.com/llvm/llvm-project/pull/102865 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102867 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/102884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/106977 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 607bec0 - Change materializeFrameBaseRegister() to return register
Author: Stanislav Mekhanoshin Date: 2021-01-22T15:51:06-08:00 New Revision: 607bec0bb9f787acca95f53dabe6a5c227f6b6b2 URL: https://github.com/llvm/llvm-project/commit/607bec0bb9f787acca95f53dabe6a5c227f6b6b2 DIFF: https://github.com/llvm/llvm-project/commit/607bec0bb9f787acca95f53dabe6a5c227f6b6b2.diff LOG: Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates base register of class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister to just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268 Added: Modified: llvm/include/llvm/CodeGen/TargetRegisterInfo.h llvm/lib/CodeGen/LocalStackSlotAllocation.cpp llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp llvm/lib/Target/AArch64/AArch64RegisterInfo.h llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp llvm/lib/Target/AMDGPU/SIRegisterInfo.h llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp llvm/lib/Target/ARM/ARMBaseRegisterInfo.h llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp llvm/lib/Target/PowerPC/PPCRegisterInfo.h Removed: diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h index 253f71cb5f1a..8790e2f09eb6 100644 --- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h +++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h @@ -911,11 +911,11 @@ class TargetRegisterInfo : public MCRegisterInfo { return false; } - /// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx - /// before insertion point I. - virtual void materializeFrameBaseRegister(MachineBasicBlock *MBB, -Register BaseReg, int FrameIdx, -int64_t Offset) const { + /// Insert defining instruction(s) for a pointer to FrameIdx before + /// insertion point I. Return materialized frame pointer. + virtual Register materializeFrameBaseRegister(MachineBasicBlock *MBB, +int FrameIdx, +int64_t Offset) const { llvm_unreachable("materializeFrameBaseRegister does not exist on this " "target"); } diff --git a/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp b/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp index ec3cce3fa1f1..ec6e693e8a46 100644 --- a/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp +++ b/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp @@ -416,15 +416,16 @@ bool LocalStackSlotPass::insertFrameReferenceRegisters(MachineFunction &Fn) { const TargetRegisterClass *RC = TRI->getPointerRegClass(*MF); BaseReg = Fn.getRegInfo().createVirtualRegister(RC); - LLVM_DEBUG(dbgs() << " Materializing base register " << BaseReg + LLVM_DEBUG(dbgs() << " Materializing base register" << " at frame local offset " -<< LocalOffset + InstrOffset << "\n"); +<< LocalOffset + InstrOffset); // Tell the target to insert the instruction to initialize // the base register. 
//MachineBasicBlock::iterator InsertionPt = Entry->begin(); - TRI->materializeFrameBaseRegister(Entry, BaseReg, FrameIdx, -InstrOffset); + BaseReg = TRI->materializeFrameBaseRegister(Entry, FrameIdx, InstrOffset); + + LLVM_DEBUG(dbgs() << " into " << printReg(BaseReg, TRI) << '\n'); // The base register already includes any offset specified // by the instruction, so account for that so it doesn't get diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp index 231e8b3089f6..f90856d14b2f 100644 --- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp @@ -531,10 +531,10 @@ bool AArch64RegisterInfo::isFrameOffsetLegal(const MachineInstr *MI, /// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx /// at the beginning of the basic block. -void AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB, - Register BaseReg, - int FrameIdx, - int64_t Offset) const { +Register +AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB, + int FrameIdx, + int64_t Offset) const { MachineBasicBlock::iterator Ins = MBB->begin(); DebugLoc DL; // Defaults to "unknown" if (Ins != MBB->end()) @@ -544,6 +544,7 @@ void AArch64RegisterI
[llvm-branch-commits] [llvm] ca904b8 - [AMDGPU] Fix FP materialization/resolve with flat scratch
Author: Stanislav Mekhanoshin Date: 2021-01-22T16:06:47-08:00 New Revision: ca904b81e6488b45cbfe846dc86f1406b8e9c03d URL: https://github.com/llvm/llvm-project/commit/ca904b81e6488b45cbfe846dc86f1406b8e9c03d DIFF: https://github.com/llvm/llvm-project/commit/ca904b81e6488b45cbfe846dc86f1406b8e9c03d.diff LOG: [AMDGPU] Fix FP materialization/resolve with flat scratch Differential Revision: https://reviews.llvm.org/D95266 Added: Modified: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp llvm/test/CodeGen/AMDGPU/flat-scratch.ll llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp index 8911917cffb0..7a45d8c54f9a 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp @@ -417,7 +417,7 @@ bool SIRegisterInfo::needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const { return !SIInstrInfo::isLegalMUBUFImmOffset(FullOffset); const SIInstrInfo *TII = ST.getInstrInfo(); - return TII->isLegalFLATOffset(FullOffset, AMDGPUAS::PRIVATE_ADDRESS, true); + return !TII->isLegalFLATOffset(FullOffset, AMDGPUAS::PRIVATE_ADDRESS, true); } Register SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB, @@ -496,7 +496,6 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg, MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset); int64_t NewOffset = OffsetOp->getImm() + Offset; -#ifndef NDEBUG assert(FIOp && FIOp->isFI() && "frame index must be address operand"); assert(TII->isMUBUF(MI) || TII->isFLATScratch(MI)); @@ -508,6 +507,7 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg, return; } +#ifndef NDEBUG MachineOperand *SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset); assert(SOffset->isImm() && SOffset->getImm() == 0); #endif @@ -522,7 +522,7 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg, bool SIRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI, Register BaseReg, int64_t Offset) const { - if (!SIInstrInfo::isMUBUF(*MI) && !!SIInstrInfo::isFLATScratch(*MI)) + if (!SIInstrInfo::isMUBUF(*MI) && !SIInstrInfo::isFLATScratch(*MI)) return false; int64_t NewOffset = Offset + getScratchInstrOffset(MI); diff --git a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll index 916c2d43a4c0..4244d8f4deb5 100644 --- a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll +++ b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll @@ -1185,7 +1185,7 @@ define amdgpu_kernel void @zero_init_large_offset_kernel() { ; GFX9-NEXT:s_add_u32 flat_scratch_lo, s0, s3 ; GFX9-NEXT:s_addc_u32 flat_scratch_hi, s1, 0 ; GFX9-NEXT:s_mov_b32 vcc_hi, 0 -; GFX9-NEXT:scratch_load_dword v0, off, vcc_hi offset:4 glc +; GFX9-NEXT:scratch_load_dword v0, off, vcc_hi offset:16 glc ; GFX9-NEXT:s_waitcnt vmcnt(0) ; GFX9-NEXT:s_mov_b32 s0, 0 ; GFX9-NEXT:s_mov_b32 s1, s0 @@ -1211,7 +1211,7 @@ define amdgpu_kernel void @zero_init_large_offset_kernel() { ; GFX10-NEXT:s_addc_u32 s1, s1, 0 ; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0 ; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1 -; GFX10-NEXT:scratch_load_dword v0, off, off offset:4 glc dlc +; GFX10-NEXT:scratch_load_dword v0, off, off offset:16 glc dlc ; GFX10-NEXT:s_waitcnt vmcnt(0) ; GFX10-NEXT:s_mov_b32 s0, 0 ; GFX10-NEXT:s_movk_i32 vcc_lo, 0x4010 @@ -1242,7 +1242,7 @@ define amdgpu_kernel void @zero_init_large_offset_kernel() { ; GFX9-PAL-NEXT:s_and_b32 s3, s3, 0x ; GFX9-PAL-NEXT:s_add_u32 
flat_scratch_lo, s2, s1 ; GFX9-PAL-NEXT:s_addc_u32 flat_scratch_hi, s3, 0 -; GFX9-PAL-NEXT:scratch_load_dword v0, off, vcc_hi offset:4 glc +; GFX9-PAL-NEXT:scratch_load_dword v0, off, vcc_hi offset:16 glc ; GFX9-PAL-NEXT:s_waitcnt vmcnt(0) ; GFX9-PAL-NEXT:s_mov_b32 s1, s0 ; GFX9-PAL-NEXT:s_mov_b32 s2, s0 @@ -1272,7 +1272,7 @@ define amdgpu_kernel void @zero_init_large_offset_kernel() { ; GFX10-PAL-NEXT:s_addc_u32 s3, s3, 0 ; GFX10-PAL-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2 ; GFX10-PAL-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3 -; GFX10-PAL-NEXT:scratch_load_dword v0, off, off offset:4 glc dlc +; GFX10-PAL-NEXT:scratch_load_dword v0, off, off offset:16 glc dlc ; GFX10-PAL-NEXT:s_waitcnt vmcnt(0) ; GFX10-PAL-NEXT:s_mov_b32 s0, 0 ; GFX10-PAL-NEXT:s_movk_i32 vcc_lo, 0x4010 diff --git a/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll b/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll index 5e4b5f70de0b..4385500e71b1 100644 --- a/llvm/test/CodeGen/AMDGPU/local-st
[llvm-branch-commits] [llvm] eb66bf0 - [AMDGPU] Print SCRATCH_EN field after the kernel
Author: Stanislav Mekhanoshin Date: 2020-12-15T22:44:30-08:00 New Revision: eb66bf0802f96458b24a9c6eb9bd6451d8f90110 URL: https://github.com/llvm/llvm-project/commit/eb66bf0802f96458b24a9c6eb9bd6451d8f90110 DIFF: https://github.com/llvm/llvm-project/commit/eb66bf0802f96458b24a9c6eb9bd6451d8f90110.diff LOG: [AMDGPU] Print SCRATCH_EN field after the kernel Differential Revision: https://reviews.llvm.org/D93353 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp index a14f846b76d1..7ca049280744 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp @@ -538,6 +538,9 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) { OutStreamer->emitRawComment( " WaveLimiterHint : " + Twine(MFI->needsWaveLimiter()), false); +OutStreamer->emitRawComment( + " COMPUTE_PGM_RSRC2:SCRATCH_EN: " + + Twine(G_00B84C_SCRATCH_EN(CurrentProgramInfo.ComputePGMRSrc2)), false); OutStreamer->emitRawComment( " COMPUTE_PGM_RSRC2:USER_SGPR: " + Twine(G_00B84C_USER_SGPR(CurrentProgramInfo.ComputePGMRSrc2)), false); diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll index 39029e359889..455c19fcdfc2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll @@ -3,7 +3,14 @@ ; Make sure flat_scratch_init is set ; GCN-LABEL: {{^}}stack_object_addrspacecast_in_kernel_no_calls: -; GCN: .amdhsa_user_sgpr_flat_scratch_init 1 +; GCN: s_add_u32 flat_scratch_lo, s4, s7 +; GCN: s_addc_u32 flat_scratch_hi, s5, 0 +; GCN: flat_store_dword +; GCN: .amdhsa_user_sgpr_flat_scratch_init 1 +; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset +; GCN-NOT: .amdhsa_reserve_flat_scratch +; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 1 +; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 6 define amdgpu_kernel void @stack_object_addrspacecast_in_kernel_no_calls() { %alloca = alloca i32, addrspace(5) %cast = addrspacecast i32 addrspace(5)* %alloca to i32* @@ -13,7 +20,15 @@ define amdgpu_kernel void @stack_object_addrspacecast_in_kernel_no_calls() { ; TODO: Could optimize out in this case ; GCN-LABEL: {{^}}stack_object_in_kernel_no_calls: -; GCN: .amdhsa_user_sgpr_flat_scratch_init 1 +; GCN: s_add_u32 flat_scratch_lo, s4, s7 +; GCN: s_addc_u32 flat_scratch_hi, s5, 0 +; GCN: buffer_store_dword +; GCN: .amdhsa_user_sgpr_private_segment_buffer 1 +; GCN: .amdhsa_user_sgpr_flat_scratch_init 1 +; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset 1 +; GCN-NOT: .amdhsa_reserve_flat_scratch +; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 1 +; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 6 define amdgpu_kernel void @stack_object_in_kernel_no_calls() { %alloca = alloca i32, addrspace(5) store volatile i32 0, i32 addrspace(5)* %alloca @@ -21,7 +36,13 @@ define amdgpu_kernel void @stack_object_in_kernel_no_calls() { } ; GCN-LABEL: {{^}}kernel_no_calls_no_stack: -; GCN: .amdhsa_user_sgpr_flat_scratch_init 0 +; GCN-NOT: flat_scratch +; GCN: .amdhsa_user_sgpr_private_segment_buffer 1 +; GCN: .amdhsa_user_sgpr_flat_scratch_init 0 +; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset 0 +; GCN: .amdhsa_reserve_flat_scratch 0 +; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 0 +; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 4 define amdgpu_kernel void @kernel_no_calls_no_stack() { ret void } ___ llvm-branch-commits mailing list 
llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] ae8f4b2 - [AMDGPU] Folding of FI operand with flat scratch
Author: Stanislav Mekhanoshin Date: 2020-12-22T10:48:04-08:00 New Revision: ae8f4b2178c46da1f10eb9279c9b44fab8b85417 URL: https://github.com/llvm/llvm-project/commit/ae8f4b2178c46da1f10eb9279c9b44fab8b85417 DIFF: https://github.com/llvm/llvm-project/commit/ae8f4b2178c46da1f10eb9279c9b44fab8b85417.diff LOG: [AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501 Added: llvm/test/CodeGen/AMDGPU/flat-scratch-fold-fi.mir Modified: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.h llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index bfba432848d4..06cce54e540c 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -172,9 +172,23 @@ static bool frameIndexMayFold(const SIInstrInfo *TII, const MachineInstr &UseMI, int OpNo, const MachineOperand &OpToFold) { - return OpToFold.isFI() && -TII->isMUBUF(UseMI) && -OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), AMDGPU::OpName::vaddr); + if (!OpToFold.isFI()) +return false; + + if (TII->isMUBUF(UseMI)) +return OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), + AMDGPU::OpName::vaddr); + if (!TII->isFLATScratch(UseMI)) +return false; + + int SIdx = AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), +AMDGPU::OpName::saddr); + if (OpNo == SIdx) +return true; + + int VIdx = AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), +AMDGPU::OpName::vaddr); + return OpNo == VIdx && SIdx == -1; } FunctionPass *llvm::createSIFoldOperandsPass() { @@ -631,25 +645,36 @@ void SIFoldOperands::foldOperand( // Sanity check that this is a stack access. // FIXME: Should probably use stack pseudos before frame lowering. -if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() != -MFI->getScratchRSrcReg()) - return; +if (TII->isMUBUF(*UseMI)) { + if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() != + MFI->getScratchRSrcReg()) +return; -// Ensure this is either relative to the current frame or the current wave. -MachineOperand &SOff = -*TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset); -if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) && -(!SOff.isImm() || SOff.getImm() != 0)) - return; + // Ensure this is either relative to the current frame or the current + // wave. + MachineOperand &SOff = + *TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset); + if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) && + (!SOff.isImm() || SOff.getImm() != 0)) +return; + + // If this is relative to the current wave, update it to be relative to + // the current frame. + if (SOff.isImm()) +SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false); +} // A frame index will resolve to a positive constant, so it should always be // safe to fold the addressing mode, even pre-GFX9. UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex()); -// If this is relative to the current wave, update it to be relative to the -// current frame. 
-if (SOff.isImm()) - SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false); +if (TII->isFLATScratch(*UseMI) && +AMDGPU::getNamedOperandIdx(UseMI->getOpcode(), + AMDGPU::OpName::vaddr) != -1) { + unsigned NewOpc = AMDGPU::getFlatScratchInstSSfromSV(UseMI->getOpcode()); + UseMI->setDesc(TII->get(NewOpc)); +} + return; } diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index 4625cefa1e3e..75aedee1ec6b 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -1184,6 +1184,9 @@ namespace AMDGPU { LLVM_READONLY int getFlatScratchInstSTfromSS(uint16_t Opcode); + LLVM_READONLY + int getFlatScratchInstSSfromSV(uint16_t Opcode); + const uint64_t RSRC_DATA_FORMAT = 0xf000LL; const uint64_t RSRC_ELEMENT_SIZE_SHIFT = (32 + 19); const uint64_t RSRC_INDEX_STRIDE_SHIFT = (32 + 21); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td index 746d08b8ce0e..e48138e56d71 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td @@ -2524,6 +2524,13 @@ def getFlatScratchInstSTfromSS : InstrMapping
[llvm-branch-commits] [llvm] ca4bf58 - [AMDGPU] Support unaligned flat scratch in TLI
Author: Stanislav Mekhanoshin Date: 2020-12-22T16:12:31-08:00 New Revision: ca4bf58e4ee5951473a861716193063c5ef83e9a URL: https://github.com/llvm/llvm-project/commit/ca4bf58e4ee5951473a861716193063c5ef83e9a DIFF: https://github.com/llvm/llvm-project/commit/ca4bf58e4ee5951473a861716193063c5ef83e9a.diff LOG: [AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669 Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 5fb1924bdd9f..81fdfa0343b3 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1470,12 +1470,21 @@ bool SITargetLowering::allowsMisalignedMemoryAccessesImpl( } } + if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS) { +bool AlignedBy4 = Alignment >= Align(4); +if (IsFast) + *IsFast = AlignedBy4; + +return AlignedBy4 || + Subtarget->enableFlatScratch() || + Subtarget->hasUnalignedScratchAccess(); + } + // FIXME: We have to be conservative here and assume that flat operations // will access scratch. If we had access to the IR function, then we // could determine if any private memory was used in the function. - if (!Subtarget->hasUnalignedScratchAccess() && - (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS || - AddrSpace == AMDGPUAS::FLAT_ADDRESS)) { + if (AddrSpace == AMDGPUAS::FLAT_ADDRESS && + !Subtarget->hasUnalignedScratchAccess()) { bool AlignedBy4 = Alignment >= Align(4); if (IsFast) *IsFast = AlignedBy4; diff --git a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll index 271f6c703980..8e37b413ddf5 100644 --- a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll +++ b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll @@ -271,16 +271,9 @@ define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly % ; FLATSCR-NEXT:s_waitcnt vmcnt(0) ; FLATSCR-NEXT:scratch_store_short off, v0, vcc_hi offset:8 ; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0 -; FLATSCR-NEXT:scratch_load_ushort v0, off, vcc_hi offset:4 +; FLATSCR-NEXT:scratch_load_dword v0, off, vcc_hi offset:4 ; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0 -; FLATSCR-NEXT:scratch_load_ushort v3, off, vcc_hi offset:6 -; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0 -; FLATSCR-NEXT:s_waitcnt vmcnt(1) -; FLATSCR-NEXT:v_and_b32_e32 v0, 0x, v0 -; FLATSCR-NEXT:s_waitcnt vmcnt(0) -; FLATSCR-NEXT:v_mov_b32_e32 v1, v3 -; FLATSCR-NEXT:scratch_load_short_d16_hi v1, off, vcc_hi offset:8 -; FLATSCR-NEXT:v_lshl_or_b32 v0, v3, 16, v0 +; FLATSCR-NEXT:scratch_load_dword v1, off, vcc_hi offset:6 ; FLATSCR-NEXT:s_waitcnt vmcnt(0) ; FLATSCR-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3] ; FLATSCR-NEXT:s_endpgm diff --git a/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll b/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll index 5d5cfd318edf..645eead8c297 100644 --- a/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll +++ b/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll @@ -1,6 +1,7 @@ -; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=ALIGNED %s -; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=+unaligned-access-mode -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=UNALIGNED %s -; RUN: llc 
-march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=ALIGNED %s +; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefixes=SI,MUBUF,ALIGNED %s +; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=+unaligned-access-mode -verify-machineinstrs< %s | FileCheck -check-prefixes=SI,MUBUF,UNALIGNED %s +; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs< %s | FileCheck -check-prefixes=SI,MUBUF,ALIGNED %s +; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=SI,FLATSCR,ALIGNED %s ; SI-LABEL: {{^}}local_unaligned_load_store_i16: ; SI: ds_read_u8 @@ -602,64 +603,70 @@ define amdgpu_kernel void @local_store_align1_v16i8(<16 x i8> addrspace(3)* %out } ; SI-LABEL: {{^}}private_load_align1_f64: -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte -; SI: buffer_load_ubyte +; MUBUF: buffer_load_ubyte +; MUB
[llvm-branch-commits] [llvm] d15119a - [AMDGPU][GlobalISel] GlobalISel for flat scratch
Author: Stanislav Mekhanoshin Date: 2020-12-22T16:33:06-08:00 New Revision: d15119a02d92274cd7f779f4bb8485b1020110e0 URL: https://github.com/llvm/llvm-project/commit/d15119a02d92274cd7f779f4bb8485b1020110e0 DIFF: https://github.com/llvm/llvm-project/commit/d15119a02d92274cd7f779f4bb8485b1020110e0.diff LOG: [AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670 Added: llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll Modified: llvm/lib/Target/AMDGPU/AMDGPUGISel.td llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td index 661b96a6a98e..bba03736d01a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td @@ -85,6 +85,14 @@ def gi_mubuf_scratch_offen : GIComplexOperandMatcher, GIComplexPatternEquiv; +def gi_flat_scratch_offset : +GIComplexOperandMatcher, +GIComplexPatternEquiv; + +def gi_flat_scratch_saddr : +GIComplexOperandMatcher, +GIComplexPatternEquiv; + def gi_ds_1addr_1offset : GIComplexOperandMatcher, GIComplexPatternEquiv; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp index b157c03672d1..6c2ff0972ae5 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp @@ -3589,6 +3589,67 @@ AMDGPUInstructionSelector::selectGlobalSAddr(MachineOperand &Root) const { }}}; } +InstructionSelector::ComplexRendererFns +AMDGPUInstructionSelector::selectScratchSAddr(MachineOperand &Root) const { + Register Addr = Root.getReg(); + Register PtrBase; + int64_t ConstOffset; + int64_t ImmOffset = 0; + + // Match the immediate offset first, which canonically is moved as low as + // possible. 
+ std::tie(PtrBase, ConstOffset) = getPtrBaseWithConstantOffset(Addr, *MRI); + + if (ConstOffset != 0 && + TII.isLegalFLATOffset(ConstOffset, AMDGPUAS::PRIVATE_ADDRESS, true)) { +Addr = PtrBase; +ImmOffset = ConstOffset; + } + + auto AddrDef = getDefSrcRegIgnoringCopies(Addr, *MRI); + if (!AddrDef) +return None; + + if (AddrDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX) { +int FI = AddrDef->MI->getOperand(1).getIndex(); +return {{ +[=](MachineInstrBuilder &MIB) { MIB.addFrameIndex(FI); }, // saddr +[=](MachineInstrBuilder &MIB) { MIB.addImm(ImmOffset); } // offset +}}; + } + + Register SAddr = AddrDef->Reg; + + if (AddrDef->MI->getOpcode() == AMDGPU::G_PTR_ADD) { +Register LHS = AddrDef->MI->getOperand(1).getReg(); +Register RHS = AddrDef->MI->getOperand(2).getReg(); +auto LHSDef = getDefSrcRegIgnoringCopies(LHS, *MRI); +auto RHSDef = getDefSrcRegIgnoringCopies(RHS, *MRI); + +if (LHSDef && RHSDef && +LHSDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX && +isSGPR(RHSDef->Reg)) { + int FI = LHSDef->MI->getOperand(1).getIndex(); + MachineInstr &I = *Root.getParent(); + MachineBasicBlock *BB = I.getParent(); + const DebugLoc &DL = I.getDebugLoc(); + SAddr = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass); + + BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_ADD_U32), SAddr) +.addFrameIndex(FI) +.addReg(RHSDef->Reg); +} + } + + if (!isSGPR(SAddr)) +return None; + + return {{ + [=](MachineInstrBuilder &MIB) { MIB.addReg(SAddr); }, // saddr + [=](MachineInstrBuilder &MIB) { MIB.addImm(ImmOffset); } // offset + }}; +} + static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) { auto PSV = PtrInfo.V.dyn_cast(); return PSV && PSV->isStack(); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h index c575e7e9c8a5..c6b26ea70659 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h @@ -200,6 +200,9 @@ class AMDGPUInstructionSelector final : public InstructionSelector { InstructionSelector::ComplexRendererFns selectGlobalSAddr(MachineOperand &Root) const; + InstructionSelector::ComplexRendererFns + selectScratchSAddr(MachineOperand &Root) const; + InstructionSelector::ComplexRendererFns selectMUBUFScratchOffen(MachineOperand &Root) const; InstructionSelector::ComplexRendererFns diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index 9b39b86ae28f..28cd867d40be 100644 --- a/llvm/lib/Target/AMDGP
[llvm-branch-commits] [llvm] 747f67e - [AMDGPU] Fix adjustWritemask subreg handling
Author: Stanislav Mekhanoshin Date: 2020-12-23T14:43:31-08:00 New Revision: 747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe URL: https://github.com/llvm/llvm-project/commit/747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe DIFF: https://github.com/llvm/llvm-project/commit/747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe.diff LOG: [AMDGPU] Fix adjustWritemask subreg handling If we happen to extract a non-dword subreg that breaks the logic of the function and it may shrink the dmask because it does not recognize the use of a lane(s). This bug is next to impossible to trigger with the current lowering in the BE, but it breaks in one of my future patches. Differential Revision: https://reviews.llvm.org/D93782 Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81fdfa0343b3..c7abc585d0d1 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -10862,7 +10862,7 @@ SDValue SITargetLowering::PerformDAGCombine(SDNode *N, /// Helper function for adjustWritemask static unsigned SubIdx2Lane(unsigned Idx) { switch (Idx) { - default: return 0; + default: return ~0u; case AMDGPU::sub0: return 0; case AMDGPU::sub1: return 1; case AMDGPU::sub2: return 2; @@ -10922,6 +10922,8 @@ SDNode *SITargetLowering::adjustWritemask(MachineSDNode *&Node, // in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit // set, etc. Lane = SubIdx2Lane(I->getConstantOperandVal(1)); +if (Lane == ~0u) + return Node; // Check if the use is for the TFE/LWE generated result at VGPRn+1. if (UsesTFC && Lane == TFCLane) { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] dd89249 - [AMDGPU] Annotate vgpr<->agpr spills in asm
Author: Stanislav Mekhanoshin Date: 2020-12-07T11:25:25-08:00 New Revision: dd892494983a2e64d1e1eb3d05ce9577357336d2 URL: https://github.com/llvm/llvm-project/commit/dd892494983a2e64d1e1eb3d05ce9577357336d2 DIFF: https://github.com/llvm/llvm-project/commit/dd892494983a2e64d1e1eb3d05ce9577357336d2.diff LOG: [AMDGPU] Annotate vgpr<->agpr spills in asm Differential Revision: https://reviews.llvm.org/D92125 Added: Modified: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp llvm/test/CodeGen/AMDGPU/spill-agpr.ll llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp index 9d7a041390ca..18be7c23c94e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp @@ -697,8 +697,10 @@ static MachineInstrBuilder spillVGPRtoAGPR(const GCNSubtarget &ST, unsigned Opc = (IsStore ^ TRI->isVGPR(MRI, Reg)) ? AMDGPU::V_ACCVGPR_WRITE_B32 : AMDGPU::V_ACCVGPR_READ_B32; - return BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(Opc), Dst) - .addReg(Src, getKillRegState(IsKill)); + auto MIB = BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(Opc), Dst) + .addReg(Src, getKillRegState(IsKill)); + MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse); + return MIB; } // This diff ers from buildSpillLoadStore by only scavenging a VGPR. It does not @@ -871,10 +873,12 @@ void SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI, RS->setRegUsed(TmpReg); } if (IsStore) { - auto AccRead = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_READ_B32), TmpReg) + auto AccRead = BuildMI(*MBB, MI, DL, + TII->get(AMDGPU::V_ACCVGPR_READ_B32), TmpReg) .addReg(SubReg, getKillRegState(IsKill)); if (NeedSuperRegDef) AccRead.addReg(ValueReg, RegState::ImplicitDefine); + AccRead->setAsmPrinterFlag(MachineInstr::ReloadReuse); } SubReg = TmpReg; } @@ -908,10 +912,12 @@ void SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI, if (!IsAGPR && NeedSuperRegDef) MIB.addReg(ValueReg, RegState::ImplicitDefine); - if (!IsStore && TmpReg != AMDGPU::NoRegister) + if (!IsStore && TmpReg != AMDGPU::NoRegister) { MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32), FinalReg) .addReg(TmpReg, RegState::Kill); +MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse); + } } else { if (NeedSuperRegDef) MIB.addReg(ValueReg, RegState::ImplicitDefine); diff --git a/llvm/test/CodeGen/AMDGPU/spill-agpr.ll b/llvm/test/CodeGen/AMDGPU/spill-agpr.ll index 3e7b381a45fe..511d02a104b3 100644 --- a/llvm/test/CodeGen/AMDGPU/spill-agpr.ll +++ b/llvm/test/CodeGen/AMDGPU/spill-agpr.ll @@ -5,10 +5,10 @@ ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1 ; A2V-NOT:SCRATCH_RSRC -; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0 +; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0 ; Reload Reuse ; A2M:buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill ; A2M:buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload -; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] +; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] ; Reload Reuse ; A2V:ScratchSize: 0 define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* %arg, float addrspace(1)* %out) #0 { bb: @@ -34,10 +34,10 @@ bb: ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1 ; A2V-NOT:SCRATCH_RSRC -; GFX908-DAG: v_accvgpr_read_b32 
v[[VSPILL:[0-9]+]], a{{[0-9]+}} +; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a{{[0-9]+}} ; Reload Reuse ; A2M:buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill ; A2M:buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload -; A2V:v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] +; A2V:v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] ; Reload Reuse ; A2V:ScratchSize: 0 define amdgpu_kernel void @max_12regs_13a_used(i32 %cond, <4 x float> addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 { bb: @@ -55,8 +55,7 @@ use: st: %gep1 = getelementptr <4 x float>, <4 x float> addrspace(1)* %out, i64 16 %gep2 = getelementptr <4 x float>, <4 x float> addrspace(1)* %out, i64 32 - store <4 x float> %mai.1, <4 x float> addrspace(1)* %gep1 - store <4 x float> %mai.2, <4 x flo
[llvm-branch-commits] [llvm] 87d7757 - [SLP] Control maximum vectorization factor from TTI
Author: Stanislav Mekhanoshin Date: 2020-12-14T08:49:40-08:00 New Revision: 87d7757bbe14fed420092071ded3430072053316 URL: https://github.com/llvm/llvm-project/commit/87d7757bbe14fed420092071ded3430072053316 DIFF: https://github.com/llvm/llvm-project/commit/87d7757bbe14fed420092071ded3430072053316.diff LOG: [SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059 Added: Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll llvm/test/Transforms/SLPVectorizer/AMDGPU/round.ll llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll Removed: diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index 3ba77c9a8dc9..b9b9df35cdb0 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -941,6 +941,11 @@ class TargetTransformInfo { /// applies when shouldMaximizeVectorBandwidth returns true. unsigned getMinimumVF(unsigned ElemWidth) const; + /// \return The maximum vectorization factor for types of given element + /// bit width and opcode, or 0 if there is no maximum VF. + /// Currently only used by the SLP vectorizer. + unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const; + /// \return True if it should be considered for address type promotion. /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is /// profitable without finding other extensions fed by the same input. 
@@ -1498,6 +1503,7 @@ class TargetTransformInfo::Concept { virtual unsigned getMinVectorRegisterBitWidth() = 0; virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0; virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0; + virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0; virtual bool shouldConsiderAddressTypePromotion( const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0; virtual unsigned getCacheLineSize() const = 0; @@ -1917,6 +1923,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { unsigned getMinimumVF(unsigned ElemWidth) const override { return Impl.getMinimumVF(ElemWidth); } + unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const override { +return Impl.getMaximumVF(ElemWidth, Opcode); + } bool shouldConsiderAddressTypePromotion( const Instruction &I, bool &AllowPromotionWithoutCommonHeader) override { return Impl.shouldConsiderAddressTypePromotion( diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index b4847844cd0e..2c206094ac4a 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -356,6 +356,8 @@ class TargetTransformInfoImplBase { unsigned getMinimumVF(unsigned ElemWidth) const { return 0; } + unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const { return 0; } + bool shouldConsiderAddressTypePromotion(const Instruction &I, bool &AllowPromotionWithoutCommonHeader) { diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index f327d0cad426..086a212ee65b 100644 --- a/llvm/lib/Analysi
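For illustration, a target-side override of the new hook might look roughly like the sketch below, which bounds loads and stores by a 128-bit total width and caps 16-bit element math at two lanes. The exact constants, predicate names, and the real in-tree AMDGPU implementation may differ, so treat this only as a sketch of the intended shape:

unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
  // Memory operations may be vectorized up to a full 128-bit access.
  if (Opcode == Instruction::Load || Opcode == Instruction::Store)
    return 128 / ElemWidth;
  // Packed 16-bit math exists on some subtargets; everything else stays scalar.
  return (ElemWidth == 16 && ST->has16BitInsts()) ? 2 : 1;
}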
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Handle atomic sextload and zextload (PR #111721)
rampitec wrote: > > Missing test for buffer loads? > > Those are the gfx7 global cases. There aren't any atomic buffer load > intrinsics But the patch adds several MUBUF_Pseudo_Load_Pats which are not covered by tests? https://github.com/llvm/llvm-project/pull/111721 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fold more scalar operations on frame index to VALU (PR #115059)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/115059 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Default to selecting frame indexes to SGPRs (PR #115060)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/115060 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/115090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)
rampitec wrote: > > Is it legal and defined behavior to target private memory with an atomic? > > In the IR it would have to be, and this is the expected behavior in OpenMP > and C++. It's UB in OpenCL, and UB in CUDA/HIP for old style atomics, but > defined for new std::atomic style cases Is there a plan that OpenCL and HIP FE will produce noalias metadata to avoid the expansion? https://github.com/llvm/llvm-project/pull/109407 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
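For context, the expansion under discussion conceptually rewrites a flat atomicrmw that may alias private (scratch) memory into a guarded sequence. The IR below is only an illustrative sketch of that shape; block and value names are made up, and the ordering and metadata details of the real expansion may differ:

  %is.priv = call i1 @llvm.amdgcn.is.private(ptr %ptr)
  br i1 %is.priv, label %private, label %global

private:                                    ; scratch is per-lane, so a plain read-modify-write is enough
  %p5  = addrspacecast ptr %ptr to ptr addrspace(5)
  %old = load float, ptr addrspace(5) %p5
  %new = fadd float %old, %val
  store float %new, ptr addrspace(5) %p5
  br label %done

global:                                     ; everything else keeps the real atomic
  %ret = atomicrmw fadd ptr %ptr, float %val monotonic
  br label %done

done:
  %res = phi float [ %old, %private ], [ %ret, %global ]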
[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)
https://github.com/rampitec approved this pull request. Thanks. Can this be landed after https://github.com/llvm/llvm-project/pull/102462? https://github.com/llvm/llvm-project/pull/109407 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)
rampitec wrote: Is it legal and defined behavior to target private memory with an atomic? https://github.com/llvm/llvm-project/pull/109407 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for cmpxchg custom expansion (PR #109408)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/109408 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for flat-may-alias private atomic expansions (PR #109406)
@@ -0,0 +1,6911 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN1 %s rampitec wrote: Why GCN1 and GCN2? GFX7 and GFX8 are easier to understand. https://github.com/llvm/llvm-project/pull/109406 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
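As a sketch of the suggested renaming, the RUN lines could use the GFX-numbered prefixes directly; the second -mcpu value below is only an assumption about which gfx8 target the test would cover:

; RUN: llc -mtriple=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GFX7 %s
; RUN: llc -mtriple=amdgcn -mcpu=tonga < %s | FileCheck -check-prefix=GFX8 %s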
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/115090 >From f3d99e4ae92e407ebc2ef3f6b8e4017b397d34eb Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Nov 2024 12:28:07 -0800 Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling DPP intrinsics can handle any type now, so no need to cast to integer. The caveat is that intrinsics only handle backend legal types, but it does not work with i8 for example. --- clang/lib/CodeGen/CGBuiltin.cpp | 23 ++- .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 -- .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl | 60 +++ 3 files changed, 38 insertions(+), 75 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 5c3df5124517d6..8c0e76c9e8c3d7 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -19211,37 +19211,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments); assert(Error == ASTContext::GE_None && "Should not codegen an error"); llvm::Type *DataTy = ConvertType(E->getArg(0)->getType()); -unsigned Size = DataTy->getPrimitiveSizeInBits(); -llvm::Type *IntTy = -llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u)); Function *F = CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8 ? Intrinsic::amdgcn_mov_dpp8 : Intrinsic::amdgcn_update_dpp, - IntTy); + DataTy); assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 || E->getNumArgs() == 2); bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp; if (InsertOld) - Args.push_back(llvm::PoisonValue::get(IntTy)); -for (unsigned I = 0; I != E->getNumArgs(); ++I) { + Args.push_back(llvm::PoisonValue::get(DataTy)); +Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E)); +for (unsigned I = 1; I != E->getNumArgs(); ++I) { llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E); - if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 
2u : 1u) && - Size < 32) { -if (!DataTy->isIntegerTy()) - V = Builder.CreateBitCast( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -V = Builder.CreateZExtOrBitCast(V, IntTy); - } llvm::Type *ExpTy = F->getFunctionType()->getFunctionParamType(I + InsertOld); Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy)); } -Value *V = Builder.CreateCall(F, Args); -if (Size < 32 && !DataTy->isIntegerTy()) - V = Builder.CreateTrunc( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -return Builder.CreateTruncOrBitCast(V, DataTy); +return Builder.CreateCall(F, Args); } case AMDGPU::BI__builtin_amdgcn_permlane16: case AMDGPU::BI__builtin_amdgcn_permlanex16: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl index a4054cba236dd2..7e4ee6f4a942db 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl @@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) { } // CHECK-LABEL: @test_mov_dpp8_float( -// CHECK: %0 = bitcast float %a to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: store i32 %1, +// CHECK: %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1) +// CHECK-NEXT: store float %0, void test_mov_dpp8_float(global float* out, float a) { *out = __builtin_amdgcn_mov_dpp8(a, 1); } // CHECK-LABEL: @test_mov_dpp8_double -// CHECK: %0 = bitcast double %x to i64 -// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1) -// CHECK-NEXT: store i64 %1, +// CHECK: %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1) +// CHECK-NEXT: store double %0, void test_mov_dpp8_double(double x, global double *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_short -// CHECK: %0 = zext i16 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i16 -// CHECK-NEXT: store i16 %2, +// CHECK: %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1) +// CHECK-NEXT: store i16 %0, void test_mov_dpp8_short(short x, global short *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_char -// CHECK: %0 = zext i8 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i8 -// CHECK-NEXT: store i8 %2, +// CHECK: %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x,
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/115090 >From 084e347f5fb6e9068313ad4dbc53b44c2d4cee69 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Nov 2024 12:28:07 -0800 Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling DPP intrinsics can handle any type now, so no need to cast to integer. The caveat is that intrinsics only handle backend legal types, but it does not work with i8 for example. --- clang/lib/CodeGen/CGBuiltin.cpp | 23 ++- .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 -- .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl | 60 +++ 3 files changed, 38 insertions(+), 75 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 82770a75af23e4..7e3e6463799fb6 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -19193,37 +19193,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments); assert(Error == ASTContext::GE_None && "Should not codegen an error"); llvm::Type *DataTy = ConvertType(E->getArg(0)->getType()); -unsigned Size = DataTy->getPrimitiveSizeInBits(); -llvm::Type *IntTy = -llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u)); Function *F = CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8 ? Intrinsic::amdgcn_mov_dpp8 : Intrinsic::amdgcn_update_dpp, - IntTy); + DataTy); assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 || E->getNumArgs() == 2); bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp; if (InsertOld) - Args.push_back(llvm::PoisonValue::get(IntTy)); -for (unsigned I = 0; I != E->getNumArgs(); ++I) { + Args.push_back(llvm::PoisonValue::get(DataTy)); +Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E)); +for (unsigned I = 1; I != E->getNumArgs(); ++I) { llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E); - if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 
2u : 1u) && - Size < 32) { -if (!DataTy->isIntegerTy()) - V = Builder.CreateBitCast( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -V = Builder.CreateZExtOrBitCast(V, IntTy); - } llvm::Type *ExpTy = F->getFunctionType()->getFunctionParamType(I + InsertOld); Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy)); } -Value *V = Builder.CreateCall(F, Args); -if (Size < 32 && !DataTy->isIntegerTy()) - V = Builder.CreateTrunc( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -return Builder.CreateTruncOrBitCast(V, DataTy); +return Builder.CreateCall(F, Args); } case AMDGPU::BI__builtin_amdgcn_permlane16: case AMDGPU::BI__builtin_amdgcn_permlanex16: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl index a4054cba236dd2..7e4ee6f4a942db 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl @@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) { } // CHECK-LABEL: @test_mov_dpp8_float( -// CHECK: %0 = bitcast float %a to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: store i32 %1, +// CHECK: %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1) +// CHECK-NEXT: store float %0, void test_mov_dpp8_float(global float* out, float a) { *out = __builtin_amdgcn_mov_dpp8(a, 1); } // CHECK-LABEL: @test_mov_dpp8_double -// CHECK: %0 = bitcast double %x to i64 -// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1) -// CHECK-NEXT: store i64 %1, +// CHECK: %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1) +// CHECK-NEXT: store double %0, void test_mov_dpp8_double(double x, global double *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_short -// CHECK: %0 = zext i16 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i16 -// CHECK-NEXT: store i16 %2, +// CHECK: %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1) +// CHECK-NEXT: store i16 %0, void test_mov_dpp8_short(short x, global short *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_char -// CHECK: %0 = zext i8 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i8 -// CHECK-NEXT: store i8 %2, +// CHECK: %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x,
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/115090 >From 7ccac58706b2d7e54c8498818b560af490a70eac Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Nov 2024 12:28:07 -0800 Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling DPP intrinsics can handle any type now, so no need to cast to integer. The caveat is that intrinsics only handle backend legal types, but it does not work with i8 for example. --- clang/lib/CodeGen/CGBuiltin.cpp | 23 ++- .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 -- .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl | 60 +++ 3 files changed, 38 insertions(+), 75 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 5c3df5124517d6..8c0e76c9e8c3d7 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -19211,37 +19211,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments); assert(Error == ASTContext::GE_None && "Should not codegen an error"); llvm::Type *DataTy = ConvertType(E->getArg(0)->getType()); -unsigned Size = DataTy->getPrimitiveSizeInBits(); -llvm::Type *IntTy = -llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u)); Function *F = CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8 ? Intrinsic::amdgcn_mov_dpp8 : Intrinsic::amdgcn_update_dpp, - IntTy); + DataTy); assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 || E->getNumArgs() == 2); bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp; if (InsertOld) - Args.push_back(llvm::PoisonValue::get(IntTy)); -for (unsigned I = 0; I != E->getNumArgs(); ++I) { + Args.push_back(llvm::PoisonValue::get(DataTy)); +Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E)); +for (unsigned I = 1; I != E->getNumArgs(); ++I) { llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E); - if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 
2u : 1u) && - Size < 32) { -if (!DataTy->isIntegerTy()) - V = Builder.CreateBitCast( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -V = Builder.CreateZExtOrBitCast(V, IntTy); - } llvm::Type *ExpTy = F->getFunctionType()->getFunctionParamType(I + InsertOld); Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy)); } -Value *V = Builder.CreateCall(F, Args); -if (Size < 32 && !DataTy->isIntegerTy()) - V = Builder.CreateTrunc( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -return Builder.CreateTruncOrBitCast(V, DataTy); +return Builder.CreateCall(F, Args); } case AMDGPU::BI__builtin_amdgcn_permlane16: case AMDGPU::BI__builtin_amdgcn_permlanex16: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl index a4054cba236dd2..7e4ee6f4a942db 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl @@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) { } // CHECK-LABEL: @test_mov_dpp8_float( -// CHECK: %0 = bitcast float %a to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: store i32 %1, +// CHECK: %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1) +// CHECK-NEXT: store float %0, void test_mov_dpp8_float(global float* out, float a) { *out = __builtin_amdgcn_mov_dpp8(a, 1); } // CHECK-LABEL: @test_mov_dpp8_double -// CHECK: %0 = bitcast double %x to i64 -// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1) -// CHECK-NEXT: store i64 %1, +// CHECK: %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1) +// CHECK-NEXT: store double %0, void test_mov_dpp8_double(double x, global double *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_short -// CHECK: %0 = zext i16 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i16 -// CHECK-NEXT: store i16 %2, +// CHECK: %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1) +// CHECK-NEXT: store i16 %0, void test_mov_dpp8_short(short x, global short *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_char -// CHECK: %0 = zext i8 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i8 -// CHECK-NEXT: store i8 %2, +// CHECK: %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x,
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
rampitec wrote: > Should also teach instcombine to fold bitcast + dpp It still needs the downstack change to handle i8: https://github.com/llvm/llvm-project/pull/114887 https://github.com/llvm/llvm-project/pull/115090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
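As a sketch of the fold being referred to, instcombine could rewrite a bitcast-wrapped dpp call into a call on the original type now that the intrinsic is overloaded. The IR below is illustrative only:

  ; before
  %b = bitcast float %x to i32
  %d = call i32 @llvm.amdgcn.update.dpp.i32(i32 poison, i32 %b, i32 1, i32 15, i32 15, i1 false)
  %r = bitcast i32 %d to float
  ; after the fold
  %r = call float @llvm.amdgcn.update.dpp.f32(float poison, float %x, i32 1, i32 15, i32 15, i1 false)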
[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for treating v_pk_mov_b32 like reg_sequence (PR #125656)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/125656 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)
rampitec wrote: Is there any way at all to test it? https://github.com/llvm/llvm-project/pull/123711 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/123711 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)
rampitec wrote: > > Is there any way at all to test it? > > Many shuffle tests were added in > [7786266](https://github.com/llvm/llvm-project/commit/7786266dc7b4e89feadcb01ff21f9e3cf2022a6b), > this shows they are a no-op. The expected test changes from this are in > #123711 OK, I see. LGTM. https://github.com/llvm/llvm-project/pull/123711 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. (PR #123234)
@@ -0,0 +1,43 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s + rampitec wrote: Done https://github.com/llvm/llvm-project/pull/123234 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. (PR #123234)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/123234 >From 7501423b29230f37273094e1b15e8bca0fcc90bd Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 16 Jan 2025 10:49:05 -0800 Subject: [PATCH] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. The test demonstraits a suboptimal VALU hoisting from a WWM region. As a result we have 2 WWM regions instead of one. --- llvm/test/CodeGen/AMDGPU/licm-wwm.mir | 46 +++ 1 file changed, 46 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/licm-wwm.mir diff --git a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir new file mode 100644 index 00..fc20674971a716 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir @@ -0,0 +1,46 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s + +# Machine LICM may hoist an intruction from a WWM region, which will force SI-WQM pass +# to create a second WWM region. This is an unwanted hoisting. + +--- +name: licm_move_wwm +tracksRegLiveness: true +body: | + ; GCN-LABEL: name: licm_move_wwm + ; GCN: bb.0: + ; GCN-NEXT: successors: %bb.1(0x8000) + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec + ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec + ; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]] + ; GCN-NEXT: S_BRANCH %bb.1 + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: bb.1: + ; GCN-NEXT: successors: %bb.1(0x4000), %bb.2(0x4000) + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec + ; GCN-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[V_MOV_B32_e32_]], implicit $exec + ; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]] + ; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_READFIRSTLANE_B32_]] + ; GCN-NEXT: $exec_lo = S_OR_B32 $exec_lo, [[COPY]], implicit-def $scc + ; GCN-NEXT: S_CBRANCH_EXECNZ %bb.1, implicit $exec + ; GCN-NEXT: S_BRANCH %bb.2 + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: bb.2: + ; GCN-NEXT: S_ENDPGM 0 + bb.0: +S_BRANCH %bb.1 + + bb.1: +%0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec +%1:sreg_32 = V_READFIRSTLANE_B32 killed %0:vgpr_32, implicit $exec +early-clobber %2:sreg_32 = STRICT_WWM killed %1:sreg_32, implicit $exec +$exec_lo = S_OR_B32 $exec_lo, %2, implicit-def $scc +S_CBRANCH_EXECNZ %bb.1, implicit $exec +S_BRANCH %bb.2 + + bb.2: +S_ENDPGM 0 +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)
@@ -2773,6 +2773,9 @@ void AMDGPUDAGToDAGISel::SelectINTRINSIC_WO_CHAIN(SDNode *N) { case Intrinsic::amdgcn_wwm: case Intrinsic::amdgcn_strict_wwm: Opcode = AMDGPU::STRICT_WWM; +CurDAG->getMachineFunction() +.getInfo() +->setInitWholeWave(); rampitec wrote: Ack. I can create a separate property HasWWM, but I really want to hear if we even want to go that way. https://github.com/llvm/llvm-project/pull/123124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
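A minimal sketch of the separate property mentioned above, assuming a dedicated flag on SIMachineFunctionInfo rather than reusing the init-whole-wave bit; the member and accessor names here are hypothetical:

  // Hypothetical members on SIMachineFunctionInfo:
  bool HasWWMRegion = false;
  void setHasWWMRegion() { HasWWMRegion = true; }
  bool hasWWMRegion() const { return HasWWMRegion; }

  // At selection time, instead of setInitWholeWave():
  CurDAG->getMachineFunction()
      .getInfo<SIMachineFunctionInfo>()
      ->setHasWWMRegion();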
[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)
rampitec wrote: > I guess my concern is performance regressions if any use of WWM (e.g. atomic > optimizer) essentially turns off Machine LICM. I agree. But when moving the code LLVM thinks it is something cheap, and it is not, which is also a performance problem. Things would be much easier if we could tell that an instruction belongs to a WWM region. https://github.com/llvm/llvm-project/pull/123124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/123124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. (PR #123234)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/123234). Learn more: https://graphite.dev/docs/merge-pull-requests * **#123234** 👈 (View in Graphite) * **#123232** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/123234 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/123124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. (PR #123234)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/123234 The test demonstraits a suboptimal VALU hoisting from a WWM region. As a result we have 2 WWM regions instead of one. >From 263a43571303c16c3295cb0a88261504c4aef322 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 16 Jan 2025 10:49:05 -0800 Subject: [PATCH] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. The test demonstraits a suboptimal VALU hoisting from a WWM region. As a result we have 2 WWM regions instead of one. --- llvm/test/CodeGen/AMDGPU/licm-wwm.mir | 43 +++ 1 file changed, 43 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/licm-wwm.mir diff --git a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir new file mode 100644 index 00..96659fcb716450 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir @@ -0,0 +1,43 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s + +--- +name: licm_move_wwm +tracksRegLiveness: true +body: | + ; GCN-LABEL: name: licm_move_wwm + ; GCN: bb.0: + ; GCN-NEXT: successors: %bb.1(0x8000) + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec + ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec + ; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]] + ; GCN-NEXT: S_BRANCH %bb.1 + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: bb.1: + ; GCN-NEXT: successors: %bb.1(0x4000), %bb.2(0x4000) + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec + ; GCN-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[V_MOV_B32_e32_]], implicit $exec + ; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]] + ; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_READFIRSTLANE_B32_]] + ; GCN-NEXT: $exec_lo = S_OR_B32 $exec_lo, [[COPY]], implicit-def $scc + ; GCN-NEXT: S_CBRANCH_EXECNZ %bb.1, implicit $exec + ; GCN-NEXT: S_BRANCH %bb.2 + ; GCN-NEXT: {{ $}} + ; GCN-NEXT: bb.2: + ; GCN-NEXT: S_ENDPGM 0 + bb.0: +S_BRANCH %bb.1 + + bb.1: +%0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec +%1:sreg_32 = V_READFIRSTLANE_B32 killed %0:vgpr_32, implicit $exec +early-clobber %2:sreg_32 = STRICT_WWM killed %1:sreg_32, implicit $exec +$exec_lo = S_OR_B32 $exec_lo, %2, implicit-def $scc +S_CBRANCH_EXECNZ %bb.1, implicit $exec +S_BRANCH %bb.2 + + bb.2: +S_ENDPGM 0 +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisiting from WWM region. NFC. (PR #123234)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/123234 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)
rampitec wrote: > Missing new test? Tests added. https://github.com/llvm/llvm-project/pull/123124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)
@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID CC, return MCConstantExpr::create(0, Ctx); } + +uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { rampitec wrote: I wanted to look at this separately. Right now the problem is that the AsmPrinter emits the end-of-function label in an incorrect place, actually into the kernel descriptor in .rodata, which is even the wrong section. That will take more work and is really a separate thing, but once it is fixed I could replace this with an MCExpr. I.e., I can emit a separate end label, but that is also a hack. https://github.com/llvm/llvm-project/pull/126981 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
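A rough sketch of the MCExpr-based direction mentioned above, assuming the end-of-function label can be emitted in the right place; the symbol handling below is illustrative, not what the AsmPrinter currently does:

  // After the last instruction of the function body:
  MCSymbol *EndSym = OutContext.createTempSymbol("func_end");
  OutStreamer->emitLabel(EndSym);

  // The code size then becomes a symbol-difference expression instead of a
  // summed per-instruction estimate.
  const MCExpr *SizeExpr = MCBinaryExpr::createSub(
      MCSymbolRefExpr::create(EndSym, OutContext),
      MCSymbolRefExpr::create(CurrentFnSym, OutContext), OutContext);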
[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)
@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID CC, return MCConstantExpr::create(0, Ctx); } + +uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { + if (!CodeSizeInBytes.has_value()) { +const GCNSubtarget &STM = MF.getSubtarget(); +const SIInstrInfo *TII = STM.getInstrInfo(); + +uint64_t CodeSize = 0; + +for (const MachineBasicBlock &MBB : MF) { + for (const MachineInstr &MI : MBB) { +// TODO: CodeSize should account for multiple functions. + +// TODO: Should we count size of debug info? +if (MI.isDebugInstr()) rampitec wrote: That said, the function was simply moved as-is; the only added functionality is caching. And yes, it is incorrect and always was, at least because it does not correctly handle inline asm. https://github.com/llvm/llvm-project/pull/126981 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)
@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID CC, return MCConstantExpr::create(0, Ctx); } + +uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { + if (!CodeSizeInBytes.has_value()) { +const GCNSubtarget &STM = MF.getSubtarget(); +const SIInstrInfo *TII = STM.getInstrInfo(); + +uint64_t CodeSize = 0; + +for (const MachineBasicBlock &MBB : MF) { + for (const MachineInstr &MI : MBB) { +// TODO: CodeSize should account for multiple functions. + +// TODO: Should we count size of debug info? +if (MI.isDebugInstr()) rampitec wrote: Since these are really somewhat unrelated changes, I have split it into a separate https://github.com/llvm/llvm-project/pull/127111, which is just a move of the code, and I will create yet another PR to address the functional comments. https://github.com/llvm/llvm-project/pull/126981 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/126981 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (PR #126762)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/126762 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (PR #126763)
rampitec wrote: > Should just leave the subtarget feature name alone. It's not worth the > trouble, and this will now start spewing warnings on old IR (due to unnecessary target-features spam clang should stop emitting). It really > should have been named 94-insts, but I think it's best to leave it alone I agree we can keep the feature name and all these 'gfx940' checks, and just remove the targets. https://github.com/llvm/llvm-project/pull/126763 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (PR #126763)
@@ -1619,28 +1613,6 @@ def FeatureISAVersion9_5_Common : FeatureSet< FeatureAtomicBufferPkAddBF16Inst ])>; -def FeatureISAVersion9_4_0 : FeatureSet< - !listconcat(FeatureISAVersion9_4_Common.Features, -[ - FeatureAddressableLocalMemorySize65536, - FeatureForceStoreSC0SC1, rampitec wrote: FeatureForceStoreSC0SC1 can also be removed along with all the code handling it in a separate change. https://github.com/llvm/llvm-project/pull/126763 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)
@@ -11,11 +10,9 @@ define amdgpu_kernel void @ptr_nest_3(ptr addrspace(1) nocapture readonly %Arg) ; CHECK-NEXT: entry: ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x() ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds ptr, ptr addrspace(1) [[ARG:%.*]], i32 [[I]] -; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8, !amdgpu.noclobber [[META0:![0-9]+]] -; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast ptr [[P2]] to ptr addrspace(1) -; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr addrspace(1) [[P2_GLOBAL]], align 8, !amdgpu.noclobber [[META0]] -; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast ptr [[P3]] to ptr addrspace(1) -; CHECK-NEXT:store float 0.00e+00, ptr addrspace(1) [[P3_GLOBAL]], align 4 +; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8 +; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr [[P2]], align 8 rampitec wrote: I think you can have an invalid pointer anywhere, but it is up to the program not to dereference an invalid pointer. In practice it cannot be anything but global, since it is passed from the host. Even if another kernel places some other pointer there, it is illegal to use it, and it is up to the developer not to do so. It should not prevent the optimization. https://github.com/llvm/llvm-project/pull/137655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)
@@ -11,11 +10,9 @@ define amdgpu_kernel void @ptr_nest_3(ptr addrspace(1) nocapture readonly %Arg) ; CHECK-NEXT: entry: ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x() ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds ptr, ptr addrspace(1) [[ARG:%.*]], i32 [[I]] -; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8, !amdgpu.noclobber [[META0:![0-9]+]] -; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast ptr [[P2]] to ptr addrspace(1) -; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr addrspace(1) [[P2_GLOBAL]], align 8, !amdgpu.noclobber [[META0]] -; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast ptr [[P3]] to ptr addrspace(1) -; CHECK-NEXT:store float 0.00e+00, ptr addrspace(1) [[P3_GLOBAL]], align 4 +; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8 +; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr [[P2]], align 8 rampitec wrote: The pass is important for performance, especially for HIP. A pointer passed from the host cannot be anything but a valid global pointer. So, this is a surprising change. https://github.com/llvm/llvm-project/pull/137655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
rampitec wrote: Which one do you prefer, this or https://github.com/llvm/llvm-project/pull/127246? They are mutually exclusive. https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle subregister uses in SIFoldOperands constant folding (PR #127485)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/127485 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle brev and not cases in getConstValDefinedInReg (PR #127483)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/127483 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/127142 >From b574a4b4afbf4cd0a6e128ea5d1e1579698124bc Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 13 Feb 2025 14:46:37 -0800 Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() --- llvm/lib/Target/AMDGPU/SIProgramInfo.cpp | 6 ++ .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++ 2 files changed, 95 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp index 1123696509818..b4d740422b94a 100644 --- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp @@ -212,6 +212,12 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { uint64_t CodeSize = 0; for (const MachineBasicBlock &MBB : MF) { +// The amount of padding to align code can be both underestimated and +// overestimated. In case of inline asm used getInstSizeInBytes() will +// return a maximum size of a single instruction, where the real size may +// differ. At this point CodeSize may be already off. +CodeSize = alignTo(CodeSize, MBB.getAlignment()); + for (const MachineInstr &MI : MBB) { // TODO: CodeSize should account for multiple functions. diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir index 76eaf350301e4..9ae536af6f0e9 100644 --- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir +++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir @@ -31,3 +31,92 @@ body: | WAVE_BARRIER ... + +# CHECK: align4: ; @align4 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align2 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 16 + +--- +name:align4 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 4): +S_ENDPGM 0 +... + +# CHECK: align8: ; @align8 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align3 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align8 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 8): +S_ENDPGM 0 +... + +# CHECK: align16:; @align16 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align4 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align16 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 16): +S_ENDPGM 0 +... 
+ +# CHECK: align32:; @align32 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align5 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 36 +--- +name:align32 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 32): +S_ENDPGM 0 +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/127142 >From b574a4b4afbf4cd0a6e128ea5d1e1579698124bc Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 13 Feb 2025 14:46:37 -0800 Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() --- llvm/lib/Target/AMDGPU/SIProgramInfo.cpp | 6 ++ .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++ 2 files changed, 95 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp index 1123696509818..b4d740422b94a 100644 --- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp @@ -212,6 +212,12 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { uint64_t CodeSize = 0; for (const MachineBasicBlock &MBB : MF) { +// The amount of padding to align code can be both underestimated and +// overestimated. In case of inline asm used getInstSizeInBytes() will +// return a maximum size of a single instruction, where the real size may +// differ. At this point CodeSize may be already off. +CodeSize = alignTo(CodeSize, MBB.getAlignment()); + for (const MachineInstr &MI : MBB) { // TODO: CodeSize should account for multiple functions. diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir index 76eaf350301e4..9ae536af6f0e9 100644 --- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir +++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir @@ -31,3 +31,92 @@ body: | WAVE_BARRIER ... + +# CHECK: align4: ; @align4 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align2 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 16 + +--- +name:align4 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 4): +S_ENDPGM 0 +... + +# CHECK: align8: ; @align8 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align3 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align8 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 8): +S_ENDPGM 0 +... + +# CHECK: align16:; @align16 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align4 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align16 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 16): +S_ENDPGM 0 +... 
+ +# CHECK: align32:; @align32 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align5 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 36 +--- +name:align32 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 32): +S_ENDPGM 0 +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
rampitec wrote: > > Which one do you prefer, this or #127246? They are mutually exclusive. > > They're not really. This one is the incremental step which adds the test, > #127246 is the final form The test is meaningless if we overestimate. https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
rampitec wrote: And in any case it is moot until the baseline change is accepted. https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop introducing v_accvgpr_write_b32 for reg-to-reg copy (PR #129059)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/129059 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/115090 >From f7e10b1e26159442945c2682ca1ed463bd152605 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Nov 2024 12:28:07 -0800 Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling DPP intrinsics can handle any type now, so no need to cast to integer. The caveat is that intrinsics only handle backend legal types, but it does not work with i8 for example. --- clang/lib/CodeGen/CGBuiltin.cpp | 23 ++- .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 -- .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl | 60 +++ 3 files changed, 38 insertions(+), 75 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 03b8d16b76e0d..bff48f2e16524 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20003,37 +20003,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments); assert(Error == ASTContext::GE_None && "Should not codegen an error"); llvm::Type *DataTy = ConvertType(E->getArg(0)->getType()); -unsigned Size = DataTy->getPrimitiveSizeInBits(); -llvm::Type *IntTy = -llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u)); Function *F = CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8 ? Intrinsic::amdgcn_mov_dpp8 : Intrinsic::amdgcn_update_dpp, - IntTy); + DataTy); assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 || E->getNumArgs() == 2); bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp; if (InsertOld) - Args.push_back(llvm::PoisonValue::get(IntTy)); -for (unsigned I = 0; I != E->getNumArgs(); ++I) { + Args.push_back(llvm::PoisonValue::get(DataTy)); +Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E)); +for (unsigned I = 1; I != E->getNumArgs(); ++I) { llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E); - if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 
2u : 1u) && - Size < 32) { -if (!DataTy->isIntegerTy()) - V = Builder.CreateBitCast( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -V = Builder.CreateZExtOrBitCast(V, IntTy); - } llvm::Type *ExpTy = F->getFunctionType()->getFunctionParamType(I + InsertOld); Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy)); } -Value *V = Builder.CreateCall(F, Args); -if (Size < 32 && !DataTy->isIntegerTy()) - V = Builder.CreateTrunc( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -return Builder.CreateTruncOrBitCast(V, DataTy); +return Builder.CreateCall(F, Args); } case AMDGPU::BI__builtin_amdgcn_permlane16: case AMDGPU::BI__builtin_amdgcn_permlanex16: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl index a4054cba236dd..7e4ee6f4a942d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl @@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) { } // CHECK-LABEL: @test_mov_dpp8_float( -// CHECK: %0 = bitcast float %a to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: store i32 %1, +// CHECK: %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1) +// CHECK-NEXT: store float %0, void test_mov_dpp8_float(global float* out, float a) { *out = __builtin_amdgcn_mov_dpp8(a, 1); } // CHECK-LABEL: @test_mov_dpp8_double -// CHECK: %0 = bitcast double %x to i64 -// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1) -// CHECK-NEXT: store i64 %1, +// CHECK: %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1) +// CHECK-NEXT: store double %0, void test_mov_dpp8_double(double x, global double *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_short -// CHECK: %0 = zext i16 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i16 -// CHECK-NEXT: store i16 %2, +// CHECK: %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1) +// CHECK-NEXT: store i16 %0, void test_mov_dpp8_short(short x, global short *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_char -// CHECK: %0 = zext i8 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i8 -// CHECK-NEXT: store i8 %2, +// CHECK: %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, i32
[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/115090 >From f7e10b1e26159442945c2682ca1ed463bd152605 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Nov 2024 12:28:07 -0800 Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling DPP intrinsics can handle any type now, so no need to cast to integer. The caveat is that intrinsics only handle backend legal types, but it does not work with i8 for example. --- clang/lib/CodeGen/CGBuiltin.cpp | 23 ++- .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 -- .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl | 60 +++ 3 files changed, 38 insertions(+), 75 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 03b8d16b76e0d..bff48f2e16524 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20003,37 +20003,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments); assert(Error == ASTContext::GE_None && "Should not codegen an error"); llvm::Type *DataTy = ConvertType(E->getArg(0)->getType()); -unsigned Size = DataTy->getPrimitiveSizeInBits(); -llvm::Type *IntTy = -llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u)); Function *F = CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8 ? Intrinsic::amdgcn_mov_dpp8 : Intrinsic::amdgcn_update_dpp, - IntTy); + DataTy); assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 || E->getNumArgs() == 2); bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp; if (InsertOld) - Args.push_back(llvm::PoisonValue::get(IntTy)); -for (unsigned I = 0; I != E->getNumArgs(); ++I) { + Args.push_back(llvm::PoisonValue::get(DataTy)); +Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E)); +for (unsigned I = 1; I != E->getNumArgs(); ++I) { llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E); - if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 
2u : 1u) && - Size < 32) { -if (!DataTy->isIntegerTy()) - V = Builder.CreateBitCast( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -V = Builder.CreateZExtOrBitCast(V, IntTy); - } llvm::Type *ExpTy = F->getFunctionType()->getFunctionParamType(I + InsertOld); Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy)); } -Value *V = Builder.CreateCall(F, Args); -if (Size < 32 && !DataTy->isIntegerTy()) - V = Builder.CreateTrunc( - V, llvm::IntegerType::get(Builder.getContext(), Size)); -return Builder.CreateTruncOrBitCast(V, DataTy); +return Builder.CreateCall(F, Args); } case AMDGPU::BI__builtin_amdgcn_permlane16: case AMDGPU::BI__builtin_amdgcn_permlanex16: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl index a4054cba236dd..7e4ee6f4a942d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl @@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) { } // CHECK-LABEL: @test_mov_dpp8_float( -// CHECK: %0 = bitcast float %a to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: store i32 %1, +// CHECK: %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1) +// CHECK-NEXT: store float %0, void test_mov_dpp8_float(global float* out, float a) { *out = __builtin_amdgcn_mov_dpp8(a, 1); } // CHECK-LABEL: @test_mov_dpp8_double -// CHECK: %0 = bitcast double %x to i64 -// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1) -// CHECK-NEXT: store i64 %1, +// CHECK: %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1) +// CHECK-NEXT: store double %0, void test_mov_dpp8_double(double x, global double *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_short -// CHECK: %0 = zext i16 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i16 -// CHECK-NEXT: store i16 %2, +// CHECK: %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1) +// CHECK-NEXT: store i16 %0, void test_mov_dpp8_short(short x, global short *p) { *p = __builtin_amdgcn_mov_dpp8(x, 1); } // CHECK-LABEL: @test_mov_dpp8_char -// CHECK: %0 = zext i8 %x to i32 -// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1) -// CHECK-NEXT: %2 = trunc i32 %1 to i8 -// CHECK-NEXT: store i8 %2, +// CHECK: %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, i32
[llvm-branch-commits] [llvm] AMDGPU: Replace amdgpu-no-agpr with amdgpu-num-agpr (PR #129893)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/129893 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (PR #127129)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/127129). Learn more: https://graphite.dev/docs/merge-pull-requests * **#127129** 👈 (this PR) * **#127111**: 1 other dependent PR (#126981) * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/127129 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (PR #127129)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/127129 It does not change the estimate because getInstSizeInBytes() already returns 0 for meta instructions, but added a test and early bail. >From c0489545755c98dc2f87ffcd83af929816643074 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 13 Feb 2025 13:19:26 -0800 Subject: [PATCH] [AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. It does not change the estimate because getInstSizeInBytes() already returns 0 for meta instructions, but added a test and early bail. --- llvm/lib/Target/AMDGPU/SIProgramInfo.cpp| 2 +- llvm/test/CodeGen/AMDGPU/code-size-estimate.mir | 13 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp index 5179288084010..b995687e71780 100644 --- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp @@ -216,7 +216,7 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { // TODO: CodeSize should account for multiple functions. // TODO: Should we count size of debug info? - if (MI.isDebugInstr()) + if (MI.isDebugInstr() || MI.isMetaInstruction()) continue; CodeSize += TII->getInstSizeInBytes(MI); diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir index 9e46c58b6b5a9..76eaf350301e4 100644 --- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir +++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir @@ -18,3 +18,16 @@ body: | $vgpr16 = V_MOV_B32_indirect_read undef $vgpr1, implicit $exec, implicit $m0, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 V_MOV_B32_indirect_write undef $vgpr0, undef $vgpr3, implicit $exec, implicit $m0, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3(tied-def 4) ... + +# CHECK: meta: ; @meta +# CHECK: ; wave barrier +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: ; codeLenInByte = 4 +--- +name:meta +tracksRegLiveness: true +body: | + bb.0: + + WAVE_BARRIER +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
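For readability, the counting loop after this change, reassembled from the diff (all names are from the patch). As I read the new MIR test, WAVE_BARRIER only prints the "; wave barrier" comment, so the only counted instruction is the 4-byte s_waitcnt, matching codeLenInByte = 4:

    for (const MachineInstr &MI : MBB) {
      // TODO: CodeSize should account for multiple functions.
      // TODO: Should we count size of debug info?
      if (MI.isDebugInstr() || MI.isMetaInstruction())
        continue; // meta instructions (e.g. the test's WAVE_BARRIER) emit no encoding
      CodeSize += TII->getInstSizeInBytes(MI); // real encodings, e.g. the 4-byte s_waitcnt
    }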
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
@@ -212,6 +212,8 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { uint64_t CodeSize = 0; for (const MachineBasicBlock &MBB : MF) { +CodeSize = alignTo(CodeSize, MBB.getAlignment()); rampitec wrote: A pessimistic overestimate is actually worse for some applications of this function. For what I am doing now it may result in prefetching memory far beyond the end of the program. I believe our estimates should be correct except for inline asm... https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
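A minimal, self-contained sketch of what the quoted alignTo line accounts for (my own example, not part of the patch). It uses the real llvm::alignTo/llvm::Align helpers from llvm/Support/Alignment.h; the byte counts are invented for illustration:

    #include "llvm/Support/Alignment.h"
    #include <cassert>
    #include <cstdint>

    int main() {
      uint64_t CodeSize = 12;                               // bytes emitted before the aligned block
      CodeSize = llvm::alignTo(CodeSize, llvm::Align(32));  // pad up to the block alignment
      assert(CodeSize == 32);                               // 20 bytes of real padding, not an overestimate
      return 0;
    }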
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/127142 None >From d01d16815ade61a599b94bb18bc292e326767f15 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 13 Feb 2025 14:46:37 -0800 Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() --- llvm/lib/Target/AMDGPU/SIProgramInfo.cpp | 2 + .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++ 2 files changed, 91 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp index b995687e71780..9d9b4c83ac388 100644 --- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp @@ -212,6 +212,8 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { uint64_t CodeSize = 0; for (const MachineBasicBlock &MBB : MF) { +CodeSize = alignTo(CodeSize, MBB.getAlignment()); + for (const MachineInstr &MI : MBB) { // TODO: CodeSize should account for multiple functions. diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir index 76eaf350301e4..9ae536af6f0e9 100644 --- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir +++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir @@ -31,3 +31,92 @@ body: | WAVE_BARRIER ... + +# CHECK: align4: ; @align4 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align2 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 16 + +--- +name:align4 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 4): +S_ENDPGM 0 +... + +# CHECK: align8: ; @align8 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align3 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align8 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 8): +S_ENDPGM 0 +... + +# CHECK: align16:; @align16 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align4 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 20 +--- +name:align16 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 16): +S_ENDPGM 0 +... + +# CHECK: align32:; @align32 +# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf] +# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}} ; encoding: [A,A,0x85,0xbf] +# CHECK: s_barrier ; encoding: [0x00,0x00,0x8a,0xbf] +# CHECK: .p2align5 +# CHECK: s_endpgm; encoding: [0x00,0x00,0x81,0xbf] +# CHECK: ; codeLenInByte = 36 +--- +name:align32 +tracksRegLiveness: true +body: | + bb.0: +$scc = IMPLICIT_DEF +S_CBRANCH_SCC1 %bb.2, implicit $scc + + bb.1: +S_BARRIER + + bb.2 (align 32): +S_ENDPGM 0 +... ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
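A quick sanity check on the expected sizes in these tests (my arithmetic, assuming the 4-byte encodings shown in the CHECK lines): the three instructions before the aligned block — s_waitcnt, s_cbranch_scc1 and s_barrier — total 12 bytes, and s_endpgm adds 4 more after the padding:

    align 4:  alignTo(12, 4)  + 4 = 12 + 4 = 16
    align 8:  alignTo(12, 8)  + 4 = 16 + 4 = 20
    align 16: alignTo(12, 16) + 4 = 16 + 4 = 20
    align 32: alignTo(12, 32) + 4 = 32 + 4 = 36

These match the codeLenInByte values checked above.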
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/127142). Learn more: https://graphite.dev/docs/merge-pull-requests * **#127142** 👈 (this PR) * **#127129** * **#127111**: 1 other dependent PR (#126981) * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)
@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID CC, return MCConstantExpr::create(0, Ctx); } + +uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) { + if (!CodeSizeInBytes.has_value()) { +const GCNSubtarget &STM = MF.getSubtarget(); +const SIInstrInfo *TII = STM.getInstrInfo(); + +uint64_t CodeSize = 0; + +for (const MachineBasicBlock &MBB : MF) { + for (const MachineInstr &MI : MBB) { rampitec wrote: https://github.com/llvm/llvm-project/pull/127142 https://github.com/llvm/llvm-project/pull/126981 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/127142 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits