[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)
@@ -322,4 +322,36 @@ define <2 x i16> @upgrade_amdgcn_global_atomic_fadd_v2bf16_p1(ptr addrspace(1) %
   ret <2 x i16> %result
 }
+declare <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr nocapture, <2 x half>) #0

Pierre-vh wrote:

nit: could we auto-generate this test? Maybe as a future patch or just precommit it directly.

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)
@@ -75,6 +75,11 @@ Changes to the AArch64 Backend

 Changes to the AMDGPU Backend
 -----------------------------

+* Removed ``llvm.amdgcn.flat.atomic.fadd`` and
+  ``llvm.amdgcn.global.atomic.fadd`` intrinsics. Users should use the
+  :ref:`atomicrmw <i_atomicrmw>` instruction with `fadd` and

Pierre-vh wrote:

Does `i_atomicrmw` work here? Did you try building the docs?

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
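As a migration aid, here is a minimal LLVM IR sketch of the replacement the release note above describes. The function name, syncscope, and ordering are illustrative assumptions rather than something the note mandates; the commented-out intrinsic call follows the mangling scheme visible in the test diff earlier in this thread.

define float @global_fadd_example(ptr addrspace(1) %ptr, float %val) {
  ; Previously: %old = call float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %ptr, float %val)
  %old = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic
  ret float %old
}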
[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)
@@ -1017,29 +1015,6 @@ main_body:
   ret void
 }
-define amdgpu_kernel void @global_atomic_fadd_f64_noret(ptr addrspace(1) %ptr, double %data) {

Pierre-vh wrote:

Why are some tests deleted, and some others changed to use atomicrmw?

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinics (PR #97051)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/97051 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
@@ -19273,9 +19269,14 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
                             EmitScalarExpr(E->getArg(3)), AO, SSID);
   } else {
-    // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-    SSID = llvm::SyncScope::System;
-    AO = AtomicOrdering::SequentiallyConsistent;
+    // Most of the builtins do not have syncscope/order arguments. For DS
+    // atomics the scope doesn't really matter, as they implicitly operate at
+    // workgroup scope.
+    //
+    // The global/flat cases need to use agent scope to consistently produce
+    // the native instruction instead of a cmpxchg expansion.
+    SSID = getLLVMContext().getOrInsertSyncScopeID("agent");

Pierre-vh wrote:

What happens with system (the default)? I'm not sure I like using `agent` just to force the right expansion when there is no memory model motivation behind it. Do we have a precedent for this kind of thing? Could codegen be fixed so you can just use `system`?

https://github.com/llvm/llvm-project/pull/96872
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
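For context on the question above, a hand-written IR sketch (not taken from the patch; the function name and ordering are illustrative) contrasting the two scopes under discussion. The comments restate the rationale given in the diff's own comment.

define float @scope_contrast(ptr addrspace(1) %p, float %v) {
  ; Agent scope: codegen can consistently select the native fadd instruction.
  %agent = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") seq_cst
  ; System scope (the default): may be expanded to a cmpxchg loop instead.
  %system = atomicrmw fadd ptr addrspace(1) %p, float %v seq_cst
  ret float %agent
}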
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (PR #102806)
https://github.com/Pierre-vh approved this pull request. Add [NFC] tag? https://github.com/llvm/llvm-project/pull/102806 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)
@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }

 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = AMDGPUTargetMachine::EnableLateStructurizeCFG;

Pierre-vh wrote:

Does this function run yet, or is this just preparatory work/NFC?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)
@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }

 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = AMDGPUTargetMachine::EnableLateStructurizeCFG;
+  const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
+  const bool EnableStructurizerWorkarounds =
+      AMDGPUTargetMachine::EnableStructurizerWorkarounds;
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None)

Pierre-vh wrote:

tiny nit: put the opt level in a variable to avoid repeating the call?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/80695 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [TableGen] Fix ReplaceRegAction RTTI Kind (PR #89790)
Pierre-vh wrote: We don't use RTTI of that class before #89736 so unless that's also being backported for some reason it's not needed. https://github.com/llvm/llvm-project/pull/89790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
@@ -4371,8 +4375,10 @@ define amdgpu_kernel void @global_sextload_v64i16_to_v64i32(ptr addrspace(1) %ou
 ; GCN-NOHSA-SI-NEXT:    buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:48
 ; GCN-NOHSA-SI-NEXT:    buffer_store_dwordx4 v[4:7], off, s[0:3], 0
 ; GCN-NOHSA-SI-NEXT:    buffer_load_dword v0, off, s[12:15], 0 ; 4-byte Folded Reload
+; GCN-NOHSA-SI-NEXT:    s_waitcnt vmcnt(0)

Pierre-vh wrote:

Why does this non-gfx12 test change?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
@@ -754,13 +754,21 @@ define amdgpu_kernel void @constant_load_v16i16_align2(ptr addrspace(4) %ptr0) #
 ; GFX12-NEXT:    global_load_u16 v6, v8, s[0:1] offset:8
 ; GFX12-NEXT:    global_load_u16 v5, v8, s[0:1] offset:4
 ; GFX12-NEXT:    global_load_u16 v4, v8, s[0:1]
+; GFX12-NEXT:    s_wait_loadcnt 0x7

Pierre-vh wrote:

I'm not sure I understand exactly what's happening here. Why do we need the extra `s_wait_loadcnt`? What happens when two `global_load_d16_hi_b16` execute back-to-back?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;

+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",

Pierre-vh wrote:

Wouldn't it be easier to have a "VmemWriteVgprOutOfOrder" feature and just apply it to GFX12?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority", "Export priority must be explicitly manipulated on GFX11.5" >; +def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order", Pierre-vh wrote: Right, I didn't see things that way. I agree conservative is better https://github.com/llvm/llvm-project/pull/105549 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/105549 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] edaf6a0 - [AMDGPU][GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR
Author: Pierre van Houtryve Date: 2022-10-19T10:16:08Z New Revision: edaf6a07a4aafd963ea958703890d03ab58ff2dd URL: https://github.com/llvm/llvm-project/commit/edaf6a07a4aafd963ea958703890d03ab58ff2dd DIFF: https://github.com/llvm/llvm-project/commit/edaf6a07a4aafd963ea958703890d03ab58ff2dd.diff LOG: [AMDGPU][GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR Depends on D134967 Differential Revision: https://reviews.llvm.org/D135145 Added: llvm/test/CodeGen/AMDGPU/GlobalISel/prelegalizer-combiner-insertvecelt-to-shufflevector.mir Modified: llvm/lib/Target/AMDGPU/AMDGPUCombine.td llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 2415fdfecaae2..8b2ff164d3365 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -45,6 +45,12 @@ def cvt_f32_ubyteN : GICombineRule< [{ return PostLegalizerHelper.matchCvtF32UByteN(*${cvt_f32_ubyteN}, ${matchinfo}); }]), (apply [{ PostLegalizerHelper.applyCvtF32UByteN(*${cvt_f32_ubyteN}, ${matchinfo}); }])>; +def insert_vec_elt_to_shuffle : GICombineRule< + (defs root:$insertelt, unsigned_matchinfo:$matchinfo), + (match (wip_match_opcode G_INSERT_VECTOR_ELT):$insertelt, + [{ return PreLegalizerHelper.matchInsertVectorEltToShuffle(*${insertelt}, ${matchinfo}); }]), + (apply [{ PreLegalizerHelper.applyInsertVectorEltToShuffle(*${insertelt}, ${matchinfo}); }])>; + def clamp_i64_to_i16_matchdata : GIDefMatchData<"AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo">; def clamp_i64_to_i16 : GICombineRule< @@ -109,7 +115,7 @@ def gfx6gfx7_combines : GICombineGroup<[fcmp_select_to_fmin_fmax_legacy]>; def AMDGPUPreLegalizerCombinerHelper: GICombinerHelper< "AMDGPUGenPreLegalizerCombinerHelper", - [all_combines, clamp_i64_to_i16, foldable_fneg]> { + [all_combines, clamp_i64_to_i16, foldable_fneg, insert_vec_elt_to_shuffle]> { let DisableRuleOption = "amdgpuprelegalizercombiner-disable-rule"; let StateClass = "AMDGPUPreLegalizerCombinerHelperState"; let AdditionalArguments = []; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp index 6d6c69adaa658..08eefc6da4d31 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp @@ -55,6 +55,9 @@ class AMDGPUPreLegalizerCombinerHelper { void applyClampI64ToI16(MachineInstr &MI, const ClampI64ToI16MatchInfo &MatchInfo); + + bool matchInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx); + void applyInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx); }; bool AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16( @@ -154,6 +157,73 @@ void AMDGPUPreLegalizerCombinerHelper::applyClampI64ToI16( MI.eraseFromParent(); } +bool AMDGPUPreLegalizerCombinerHelper::matchInsertVectorEltToShuffle( +MachineInstr &MI, unsigned &Idx) { + // Transfroms a G_INSERT_VECTOR_ELT into an equivalent G_SHUFFLE_MASK if: + //- Scalar Pack insts are present (for <32 bits element types) + //- The vector has <= 4 elements. + // as this is a preferred canonical form of the operation. + // + // Note that both restrictions are arbitrary. Currently, it's mostly targeted + // towards 2x16 vectors. Restrictions could be relaxed or entirely removed in + // the future if codegen can handle it without causing regressions. 
+ + LLT VecTy = MRI.getType(MI.getOperand(0).getReg()); + const unsigned EltSize = VecTy.getElementType().getSizeInBits(); + if (EltSize < 32 && + !MI.getMF()->getSubtarget().hasScalarPackInsts()) +return false; + + if (VecTy.isScalable() || VecTy.getNumElements() > 4) +return false; + + Optional MaybeIdxVal = + getIConstantVRegValWithLookThrough(MI.getOperand(3).getReg(), MRI); + if (!MaybeIdxVal) +return false; + + Idx = MaybeIdxVal->Value.getZExtValue(); + return true; +} + +void AMDGPUPreLegalizerCombinerHelper::applyInsertVectorEltToShuffle( +MachineInstr &MI, unsigned &Idx) { + B.setInstrAndDebugLoc(MI); + + Register Ins = MI.getOperand(2).getReg(); + Register Vec = MI.getOperand(1).getReg(); + Register Dst = MI.getOperand(0).getReg(); + + LLT VecTy = MRI.getType(Dst); + LLT EltTy = VecTy.getElementType(); + const unsigned NumElts = VecTy.getNumElements(); + + const auto Undef = MRI.createGenericVirtualRegister(EltTy); + B.buildUndef(Undef); + + const auto OtherVec = MRI.createGenericVirtualRegister(VecTy); + + SmallVector Srcs; + Srcs.push_back(Ins); + for (unsigned K = 1; K < NumElts; ++K) +Srcs.push_back(Undef); + + B.buildBuildVector(OtherVec, Srcs); + + // NumElts == Ins in OtherVec + // 0...(NumElts-1) = Original elements + SmallVector ShuffleMask; + for (unsig
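A rough IR-level analog of the combine this commit adds, hand-written for illustration (the pass itself operates on generic MIR with G_BUILD_VECTOR and G_SHUFFLE_VECTOR; the function name is made up): inserting a scalar at index 2 of a 4-element vector becomes a shuffle of the original vector against a second vector that holds the new element in lane 0.

define <4 x i16> @insert_as_shuffle(<4 x i16> %vec, i16 %val) {
  ; Build the "other" vector: %val in lane 0, undef elsewhere (G_BUILD_VECTOR in the pass).
  %other = insertelement <4 x i16> undef, i16 %val, i64 0
  ; Lanes 0, 1, 3 keep the original elements; lane 2 selects element 4, i.e. lane 0 of %other.
  %res = shufflevector <4 x i16> %vec, <4 x i16> %other, <4 x i32> <i32 0, i32 1, i32 4, i32 3>
  ret <4 x i16> %res
}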
[llvm-branch-commits] [llvm] 007ef6f - [AMDGPU][GISel] Constrain selected operands in selectG_BUILD_VECTOR
Author: Pierre van Houtryve Date: 2022-10-19T10:16:08Z New Revision: 007ef6fa4d89f7e60a82af8c7cc004a6204fd72b URL: https://github.com/llvm/llvm-project/commit/007ef6fa4d89f7e60a82af8c7cc004a6204fd72b DIFF: https://github.com/llvm/llvm-project/commit/007ef6fa4d89f7e60a82af8c7cc004a6204fd72b.diff LOG: [AMDGPU][GISel] Constrain selected operands in selectG_BUILD_VECTOR Small bugfix. Currently harmless but a case in D134354 triggers it. Differential Revision: https://reviews.llvm.org/D136235 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp index 7f41e8593692..0a6896693510 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp @@ -686,13 +686,19 @@ bool AMDGPUInstructionSelector::selectG_BUILD_VECTOR(MachineInstr &MI) const { // TODO: Can be improved? if (IsVector) { Register TmpReg = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass); -BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_AND_B32_e32), TmpReg) -.addImm(0x) -.addReg(Src0); -BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_LSHL_OR_B32_e64), Dst) -.addReg(Src1) -.addImm(16) -.addReg(TmpReg); +auto MIB = BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_AND_B32_e32), TmpReg) + .addImm(0x) + .addReg(Src0); +if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) + return false; + +MIB = BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_LSHL_OR_B32_e64), Dst) + .addReg(Src1) + .addImm(16) + .addReg(TmpReg); +if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) + return false; + MI.eraseFromParent(); return true; } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] e07c05b - [AMDGPU] Clear bodies of function with incompatible features
Author: Pierre van Houtryve Date: 2022-11-30T06:14:35-05:00 New Revision: e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2 URL: https://github.com/llvm/llvm-project/commit/e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2 DIFF: https://github.com/llvm/llvm-project/commit/e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2.diff LOG: [AMDGPU] Clear bodies of function with incompatible features Adds a new passs that replaces the body of a function with trap+unreachable if it uses features that are not supported on the current GPU. This change is aimed at preventing crashes when building code at O0 that uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();` where ISA_VERSION is not constexpr, and intrinsic_a is not selectable on older targets. This is a pattern that's used all over the ROCm device libs. The main motive behind this change is to allow code using ROCm device libs to be built at O0. Note: the feature checking logic is done ad-hoc in the pass. There is no other pass that needs (or will need in the foreseeable future) to do similar feature-checking logic so I did not see a need to generalize the feature checking logic yet. It can (and should probably) be generalized later and moved to a TargetInfo-like class or helper file. Added: llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp llvm/test/CodeGen/AMDGPU/clear-incompatible-functions.ll Modified: llvm/lib/Target/AMDGPU/AMDGPU.h llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/lib/Target/AMDGPU/CMakeLists.txt llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll llvm/test/CodeGen/AMDGPU/llc-pipeline.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h index 355aa0ba465b4..6a9ac1d165724 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.h +++ b/llvm/lib/Target/AMDGPU/AMDGPU.h @@ -47,6 +47,7 @@ FunctionPass *createSIFormMemoryClausesPass(); FunctionPass *createSIPostRABundlerPass(); FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *); FunctionPass *createAMDGPUUseNativeCallsPass(); +FunctionPass *createAMDGPUClearIncompatibleFunctionsPass(const TargetMachine *); FunctionPass *createAMDGPUCodeGenPreparePass(); FunctionPass *createAMDGPULateCodeGenPreparePass(); FunctionPass *createAMDGPUMachineCFGStructurizerPass(); @@ -287,6 +288,9 @@ extern char &AMDGPUAnnotateUniformValuesPassID; void initializeAMDGPUCodeGenPreparePass(PassRegistry&); extern char &AMDGPUCodeGenPrepareID; +void initializeAMDGPUClearIncompatibleFunctionsPass(PassRegistry &); +extern char &AMDGPUClearIncompatibleFunctionsID; + void initializeAMDGPULateCodeGenPreparePass(PassRegistry &); extern char &AMDGPULateCodeGenPrepareID; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp b/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp new file mode 100644 index 0..e0ea3aac5b7f5 --- /dev/null +++ b/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp @@ -0,0 +1,120 @@ +//===-- AMDGPUClearIncompatibleFunctions.cpp --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +/// \file +/// This pass replaces the bodies of functions that have attributes incompatible +/// with the current target with trap/unreachable. 
+// +//===--===// + +#include "AMDGPU.h" +#include "GCNSubtarget.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/Target/TargetMachine.h" +#include "llvm/IR/DiagnosticInfo.h" +#include "llvm/Pass.h" + +#define DEBUG_TYPE "amdgpu-clear-incompatible-functions" + +using namespace llvm; + +namespace llvm { +extern const SubtargetFeatureKV AMDGPUFeatureKV[AMDGPU::NumSubtargetFeatures-1]; +} + +namespace { + +using Generation = AMDGPUSubtarget::Generation; + +class AMDGPUClearIncompatibleFunctions : public FunctionPass { +public: + static char ID; + + AMDGPUClearIncompatibleFunctions(const TargetMachine *TM = nullptr) : FunctionPass(ID), TM(TM) { +assert(TM && "No TargetMachine!"); + } + + StringRef getPassName() const override { +return "AMDGPU Clear Incompatible Functions Bodies"; + } + + void getAnalysisUsage(AnalysisUsage &AU) const override { +// If changes are made, no analyses are preserved. + } + + bool runOnFunction(Function &F) override; + +private: + const TargetMachine *TM = nullptr; +}; + +// List of features alongside the minimum GPU generation needed to support them. +constexpr std::array, 6> FeatureAndMinGen = {{ + { AMDGPU::FeatureGFX11Insts, Generation::GFX11 }, + { AMDGPU::FeatureGFX10Insts, Genera
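A minimal IR sketch, assumed rather than taken from the patch, of what a cleared function body looks like after this pass runs; the attribute string is an example of the kind of unsupported feature the pass checks for.

define void @uses_unsupported_feature() #0 {
entry:
  call void @llvm.trap()
  unreachable
}

declare void @llvm.trap()

attributes #0 = { "target-features"="+gfx11-insts" }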
[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)
@@ -43,7 +43,7 @@ define i64 @test_flat_atomicrmw_sub_0_i64_agent(ptr %ptr) {
 ; ALL: [[ATOMICRMW_PRIVATE]]:
 ; ALL-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[PTR]] to ptr addrspace(5)
 ; ALL-NEXT:    [[LOADED_PRIVATE:%.*]] = load i64, ptr addrspace(5) [[TMP1]], align 8
-; ALL-NEXT:    [[NEW:%.*]] = sub i64 [[LOADED_PRIVATE]], 0
+; ALL-NEXT:    [[NEW:%.*]] = add i64 [[LOADED_PRIVATE]], 0

Pierre-vh wrote:

Why does this transform happen more often now?

https://github.com/llvm/llvm-project/pull/109410
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/109410 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310** 👈 (View in Graphite)
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
Pierre-vh wrote:

> We can fold the clamp of the shift amount into the shift instruction during
> selection as we know the instruction ignores the high bits. We do that in the
> DAG path already. I think it special cases the and & (bitwidth - 1) pattern,
> which should form canonically. In principle it could do a general simplify
> demand bits

Where and how should that be implemented? I struggled with that. I tried adding a new special case in TableGen but I just couldn't find the right way to do it. Do I just add it in C++ InstructionSelector before it checks the patterns? Or should it be some kind of post-processing step after the shift has been selected, but before the G_ZEXT is selected?

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 16cbcc2c44bfe74ba54f00c5be634c54ff43a5cf Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index d6675f225cdfc..cc014fbd32466 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 16cbcc2c44bfe74ba54f00c5be634c54ff43a5cf Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index d6675f225cdfc..cc014fbd32466 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
Pierre-vh wrote:

> > Where and how should that be implemented? I struggled with that. I tried
> > adding a new special case in TableGen but I just couldn't find the right
> > way to do it. Do I just add it in C++ InstructionSelector before it checks
> > the patterns? Or should it be some kind of post-processing step after the
> > shift has been selected, but before the G_ZEXT is selected?
>
> It already exists as a complex pattern, isUnneededShiftMask. The combiners
> should be trying to get the clamping code into this form which expects the and

I tried it but the DAG immediately transforms `(and x, 0xFF)` into a zext and it seems pretty stubborn about it as it's a basic transform. I don't mind trying to make it work a bit longer, but I could also just bring this back. What do you think?

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
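For readers following the thread, a hand-written IR sketch of the pattern under discussion (the function name is illustrative): the `and` only clamps the shift amount to the bit width, which the AMDGPU 16-bit shift instructions already do in hardware, so selection can drop the mask. When the mask instead covers a whole narrower type (for example 0xFF on an extended i8 amount), the DAG canonicalizes it to a zext, which is the difficulty described above.

define i16 @shl_clamped_amount(i16 %x, i16 %amt) {
  ; The 16-bit shift only reads the low 4 bits of the amount, so this mask is
  ; redundant and is the shape the isUnneededShiftMask-style matching expects.
  %clamped = and i16 %amt, 15
  %r = shl i16 %x, %clamped
  ret i16 %r
}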
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
Pierre-vh wrote: Ah, this doesn't do anything at this stage. It's only helpful once we disable widening of i16 ops to i32 in CGP. Then this pattern can appear and it'll fold it. This combine is tested in AArch64. Should I copy over a few simple test cases in the AMDGPU folder just to show the combine works in RegBankCombiner? https://github.com/llvm/llvm-project/pull/131623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
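A minimal illustration of the shape the sext_trunc combine targets, written as LLVM IR for readability; on generic MIR it is the corresponding G_SEXT of G_TRUNC pair (the function name is illustrative).

define i32 @sext_of_trunc(i32 %x) {
  ; Once i16 ops are no longer widened to i32 in CGP, this trunc/sext pair can
  ; survive into the GlobalISel pipeline and be folded by the combine.
  %t = trunc i32 %x to i16
  %s = sext i16 %t to i32
  ret i32 %s
}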
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131309 Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. >From ee917df6c6e996135d1b08f924b6645649eafa0d Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index 6e611ebb4b625..23dd20b51e8e7 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index 27b86723ce474..ed0d52f6b2441 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131308 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131623 >From 4feac2fc42257cac9a1ca0070ec199f93a901b0d Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:22:25 +0100 Subject: [PATCH] [AMDGPU] Add sext_trunc in RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index a21505356274b..083ce48911689 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -181,5 +181,5 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines]> { + cast_of_cast_combines, sext_trunc]> { } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
Pierre-vh wrote: Test changes were in the previous diff in the stack, it should be fixed now. https://github.com/llvm/llvm-project/pull/131623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 8aa7f8b8f1c73d8fec55a229ea8dff020fc4c906 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index d6675f225cdfc..cc014fbd32466 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 8aa7f8b8f1c73d8fec55a229ea8dff020fc4c906 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index d6675f225cdfc..cc014fbd32466 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131623 >From 4feac2fc42257cac9a1ca0070ec199f93a901b0d Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:22:25 +0100 Subject: [PATCH] [AMDGPU] Add sext_trunc in RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index a21505356274b..083ce48911689 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -181,5 +181,5 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines]> { + cast_of_cast_combines, sext_trunc]> { } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131308 >From cdfba0ea7ab0fcb60d632a25433b18b421022c25 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 13:41:04 +0100 Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 7 +- .../AMDGPU/GlobalISel/legalize-abs.mir| 8 +- .../AMDGPU/GlobalISel/legalize-ashr.mir | 20 +-- .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++--- .../AMDGPU/GlobalISel/legalize-sext.mir | 101 ++-- .../AMDGPU/GlobalISel/legalize-smax.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smin.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smulh.mir | 132 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 45 ++--- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 130 ++- 11 files changed, 299 insertions(+), 368 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index b3a8183beeacf..6e611ebb4b625 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, // S64 is only legal on SALU, and needs to be broken into 32-bit elements in // RegBankSelect. auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG) -.legalFor({{S32}, {S64}}); +.legalFor({{S32}, {S64}}) +.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32)); if (ST.hasVOP3PInsts()) { SextInReg.lowerFor({{V2S16}}) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll index 493e8cef63890..f81d7f1c300b8 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll @@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) { ; GFX8-LABEL: v_ashr_i8: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8: @@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) { ; GFX8-LABEL: v_ashr_i8_7: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0 +; GFX8-NEXT:v_mov_b32_e32 v1, 7 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8_7: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir index a9fe80eb47e76..2b911b2dce697 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir @@ -144,11 +144,9 @@ body: | ; VI: liveins: $vgpr0 ; VI-NEXT: {{ $}} ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 -; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16) -; VI-NEXT: 
[[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]] +; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8 +; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32) +; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]] ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16) ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32) ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir index f4aaab745e03b..53905a2f49dd0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir @@ -319,12 +319,10 @@ body: | ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]] -; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16) -; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [
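A minimal MIR sketch of the kind of input this rule targets (illustrative only; the function name, RUN line, and -mcpu are assumptions, not part of the patch). Before this change the legalizer lowered an s16 G_SEXT_INREG into a G_SHL/G_ASHR pair; with the widening rule the value is instead extended to s32, sign-extended in the wider type, and truncated back:

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -run-pass=legalizer -o - %s
# (check lines would be generated with utils/update_mir_test_checks.py)
---
name: sext_inreg_s16_8
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0
    ; Sign-extend the low 8 bits of an s16 value.
    %0:_(s32) = COPY $vgpr0
    %1:_(s16) = G_TRUNC %0
    %2:_(s16) = G_SEXT_INREG %1, 8
    %3:_(s32) = G_ANYEXT %2
    $vgpr0 = COPY %3
...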
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 2dc7126ab1abb6aa49aaf263a0591759130ddca5 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index cfb5c3b3006f0..ab900157d2095 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
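A rough sketch of what this enables (illustrative only; the RUN line, -mcpu, and function name are assumptions, not part of the patch). With s16 G_UBFX/G_SBFX accepted as legal, MIR like the following can survive the combiner stage, and RegBankSelect later widens it to the 32-bit form that instruction selection expects:

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=regbankselect -o - %s
# (check lines would be generated with utils/update_mir_test_checks.py)
---
name: ubfx_s16
legalized: true
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0
    ; Extract an 8-bit field starting at bit 4 of an s16 value.
    %0:_(s32) = COPY $vgpr0
    %1:_(s16) = G_TRUNC %0
    %2:_(s16) = G_CONSTANT i16 4
    %3:_(s16) = G_CONSTANT i16 8
    %4:_(s16) = G_UBFX %1, %2(s16), %3
    %5:_(s32) = G_ANYEXT %4
    $vgpr0 = COPY %5
...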
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131310 >From d4b257d1b34b51018f51546974bffdc2ea56433d Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131311 >From 17e13825f173be8fd67494f13f002f35d93e357f Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index cc15dd7cb495c..5f666e10b5cb7 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; + } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if 
(!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND_B32: + case AMDGPU::V_AND_B32_e32: + case AMDGPU::V_AND_B32_e64: +if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith)) + return false; +break; + default: +return false; + } + + MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith); + LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n"); + + MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg()); + return true; +} + bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const { if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 && MI.getOpcode() != AMDGPU::V_AND_B32_e32) @@ -1458,7 +1552,7 @@ bool SIFoldOperands
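A hand-written sketch of the pattern this fold targets, in the same style as the precommitted test (the function name is an assumption; the RUN line mirrors si-fold-bitmasks.mir). V_LSHLREV_B32 only reads the low 5 bits of its shift-amount operand, and the 0xffff mask keeps all of them, so the V_AND_B32 is redundant and the shift can read %amt directly:

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s
# (check lines would be generated with utils/update_mir_test_checks.py)
---
name: fold_mask_on_lshlrev_amount
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0, $vgpr1
    %src:vgpr_32 = COPY $vgpr0
    %amt:vgpr_32 = COPY $vgpr1
    ; 65535 (0xffff) covers the 5 bits the shift actually reads, so the mask is a no-op here.
    %masked:vgpr_32 = V_AND_B32_e32 65535, %amt, implicit $exec
    %ret:vgpr_32 = V_LSHLREV_B32_e64 %masked, %src, implicit $exec
    $vgpr0 = COPY %ret
...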
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131312 >From 9fabf931105e1cf86cf69f90bd5c62068846c3e1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:34:51 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. --- .../include/llvm/Target/GlobalISel/Combine.td | 12 ++- .../combine-sext-trunc-sextinreg.mir | 86 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 - 3 files changed, 113 insertions(+), 63 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 3590ab221ad44..9727b86b4be8b 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $src, $width), + (G_TRUNC $trunc, $sir), + (G_SEXT $dst, $trunc), + [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]), + (apply (GIReplaceReg $dst, $sir))>; + def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">; def sext_inreg_of_load : GICombineRule< (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo), @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + + sext_trunc_sextinreg ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir new file mode 100644 index 0..d41e5b172efc2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir @@ -0,0 +1,86 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: trunc_s16_inreg_8 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_8 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... 
+ +--- +name: trunc_s16_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +--- +name: trunc_s8_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s8_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8) +; CHECK-NEXT: $vgpr0 = COPY %sext(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s8) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be. +--- +name: mismatching_types +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: mismatching_types +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s16
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131311 >From 520757cf40d285b58eb0539840be2bf282c0a0af Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 91df516b80857..a279a0a973e75 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; + } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if 
(!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND_B32: + case AMDGPU::V_AND_B32_e32: + case AMDGPU::V_AND_B32_e64: +if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith)) + return false; +break; + default: +return false; + } + + MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith); + LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n"); + + MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg()); + return true; +} + bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const { if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 && MI.getOpcode() != AMDGPU::V_AND_B32_e32) @@ -1458,7 +1552,7 @@ bool SIFoldOperands
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131312 >From 4751d38d86886106c00e9140bf0bb3a3459950cb Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:34:51 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. --- .../include/llvm/Target/GlobalISel/Combine.td | 12 ++- .../combine-sext-trunc-sextinreg.mir | 86 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 - 3 files changed, 113 insertions(+), 63 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 3590ab221ad44..9727b86b4be8b 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $src, $width), + (G_TRUNC $trunc, $sir), + (G_SEXT $dst, $trunc), + [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]), + (apply (GIReplaceReg $dst, $sir))>; + def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">; def sext_inreg_of_load : GICombineRule< (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo), @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + + sext_trunc_sextinreg ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir new file mode 100644 index 0..d41e5b172efc2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir @@ -0,0 +1,86 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: trunc_s16_inreg_8 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_8 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... 
+ +--- +name: trunc_s16_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +--- +name: trunc_s8_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s8_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8) +; CHECK-NEXT: $vgpr0 = COPY %sext(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s8) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be. +--- +name: mismatching_types +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: mismatching_types +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s16
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131306 >From 1af83464f02df212384bd97848b0073d41053234 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 10:46:01 +0100 Subject: [PATCH 1/2] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 See #64591 --- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 28 +- llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll | 10 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 519 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 286 +- llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll | 10 +- 5 files changed, 403 insertions(+), 450 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index c19ee14ab1574..27b86723ce474 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl( Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); -if (DstTy.getSizeInBits() == 1) { - const RegisterBank *DstBank = +const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; + +if (DstTy.getSizeInBits() == 1) { if (DstBank == &AMDGPU::VCCRegBank) break; @@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl( return; } +// 16-bit operations are VALU only, but can be promoted to 32-bit SALU. +// Packed 16-bit operations need to be scalarized and promoted. +if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) { + const LLT S32 = LLT::scalar(32); + MachineBasicBlock *MBB = MI.getParent(); + MachineFunction *MF = MBB->getParent(); + ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank); + LegalizerHelper Helper(*MF, ApplySALU, B); + // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening + // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1 + // as "not". 
+ if (MI.getOpcode() == AMDGPU::G_XOR && + mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) { +Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); +Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT); +Helper.widenScalarDst(MI, S32); + } else { +if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized) + llvm_unreachable("widen scalar should have succeeded"); + } + return; +} + if (DstTy.getSizeInBits() != 64) break; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll index 1a94429b1b5a1..36359579ea442 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll @@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg %src0, i16 inreg %src1) { define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 inreg %src1) { ; GCN-LABEL: s_andn2_i16_multi_use: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s1, s3, -1 +; GCN-NEXT:s_not_b32 s1, s3 ; GCN-NEXT:s_andn2_b32 s0, s2, s3 ; GCN-NEXT:; return to shader part epilog ; ; GFX10-LABEL: s_andn2_i16_multi_use: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_andn2_b32 s0, s2, s3 -; GFX10-NEXT:s_xor_b32 s1, s3, -1 +; GFX10-NEXT:s_not_b32 s1, s3 ; GFX10-NEXT:; return to shader part epilog ; ; GFX11-LABEL: s_andn2_i16_multi_use: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3 -; GFX11-NEXT:s_xor_b32 s1, s3, -1 +; GFX11-NEXT:s_not_b32 s1, s3 ; GFX11-NEXT:; return to shader part epilog %not.src1 = xor i16 %src1, -1 %and = and i16 %src0, %not.src1 @@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, i16 %src1) { define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) { ; GCN-LABEL: v_andn2_i16_vs: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s0, s2, -1 +; GCN-NEXT:s_not_b32 s0, s2 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0 ; GCN-NEXT:; return to shader part epilog ; ; GFX10PLUS-LABEL: v_andn2_i16_vs: ; GFX10PLUS: ; %bb.0: -; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1 +; GFX10PLUS-NEXT:s_not_b32 s0, s2 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0 ; GFX10PLUS-NEXT:; return to shader part epilog diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index e60739fd84059..3a52497bd6e91 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll @@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg, i32 in ; GFX8-NEXT:s_lshr_b32 s2, s2, s3 ; GFX8-NEXT
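An illustrative sketch of the case the G_XOR special handling is about (not from the patch; the RUN line, -mcpu, and function name are assumptions). For a uniform s16 xor with all-ones, the constant operand is sign-extended rather than any-extended when widening to s32, so it stays -1 and the result can still be matched as a "not" (s_not_b32) instead of an s_xor_b32 with an immediate:

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=regbankselect -o - %s
# (check lines would be generated with utils/update_mir_test_checks.py)
---
name: s16_xor_not_sgpr
legalized: true
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $sgpr0
    %0:_(s32) = COPY $sgpr0
    %1:_(s16) = G_TRUNC %0
    ; xor with all-ones is a "not"; keeping -1 sign-extended preserves that through widening.
    %2:_(s16) = G_CONSTANT i16 -1
    %3:_(s16) = G_XOR %1, %2
    %4:_(s32) = G_ANYEXT %3
    $sgpr0 = COPY %4
...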
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131310 >From 6db5fe8cc5ff82cc7dc8751ac584870ddbf1b537 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From 090fa3eb8b5ebb595a6ec4b78ec337af71466a73 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index cfb5c3b3006f0..ab900157d2095 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index a7df9a0edd21a..844251be24c42 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131308 >From be5c76eeb981e94017cc2a504f35079d47d7ce5c Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 13:41:04 +0100 Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 7 +- .../AMDGPU/GlobalISel/legalize-abs.mir| 8 +- .../AMDGPU/GlobalISel/legalize-ashr.mir | 20 +-- .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++--- .../AMDGPU/GlobalISel/legalize-sext.mir | 101 ++-- .../AMDGPU/GlobalISel/legalize-smax.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smin.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smulh.mir | 132 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 45 ++--- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 130 ++- 11 files changed, 299 insertions(+), 368 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index b3a8183beeacf..6e611ebb4b625 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, // S64 is only legal on SALU, and needs to be broken into 32-bit elements in // RegBankSelect. auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG) -.legalFor({{S32}, {S64}}); +.legalFor({{S32}, {S64}}) +.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32)); if (ST.hasVOP3PInsts()) { SextInReg.lowerFor({{V2S16}}) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll index 493e8cef63890..f81d7f1c300b8 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll @@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) { ; GFX8-LABEL: v_ashr_i8: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8: @@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) { ; GFX8-LABEL: v_ashr_i8_7: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0 +; GFX8-NEXT:v_mov_b32_e32 v1, 7 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8_7: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir index a9fe80eb47e76..2b911b2dce697 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir @@ -144,11 +144,9 @@ body: | ; VI: liveins: $vgpr0 ; VI-NEXT: {{ $}} ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 -; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16) -; VI-NEXT: 
[[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]] +; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8 +; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32) +; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]] ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16) ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32) ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir index f4aaab745e03b..53905a2f49dd0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir @@ -319,12 +319,10 @@ body: | ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]] -; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16) -; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [
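As a sanity check on the widening approach described in the commit message, here is a small host-side sketch (not from the patch) comparing the lowered G_SHL + G_ASHR pair with the widened form for an s16 G_SEXT_INREG of width 8:

```
#include <cstdint>

// Lowered form of a s16 G_SEXT_INREG x, 8: shift left by 8, then arithmetic
// shift right by 8, all in 16 bits.
int16_t loweredShlAshr(int16_t X) {
  return (int16_t)((int16_t)(X << 8) >> 8);
}

// Widened form: extend to 32 bits, do the 32-bit sext_in_reg, trunc back.
int16_t widenedToS32(int16_t X) {
  int32_t Wide = (int32_t)X;             // extend (high bits don't matter)
  int32_t InReg = (int32_t)(int8_t)Wide; // 32-bit G_SEXT_INREG ..., 8
  return (int16_t)InReg;                 // G_TRUNC
}

// Both functions agree for every X; the widened form can select to a single
// sign-extend instruction instead of a shift pair.
```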
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131311 >From 520757cf40d285b58eb0539840be2bf282c0a0af Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 91df516b80857..a279a0a973e75 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; + } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if 
(!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND_B32: + case AMDGPU::V_AND_B32_e32: + case AMDGPU::V_AND_B32_e64: +if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith)) + return false; +break; + default: +return false; + } + + MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith); + LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n"); + + MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg()); + return true; +} + bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const { if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 && MI.getOpcode() != AMDGPU::V_AND_B32_e32) @@ -1458,7 +1552,7 @@ bool SIFoldOperands
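The redundancy test at the heart of the patch above can be illustrated with a small host-side sketch. This is an illustration only: the "5 bits read" figure and the maskTrailingOnes idea come from the diff, while the function names here are invented for the example.

```
#include <cstdint>

// A 32-bit shift that, like the listed V_LSHR/S_LSHR opcodes, only reads the
// low 5 bits of its amount operand.
uint32_t hwLshr32(uint32_t X, uint32_t Amt) { return X >> (Amt & 0x1f); }

// An AND immediate feeding such an operand is ignorable if it keeps every bit
// the consumer actually reads (maskTrailingOnes(NumBitsRead)).
bool andMaskIsRedundant(uint64_t Imm, unsigned NumBitsRead) {
  uint64_t ReadMask = (1ull << NumBitsRead) - 1;
  return (Imm & ReadMask) == ReadMask;
}

// Example: 0xffff keeps all 5 read bits, so
//   hwLshr32(X, Amt & 0xffff) == hwLshr32(X, Amt)   for all X, Amt,
// and the shift can use the pre-AND register directly.
```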
[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)
Pierre-vh wrote: ### Merge activity * **Mar 17, 4:51 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131307). https://github.com/llvm/llvm-project/pull/131307 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
Pierre-vh wrote: ### Merge activity * **Mar 17, 4:51 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131306). https://github.com/llvm/llvm-project/pull/131306 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl( return; } +// 16-bit operations are VALU only, but can be promoted to 32-bit SALU. +// Packed 16-bit operations need to be scalarized and promoted. Pierre-vh wrote: It was copy-pasted from below and I forgot to remove it; it's irrelevant here. https://github.com/llvm/llvm-project/pull/131306 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/131312 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/131311 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From d65db023bfae0c9a5eaeb5bebac39d75723c27d6 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index cfb5c3b3006f0..ab900157d2095 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index b46fc7d9c752a..1c9d67826186f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
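For readers unfamiliar with the BFE semantics involved, here is a rough host-side model of a signed bitfield extract and of the s16-to-s32 widening strategy the patch uses. It is not code from the patch, and it assumes 0 < Width, Offset + Width <= 32 for the 32-bit case and Offset + Width <= 16 for the 16-bit case.

```
#include <cstdint>

// Signed bitfield extract on s32 (S_BFE_I32-style semantics): take Width bits
// starting at Offset and sign-extend them into the full register.
int32_t sbfx32(int32_t X, unsigned Offset, unsigned Width) {
  return (int32_t)((uint32_t)X << (32 - Offset - Width)) >> (32 - Width);
}

// The widening idea for an s16 G_SBFX: extend the source and the offset/width
// operands to 32 bits, run the 32-bit extract, then truncate the result.
int16_t sbfx16ViaS32(int16_t X, uint16_t Offset, uint16_t Width) {
  return (int16_t)sbfx32((int32_t)X, Offset, Width);
}
```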
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131311 >From f3fddad8dca1e8ed327d7cc7cfee7a465032dcc4 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index cc15dd7cb495c..5f666e10b5cb7 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; + } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if 
(!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND_B32: + case AMDGPU::V_AND_B32_e32: + case AMDGPU::V_AND_B32_e64: +if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith)) + return false; +break; + default: +return false; + } + + MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith); + LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n"); + + MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg()); + return true; +} + bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const { if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 && MI.getOpcode() != AMDGPU::V_AND_B32_e32) @@ -1458,7 +1552,7 @@ bool SIFoldOperands
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131310 >From 65d5012c30366cc713b793a30ab5119ddf8a77af Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131312 >From 782153a9a47d4a0fdb897e811033179fa67c5060 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:34:51 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. --- .../include/llvm/Target/GlobalISel/Combine.td | 12 ++- .../combine-sext-trunc-sextinreg.mir | 86 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 - 3 files changed, 113 insertions(+), 63 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 3590ab221ad44..9727b86b4be8b 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $src, $width), + (G_TRUNC $trunc, $sir), + (G_SEXT $dst, $trunc), + [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]), + (apply (GIReplaceReg $dst, $sir))>; + def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">; def sext_inreg_of_load : GICombineRule< (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo), @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + + sext_trunc_sextinreg ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir new file mode 100644 index 0..d41e5b172efc2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir @@ -0,0 +1,86 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: trunc_s16_inreg_8 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_8 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... 
+ +--- +name: trunc_s16_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +--- +name: trunc_s8_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s8_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8) +; CHECK-NEXT: $vgpr0 = COPY %sext(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s8) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be. +--- +name: mismatching_types +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: mismatching_types +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s16
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131624 >From 3f3c67934d0c9ea34c11cbd24becc24541baf567 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:54:59 +0100 Subject: [PATCH 1/3] [GlobalISel] Combine redundant sext_inreg --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 3 + .../include/llvm/Target/GlobalISel/Combine.td | 9 +- .../GlobalISel/CombinerHelperCasts.cpp| 27 +++ .../combine-redundant-sext-inreg.mir | 164 ++ .../combine-sext-trunc-sextinreg.mir | 87 ++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 5 - 6 files changed, 289 insertions(+), 6 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index 9b78342c8fc39..5778377d125a8 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -994,6 +994,9 @@ class CombinerHelper { // overflow sub bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const; + // (sext_inreg (sext_inreg x, K0), K1) + void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const; + private: /// Checks for legality of an indexed variant of \p LdSt. bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 660b03080f92e..6a0ff683a4647 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes; def anyext_of_zext : ext_of_ext_opcodes; def anyext_of_sext : ext_of_ext_opcodes; +def sext_inreg_of_sext_inreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $x, $src, $a):$other, + (G_SEXT_INREG $dst, $x, $b):$root), + (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>; + // Push cast through build vector. class buildvector_of_opcode : GICombineRule < (defs root:$root, build_fn_matchinfo:$matchinfo), @@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + sext_inreg_of_sext_inreg, ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp index 576fd5fd81703..883a62c308232 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp @@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI, return false; } } + +void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root, + MachineInstr &Other) const { + assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG && + Other.getOpcode() == TargetOpcode::G_SEXT_INREG); + + unsigned RootWidth = Root.getOperand(2).getImm(); + unsigned OtherWidth = Other.getOperand(2).getImm(); + + Register Dst = Root.getOperand(0).getReg(); + Register OtherDst = Other.getOperand(0).getReg(); + Register Src = Other.getOperand(1).getReg(); + + if (RootWidth >= OtherWidth) { +// The root sext_inreg is entirely redundant because the other one +// is narrower. 
+Observer.changingAllUsesOfReg(MRI, Dst); +MRI.replaceRegWith(Dst, OtherDst); +Observer.finishedChangingAllUsesOfReg(); + } else { +// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the +// other G_SEXT_INREG. +Builder.buildSExtInReg(Dst, Src, RootWidth); + } + + Root.eraseFromParent(); +} diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir new file mode 100644 index 0..566ee8e6c338d --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir @@ -0,0 +1,164 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: inreg8_inreg16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: inreg8_inreg16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%inreg1:_(s32) = G_SEXT_INREG %inreg, 16 +$vgpr0 = COPY %inreg1 +... + +
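The two cases distinguished by applyRedundantSextInReg above can be checked with a small host-side model. This is an illustration only, not patch code.

```
#include <cstdint>

// Host-side model of a 32-bit G_SEXT_INREG with width K (0 < K <= 32).
int32_t sextInReg(int32_t X, unsigned K) {
  unsigned Shift = 32u - K;
  return (int32_t)((uint32_t)X << Shift) >> Shift;
}

// RootWidth >= OtherWidth: sextInReg(sextInReg(X, 8), 16) == sextInReg(X, 8),
//   so the outer (root) G_SEXT_INREG is redundant and its uses can be
//   rewired to the inner one's result.
// RootWidth <  OtherWidth: sextInReg(sextInReg(X, 16), 8) == sextInReg(X, 8),
//   so the root can be rebuilt to read the original source directly with the
//   narrower width.
```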
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
@@ -489,22 +489,61 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST, .Uni(B32, {{SgprB32}, {Sgpr32AExtBoolInReg, SgprB32, SgprB32}}); addRulesForGOpcs({G_ANYEXT}) + .Any({{UniS16, S1}, {{None}, {None}}}) // should be combined away .Any({{UniS32, S1}, {{None}, {None}}}) // should be combined away - .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}}); + .Any({{UniS64, S1}, {{None}, {None}}}) // should be combined away + .Any({{{DivS16, S1}}, {{Vgpr16}, {Vcc}, VccExtToSel}}) + .Any({{{DivS32, S1}}, {{Vgpr32}, {Vcc}, VccExtToSel}}) + .Any({{{DivS64, S1}}, {{Vgpr64}, {Vcc}, VccExtToSel}}) + .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}}) Pierre-vh wrote: Unrelated to the patch: these rules should be better documented; otherwise it's very hard to tell what's actually happening here. I had to go find two different struct signatures before getting an idea of what these lines do. A small comment on top of `RegBankLegalizeRules` that explains how many braces are needed and how the arguments are laid out could go a long way. I also feel like we could eliminate one or even two sets of braces by just making them arguments, further helping readability. It could just be an overload that's preferred when manually writing the rules, while we keep the current signature for rules pushed from a loop or something. https://github.com/llvm/llvm-project/pull/132383 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
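As a sketch of the overload idea mentioned at the end of that comment: the stand-in types below are invented so the snippet compiles on its own and almost certainly do not match the real RegBankLegalizeRules definitions; only the shape of the suggestion matters.

```
#include <initializer_list>
#include <vector>

// Stand-in types; the real ones live in AMDGPURegBankLegalizeRules.
enum OpPredicate { UniS64, S32 /* ... */ };
enum MappingID   { Sgpr64, Sgpr32 /* ... */ };
enum LoweringID  { DoNotLower, Ext32To64 /* ... */ };

struct Rule {
  std::vector<OpPredicate> Types;
  std::vector<MappingID> Dst, Src;
  LoweringID Lowering;
};

struct RuleSet {
  std::vector<Rule> Rules;

  // Existing-style entry point: everything packed into one aggregate.
  RuleSet &Any(Rule R) { Rules.push_back(R); return *this; }

  // Suggested convenience overload: spelled-out arguments, fewer braces.
  RuleSet &Any(std::initializer_list<OpPredicate> Types,
               std::initializer_list<MappingID> Dst,
               std::initializer_list<MappingID> Src,
               LoweringID Lowering = DoNotLower) {
    return Any(Rule{std::vector<OpPredicate>(Types),
                    std::vector<MappingID>(Dst),
                    std::vector<MappingID>(Src), Lowering});
  }
};

// A rule like
//   .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})   // today
// could then be written as
//   .Any({UniS64, S32}, {Sgpr64}, {Sgpr32}, Ext32To64)       // with the overload
```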
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131624 >From 3f3c67934d0c9ea34c11cbd24becc24541baf567 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:54:59 +0100 Subject: [PATCH 1/2] [GlobalISel] Combine redundant sext_inreg --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 3 + .../include/llvm/Target/GlobalISel/Combine.td | 9 +- .../GlobalISel/CombinerHelperCasts.cpp| 27 +++ .../combine-redundant-sext-inreg.mir | 164 ++ .../combine-sext-trunc-sextinreg.mir | 87 ++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 5 - 6 files changed, 289 insertions(+), 6 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index 9b78342c8fc39..5778377d125a8 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -994,6 +994,9 @@ class CombinerHelper { // overflow sub bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const; + // (sext_inreg (sext_inreg x, K0), K1) + void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const; + private: /// Checks for legality of an indexed variant of \p LdSt. bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 660b03080f92e..6a0ff683a4647 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes; def anyext_of_zext : ext_of_ext_opcodes; def anyext_of_sext : ext_of_ext_opcodes; +def sext_inreg_of_sext_inreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $x, $src, $a):$other, + (G_SEXT_INREG $dst, $x, $b):$root), + (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>; + // Push cast through build vector. class buildvector_of_opcode : GICombineRule < (defs root:$root, build_fn_matchinfo:$matchinfo), @@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + sext_inreg_of_sext_inreg, ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp index 576fd5fd81703..883a62c308232 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp @@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI, return false; } } + +void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root, + MachineInstr &Other) const { + assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG && + Other.getOpcode() == TargetOpcode::G_SEXT_INREG); + + unsigned RootWidth = Root.getOperand(2).getImm(); + unsigned OtherWidth = Other.getOperand(2).getImm(); + + Register Dst = Root.getOperand(0).getReg(); + Register OtherDst = Other.getOperand(0).getReg(); + Register Src = Other.getOperand(1).getReg(); + + if (RootWidth >= OtherWidth) { +// The root sext_inreg is entirely redundant because the other one +// is narrower. 
+Observer.changingAllUsesOfReg(MRI, Dst); +MRI.replaceRegWith(Dst, OtherDst); +Observer.finishedChangingAllUsesOfReg(); + } else { +// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the +// other G_SEXT_INREG. +Builder.buildSExtInReg(Dst, Src, RootWidth); + } + + Root.eraseFromParent(); +} diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir new file mode 100644 index 0..566ee8e6c338d --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir @@ -0,0 +1,164 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: inreg8_inreg16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: inreg8_inreg16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%inreg1:_(s32) = G_SEXT_INREG %inreg, 16 +$vgpr0 = COPY %inreg1 +... + +
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/131310 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131624 >From f4c801437460aef9b9c2e5f49d1e98ec90fadb16 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:54:59 +0100 Subject: [PATCH 1/4] [GlobalISel] Combine redundant sext_inreg --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 3 + .../include/llvm/Target/GlobalISel/Combine.td | 9 +- .../GlobalISel/CombinerHelperCasts.cpp| 27 +++ .../combine-redundant-sext-inreg.mir | 164 ++ .../combine-sext-trunc-sextinreg.mir | 87 ++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 5 - 6 files changed, 289 insertions(+), 6 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index 9b78342c8fc39..5778377d125a8 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -994,6 +994,9 @@ class CombinerHelper { // overflow sub bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const; + // (sext_inreg (sext_inreg x, K0), K1) + void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const; + private: /// Checks for legality of an indexed variant of \p LdSt. bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 660b03080f92e..6a0ff683a4647 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes; def anyext_of_zext : ext_of_ext_opcodes; def anyext_of_sext : ext_of_ext_opcodes; +def sext_inreg_of_sext_inreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $x, $src, $a):$other, + (G_SEXT_INREG $dst, $x, $b):$root), + (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>; + // Push cast through build vector. class buildvector_of_opcode : GICombineRule < (defs root:$root, build_fn_matchinfo:$matchinfo), @@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + sext_inreg_of_sext_inreg, ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp index 576fd5fd81703..883a62c308232 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp @@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI, return false; } } + +void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root, + MachineInstr &Other) const { + assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG && + Other.getOpcode() == TargetOpcode::G_SEXT_INREG); + + unsigned RootWidth = Root.getOperand(2).getImm(); + unsigned OtherWidth = Other.getOperand(2).getImm(); + + Register Dst = Root.getOperand(0).getReg(); + Register OtherDst = Other.getOperand(0).getReg(); + Register Src = Other.getOperand(1).getReg(); + + if (RootWidth >= OtherWidth) { +// The root sext_inreg is entirely redundant because the other one +// is narrower. 
+Observer.changingAllUsesOfReg(MRI, Dst); +MRI.replaceRegWith(Dst, OtherDst); +Observer.finishedChangingAllUsesOfReg(); + } else { +// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the +// other G_SEXT_INREG. +Builder.buildSExtInReg(Dst, Src, RootWidth); + } + + Root.eraseFromParent(); +} diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir new file mode 100644 index 0..566ee8e6c338d --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir @@ -0,0 +1,164 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: inreg8_inreg16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: inreg8_inreg16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%inreg1:_(s32) = G_SEXT_INREG %inreg, 16 +$vgpr0 = COPY %inreg1 +... + +
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131312 This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. >From 3289b2373ce2ec850a9bebb597168243d36608a6 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:34:51 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. --- .../include/llvm/Target/GlobalISel/Combine.td | 12 ++- .../combine-sext-trunc-sextinreg.mir | 86 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 - 3 files changed, 113 insertions(+), 63 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 3590ab221ad44..9727b86b4be8b 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $src, $width), + (G_TRUNC $trunc, $sir), + (G_SEXT $dst, $trunc), + [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]), + (apply (GIReplaceReg $dst, $sir))>; + def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">; def sext_inreg_of_load : GICombineRule< (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo), @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + + sext_trunc_sextinreg ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir new file mode 100644 index 0..d41e5b172efc2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir @@ -0,0 +1,86 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: trunc_s16_inreg_8 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_8 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... 
+ +--- +name: trunc_s16_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +--- +name: trunc_s8_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s8_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8) +; CHECK-NEXT: $vgpr0 = COPY %sext(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s8) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be. +--- +name: mismatching_types +tracksRegLiveness: true +body:
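For illustration, here is the shape of the rewrite the new `sext_trunc_sextinreg` rule performs, written as a minimal standalone MIR function in the same style as the test above (the function name is made up; the expected result is described in comments rather than autogenerated CHECK lines):

```
# Runs under e.g. llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner
---
name: sketch_sext_trunc_sextinreg
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0
    ; The outer G_SEXT only recreates sign bits the G_SEXT_INREG already
    ; produced (width 8 <= the 16 bits of %trunc), so the combiner replaces
    ; all uses of %sext with %inreg and the G_TRUNC/G_SEXT pair becomes dead,
    ; leaving:
    ;   %copy:_(s32) = COPY $vgpr0
    ;   %inreg:_(s32) = G_SEXT_INREG %copy, 8
    ;   $vgpr0 = COPY %inreg(s32)
    %copy:_(s32) = COPY $vgpr0
    %inreg:_(s32) = G_SEXT_INREG %copy, 8
    %trunc:_(s16) = G_TRUNC %inreg
    %sext:_(s32) = G_SEXT %trunc
    $vgpr0 = COPY %sext
...
```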
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131308 It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. >From 815595b1ca20b613b5b4b08cafedda93e397cf92 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 13:41:04 +0100 Subject: [PATCH] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 7 +- .../AMDGPU/GlobalISel/legalize-abs.mir| 8 +- .../AMDGPU/GlobalISel/legalize-ashr.mir | 20 +-- .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++--- .../AMDGPU/GlobalISel/legalize-sext.mir | 101 ++-- .../AMDGPU/GlobalISel/legalize-smax.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smin.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smulh.mir | 132 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 45 ++--- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 130 ++- 11 files changed, 299 insertions(+), 368 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index b3a8183beeacf..6e611ebb4b625 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, // S64 is only legal on SALU, and needs to be broken into 32-bit elements in // RegBankSelect. auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG) -.legalFor({{S32}, {S64}}); +.legalFor({{S32}, {S64}}) +.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32)); if (ST.hasVOP3PInsts()) { SextInReg.lowerFor({{V2S16}}) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll index 493e8cef63890..f81d7f1c300b8 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll @@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) { ; GFX8-LABEL: v_ashr_i8: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8: @@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) { ; GFX8-LABEL: v_ashr_i8_7: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0 +; GFX8-NEXT:v_mov_b32_e32 v1, 7 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8_7: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir index a9fe80eb47e76..2b911b2dce697 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir @@ -144,11 +144,9 @@ body: | ; VI: liveins: $vgpr0 ; VI-NEXT: {{ $}} ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 -; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: 
[[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16) -; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]] +; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8 +; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32) +; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]] ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16) ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32) ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir index f4aaab745e03b..53905a2f49dd0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir @@ -319,12 +319,10 @@ body: | ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]] -; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s
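For illustration, a minimal MIR sketch of the legalization difference this patch makes (assuming the usual `llc -run-pass=legalizer` setup; the function name is made up, and the before/after forms in the comments mirror the legalize-abs.mir diff):

```
---
name: sketch_sext_inreg_s16
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0
    ; Before this change, the s16 G_SEXT_INREG was lowered to a shift pair:
    ;   %c8:_(s16) = G_CONSTANT i16 8
    ;   %shl:_(s16) = G_SHL %trunc, %c8(s16)
    ;   %res:_(s16) = G_ASHR %shl, %c8(s16)
    ; With the new widenScalarIf rule it is widened to s32 instead:
    ;   %inreg:_(s32) = G_SEXT_INREG %copy, 8
    ;   %res:_(s16) = G_TRUNC %inreg(s32)
    %copy:_(s32) = COPY $vgpr0
    %trunc:_(s16) = G_TRUNC %copy
    %res:_(s16) = G_SEXT_INREG %trunc, 8
    %ext:_(s32) = G_ANYEXT %res
    $vgpr0 = COPY %ext(s32)
...
```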
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131310 None >From b87a9db3b8ab29db3f1bb668a4d3bf312add817b Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift
[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131307).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312** (https://app.graphite.dev/github/pr/llvm/llvm-project/131312)
* **#131311** (https://app.graphite.dev/github/pr/llvm/llvm-project/131311)
* **#131310** (https://app.graphite.dev/github/pr/llvm/llvm-project/131310)
* **#131309** (https://app.graphite.dev/github/pr/llvm/llvm-project/131309)
* **#131308** (https://app.graphite.dev/github/pr/llvm/llvm-project/131308)
* **#131307** (https://app.graphite.dev/github/pr/llvm/llvm-project/131307) 👈 (View in Graphite)
* **#131306** (https://app.graphite.dev/github/pr/llvm/llvm-project/131306)
* **#131305** (https://app.graphite.dev/github/pr/llvm/llvm-project/131305)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131307
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131311 Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. >From f46e24f0f5f98e5deb7bd13d737ed8c674da75e1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 91df516b80857..a279a0a973e75 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; 
+ } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND
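For illustration, a minimal sketch of the new fold in the style of the si-fold-bitmasks.mir precommit test above (a 32-bit shift only reads the low 5 bits of its shift amount, so a mask that covers those bits can be bypassed; the function name is made up):

```
---
name: sketch_v_lshlrev_b32__v_and_b32
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0, $vgpr1
    ; 65535 covers the 5 bits V_LSHLREV_B32 reads from its shift amount
    ; (operand 1), so SIFoldOperands rewrites the shift to read %shift
    ; directly:
    ;   %ret:vgpr_32 = V_LSHLREV_B32_e64 %shift, %src, implicit $exec
    ; The V_AND_B32 itself is left in place and only goes away once a later
    ; pass deletes it as dead.
    %src:vgpr_32 = COPY $vgpr0
    %shift:vgpr_32 = COPY $vgpr1
    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
    %ret:vgpr_32 = V_LSHLREV_B32_e64 %shiftmask, %src, implicit $exec
    $vgpr0 = COPY %ret
...
```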
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131306).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312** (https://app.graphite.dev/github/pr/llvm/llvm-project/131312)
* **#131311** (https://app.graphite.dev/github/pr/llvm/llvm-project/131311)
* **#131310** (https://app.graphite.dev/github/pr/llvm/llvm-project/131310)
* **#131309** (https://app.graphite.dev/github/pr/llvm/llvm-project/131309)
* **#131308** (https://app.graphite.dev/github/pr/llvm/llvm-project/131308)
* **#131307** (https://app.graphite.dev/github/pr/llvm/llvm-project/131307)
* **#131306** (https://app.graphite.dev/github/pr/llvm/llvm-project/131306) 👈 (View in Graphite)
* **#131305** (https://app.graphite.dev/github/pr/llvm/llvm-project/131305)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131306
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131310 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131306 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131307 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131309 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131312 >From b9bf3f2f53fcf7cbd133e57d4c7f64a8f06763b2 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:34:51 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) This is a bit of an akward pattern that can come up as a result of legalization and then widening of i16 operations to i32 in RegBankSelect on AMDGPU. This quick combine avoids redundant patterns like ``` s_sext_i32_i8 s0, s0 s_sext_i32_i16 s0, s0 s_ashr_i32 s0, s0, s1 ``` With this the second sext is removed as it's redundant. --- .../include/llvm/Target/GlobalISel/Combine.td | 12 ++- .../combine-sext-trunc-sextinreg.mir | 86 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 - 3 files changed, 113 insertions(+), 63 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 3590ab221ad44..9727b86b4be8b 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $src, $width), + (G_TRUNC $trunc, $sir), + (G_SEXT $dst, $trunc), + [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]), + (apply (GIReplaceReg $dst, $sir))>; + def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">; def sext_inreg_of_load : GICombineRule< (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo), @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + + sext_trunc_sextinreg ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir new file mode 100644 index 0..d41e5b172efc2 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir @@ -0,0 +1,86 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: trunc_s16_inreg_8 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_8 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... 
+ +--- +name: trunc_s16_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s16_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s16) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +--- +name: trunc_s8_inreg_16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: trunc_s8_inreg_16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8) +; CHECK-NEXT: $vgpr0 = COPY %sext(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 16 +%trunc:_(s8) = G_TRUNC %inreg +%sext:_(s32) = G_SEXT %trunc +$vgpr0 = COPY %sext +... + +# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be. +--- +name: mismatching_types +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: mismatching_types +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32) +; CHECK-NEXT: %sext:_(s16
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131306 >From 1af83464f02df212384bd97848b0073d41053234 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 10:46:01 +0100 Subject: [PATCH] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 See #64591 --- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 28 +- llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll | 10 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 519 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 286 +- llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll | 10 +- 5 files changed, 403 insertions(+), 450 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index c19ee14ab1574..27b86723ce474 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl( Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); -if (DstTy.getSizeInBits() == 1) { - const RegisterBank *DstBank = +const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; + +if (DstTy.getSizeInBits() == 1) { if (DstBank == &AMDGPU::VCCRegBank) break; @@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl( return; } +// 16-bit operations are VALU only, but can be promoted to 32-bit SALU. +// Packed 16-bit operations need to be scalarized and promoted. +if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) { + const LLT S32 = LLT::scalar(32); + MachineBasicBlock *MBB = MI.getParent(); + MachineFunction *MF = MBB->getParent(); + ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank); + LegalizerHelper Helper(*MF, ApplySALU, B); + // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening + // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1 + // as "not". 
+ if (MI.getOpcode() == AMDGPU::G_XOR && + mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) { +Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); +Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT); +Helper.widenScalarDst(MI, S32); + } else { +if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized) + llvm_unreachable("widen scalar should have succeeded"); + } + return; +} + if (DstTy.getSizeInBits() != 64) break; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll index 1a94429b1b5a1..36359579ea442 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll @@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg %src0, i16 inreg %src1) { define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 inreg %src1) { ; GCN-LABEL: s_andn2_i16_multi_use: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s1, s3, -1 +; GCN-NEXT:s_not_b32 s1, s3 ; GCN-NEXT:s_andn2_b32 s0, s2, s3 ; GCN-NEXT:; return to shader part epilog ; ; GFX10-LABEL: s_andn2_i16_multi_use: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_andn2_b32 s0, s2, s3 -; GFX10-NEXT:s_xor_b32 s1, s3, -1 +; GFX10-NEXT:s_not_b32 s1, s3 ; GFX10-NEXT:; return to shader part epilog ; ; GFX11-LABEL: s_andn2_i16_multi_use: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3 -; GFX11-NEXT:s_xor_b32 s1, s3, -1 +; GFX11-NEXT:s_not_b32 s1, s3 ; GFX11-NEXT:; return to shader part epilog %not.src1 = xor i16 %src1, -1 %and = and i16 %src0, %not.src1 @@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, i16 %src1) { define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) { ; GCN-LABEL: v_andn2_i16_vs: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s0, s2, -1 +; GCN-NEXT:s_not_b32 s0, s2 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0 ; GCN-NEXT:; return to shader part epilog ; ; GFX10PLUS-LABEL: v_andn2_i16_vs: ; GFX10PLUS: ; %bb.0: -; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1 +; GFX10PLUS-NEXT:s_not_b32 s0, s2 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0 ; GFX10PLUS-NEXT:; return to shader part epilog diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index e60739fd84059..3a52497bd6e91 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll @@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg, i32 in ; GFX8-NEXT:s_lshr_b32 s2, s2, s3 ; GFX8-NEXT:
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131311 >From d6e5dc03ae8bb46972b7bcffd35e60babbfbc678 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:05:19 +0100 Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits. If the source operand is being masked, we can just ignore the mask. Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP. With that disabled, we get more i16 shift amounts that are zext'd and without this we'd end up with more `s_and_b32 s1, s1, 0x` in the output. Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 97 +++- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 201 - llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 207 -- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 8 +- llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 6 +- llvm/test/CodeGen/AMDGPU/constrained-shift.ll | 1 - llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 26 +-- 8 files changed, 303 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 91df516b80857..a279a0a973e75 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -131,6 +131,7 @@ class SIFoldOperandsImpl { std::optional getImmOrMaterializedImm(MachineOperand &Op) const; bool tryConstantFoldOp(MachineInstr *MI) const; bool tryFoldCndMask(MachineInstr &MI) const; + bool tryFoldBitMask(MachineInstr &MI) const; bool tryFoldZeroHighBits(MachineInstr &MI) const; bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const; @@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const { return true; } +static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead, + unsigned &OpIdx) { + switch (Opc) { + case AMDGPU::V_ASHR_I32_e64: + case AMDGPU::V_ASHR_I32_e32: + case AMDGPU::V_LSHR_B32_e64: + case AMDGPU::V_LSHR_B32_e32: + case AMDGPU::V_LSHL_B32_e64: + case AMDGPU::V_LSHL_B32_e32: + case AMDGPU::S_LSHL_B32: + case AMDGPU::S_LSHR_B32: + case AMDGPU::S_ASHR_I32: +NumBitsRead = 5; +OpIdx = 2; +return true; + case AMDGPU::S_LSHL_B64: + case AMDGPU::S_LSHR_B64: + case AMDGPU::S_ASHR_I64: +NumBitsRead = 6; +OpIdx = 2; +return true; + case AMDGPU::V_LSHLREV_B32_e64: + case AMDGPU::V_LSHLREV_B32_e32: + case AMDGPU::V_LSHRREV_B32_e64: + case AMDGPU::V_LSHRREV_B32_e32: + case AMDGPU::V_ASHRREV_I32_e64: + case AMDGPU::V_ASHRREV_I32_e32: +NumBitsRead = 5; +OpIdx = 1; +return true; + default: +return false; + } +} + +static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded, +unsigned &SrcOp) { + MachineOperand *RegOp = &MI.getOperand(1); + MachineOperand *ImmOp = &MI.getOperand(2); + + if (!RegOp->isReg() || !ImmOp->isImm()) { +if (ImmOp->isReg() && RegOp->isImm()) + std::swap(RegOp, ImmOp); +else + return false; + } + + SrcOp = RegOp->getOperandNo(); + + const unsigned BitMask = maskTrailingOnes(BitsNeeded); + return (ImmOp->getImm() & BitMask) == BitMask; +} + +bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const { + unsigned NumBitsRead = 0; + unsigned OpIdx = 0; + if 
(!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx)) +return false; + + MachineOperand &Op = MI.getOperand(OpIdx); + if (!Op.isReg()) +return false; + + Register OpReg = Op.getReg(); + if (OpReg.isPhysical()) +return false; + + MachineInstr *OpDef = MRI->getVRegDef(OpReg); + if (!OpDef) +return false ; + + LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n"); + + unsigned ReplaceWith; + switch (OpDef->getOpcode()) { + // TODO: add more opcodes? + case AMDGPU::S_AND_B32: + case AMDGPU::V_AND_B32_e32: + case AMDGPU::V_AND_B32_e64: +if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith)) + return false; +break; + default: +return false; + } + + MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith); + LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n"); + + MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg()); + return true; +} + bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const { if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 && MI.getOpcode() != AMDGPU::V_AND_B32_e32) @@ -1458,7 +1552,7 @@ bool SIFoldOperands
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131308 >From e6862b4528d1ed48bbca9e742dd9a96d8777545b Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 13:41:04 +0100 Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 7 +- .../AMDGPU/GlobalISel/legalize-abs.mir| 8 +- .../AMDGPU/GlobalISel/legalize-ashr.mir | 20 +-- .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++--- .../AMDGPU/GlobalISel/legalize-sext.mir | 101 ++-- .../AMDGPU/GlobalISel/legalize-smax.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smin.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smulh.mir | 132 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 45 ++--- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 130 ++- 11 files changed, 299 insertions(+), 368 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index b3a8183beeacf..6e611ebb4b625 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, // S64 is only legal on SALU, and needs to be broken into 32-bit elements in // RegBankSelect. auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG) -.legalFor({{S32}, {S64}}); +.legalFor({{S32}, {S64}}) +.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32)); if (ST.hasVOP3PInsts()) { SextInReg.lowerFor({{V2S16}}) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll index 493e8cef63890..f81d7f1c300b8 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll @@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) { ; GFX8-LABEL: v_ashr_i8: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8: @@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) { ; GFX8-LABEL: v_ashr_i8_7: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0 +; GFX8-NEXT:v_mov_b32_e32 v1, 7 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8_7: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir index a9fe80eb47e76..2b911b2dce697 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir @@ -144,11 +144,9 @@ body: | ; VI: liveins: $vgpr0 ; VI-NEXT: {{ $}} ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 -; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16) -; VI-NEXT: 
[[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]] +; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8 +; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32) +; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]] ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16) ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32) ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir index f4aaab745e03b..53905a2f49dd0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir @@ -319,12 +319,10 @@ body: | ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]] -; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16) -; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131310 >From fcd5623ccd18100197817f7f4d5a500ca433f8dc Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131309 >From c30cc50e3650137bdb8acc9674c312f6c088983f Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 12 Mar 2025 09:43:15 +0100 Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect Make s16 G_U/SBFX legal and widen them in RegBankSelect. This allows the set of BFX formation combines to work on s16 types. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 9 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 33 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 645 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 380 --- .../AMDGPU/GlobalISel/legalize-sbfx.mir | 26 +- .../AMDGPU/GlobalISel/legalize-ubfx.mir | 27 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 27 +- 7 files changed, 503 insertions(+), 644 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index cfb5c3b3006f0..ab900157d2095 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, .minScalar(0, S32) .lower(); + // Only {S32, S32} or {S32, S64} should ever reach codegen. + // We allow S/UBFX for S16 so the combiner can form them before + // RegBankSelect, and RegBankSelect will then legalize them correctly. getActionDefinitionsBuilder({G_SBFX, G_UBFX}) - .legalFor({{S32, S32}, {S64, S32}}) - .clampScalar(1, S32, S32) - .clampScalar(0, S32, S64) + .legalFor({{S16, S16}, {S32, S32}, {S64, S32}}) + .clampScalar(1, S16, S32) + .clampScalar(0, S16, S64) .widenScalarToNextPow2(0) .scalarize(0); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index 27b86723ce474..ed0d52f6b2441 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -1485,7 +1485,9 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); + const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); + const LLT S16 = LLT::scalar(16); unsigned FirstOpnd = isa(MI) ? 2 : 1; Register SrcReg = MI.getOperand(FirstOpnd).getReg(); @@ -1495,6 +1497,18 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; if (DstBank == &AMDGPU::VGPRRegBank) { +if (Ty == S16) { + ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank); + B.setInsertPt(B.getMBB(), MI); + LegalizerHelper Helper(B.getMF(), ApplyBank, B); + + Helper.widenScalarDst(MI, S32); + Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); + Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT); + Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT); + return true; +} + if (Ty == S32) return true; @@ -1554,6 +1568,11 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank); + if (Ty == S16) { +OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0); +WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0); + } + // Ensure the high bits are clear to insert the offset. 
auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); @@ -1568,13 +1587,21 @@ bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B, // TODO: It might be worth using a pseudo here to avoid scc clobber and // register class constraints. - unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) : - (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); - auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + Register BFEDst = DstReg; + if (Ty == S16) { +BFEDst = MRI.createGenericVirtualRegister(S32); +MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank); + } + auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs}); if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this)) llvm_unreachable("failed to constrain BFE"); + if (BFEDst != DstReg) +B.buildZExtOrTrunc(DstReg, BFEDst); + MI.eraseFromParent(); return true; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index 07fcb02d98649..d2b600b04f9fc 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh
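For illustration, a hypothetical gMIR example of the kind of input this change targets (not taken from the patch): with {s16, s16} now legal, the pre-regbankselect BFX-forming combines are allowed to turn an s16 shift-and-mask into an s16 G_UBFX, which applyMappingBFE later widens back to a 32-bit operation for the chosen bank.

```
---
name: sketch_form_s16_ubfx
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0
    ; A shift+mask of an s16 value; with s16 G_UBFX legal this can now be
    ; combined into:
    ;   %bfx:_(s16) = G_UBFX %val, %off(s16), %width(s16)
    ; with %off = 4 and %width = 6, instead of having to stay as s16
    ; G_LSHR + G_AND until after regbank selection.
    %copy:_(s32) = COPY $vgpr0
    %val:_(s16) = G_TRUNC %copy
    %off:_(s16) = G_CONSTANT i16 4
    %shr:_(s16) = G_LSHR %val, %off(s16)
    %mask:_(s16) = G_CONSTANT i16 63
    %and:_(s16) = G_AND %shr, %mask
    %ext:_(s32) = G_ANYEXT %and
    $vgpr0 = COPY %ext(s32)
...
```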
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
Pierre-vh wrote:

> > GlobalISel unfortunately needs it. We can end up with things like a `G_LSHR` with the shift amount being zext'd, and they're both lowered independently so we have a `s_and_b32` of the shift amount.
>
> It should always be post legalize / post regbankselect combinable. Things are strictly more difficult after selection

The main issue I was having was with code that had <32 bit arguments in registers. We'd have

```
%0(s32) = COPY $sgpr0
%1(s16) = G_TRUNC %0
%2(s32) = G_ZEXT %1
```

with %2 then being used as the shift amount. We can't eliminate the zext/trunc because the generic opcode has no mention of reading only the lower bits, AFAIK. I tried experimenting with multiple approaches, but I didn't find anything better than doing it in SIFoldOperands.

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
          [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;

+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+  [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),

Pierre-vh wrote:

Apply isn't allowed to fail. It's just that the presence of `GIReplaceReg` triggers emission of a `canReplaceReg` call during the matching portion of the match table rule.

> On a related note, couldn't you split this whole combine into two independently useful parts:

Good idea, I can try that.

https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131312 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131310 >From fcd5623ccd18100197817f7f4d5a500ca433f8dc Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 14 Mar 2025 10:00:21 +0100 Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir --- llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++ 1 file changed, 429 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir new file mode 100644 index 0..1edf970591179 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir @@ -0,0 +1,429 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s + +# Test supported instructions + +--- +name: v_ashr_i32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: v_lshr_b32_e32__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... 
+ +--- +name: v_lshl_b32_e64__v_and_b32_e32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1 + +; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32 +; GCN: liveins: $vgpr0, $vgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0 +; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1 +; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +; GCN-NEXT: $vgpr0 = COPY %ret +%src:vgpr_32 = COPY $vgpr0 +%shift:vgpr_32 = COPY $vgpr1 +%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec +%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec +$vgpr0 = COPY %ret +... + +--- +name: s_lshl_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshl_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1 +; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +; GCN-NEXT: $sgpr0 = COPY %ret +%src:sgpr_32 = COPY $sgpr0 +%shift:sgpr_32 = COPY $sgpr1 +%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc +%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc +$sgpr0 = COPY %ret +... + +--- +name: s_lshr_b32__s_and_b32 +tracksRegLiveness: true +body: | + bb.0: +liveins: $sgpr0, $sgpr1 + +; GCN-LABEL: name: s_lshr_b32__s_and_b32 +; GCN: liveins: $sgpr0, $sgpr1 +; GCN-NEXT: {{ $}} +; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0 +; GCN-NEXT: %shift:sgpr_
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131309)
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131309
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131311 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309**
* **#131308** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131308)
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131308
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131312)
* **#131311**
* **#131310**
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131306 See #64591 >From a9f0563665a6d2b69fdee0d826cb52d6651c3dc4 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 10:46:01 +0100 Subject: [PATCH] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 See #64591 --- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 28 +- llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll | 10 +- llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | 519 -- llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll | 286 +- llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll | 10 +- 5 files changed, 403 insertions(+), 450 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index c19ee14ab1574..27b86723ce474 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp @@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl( Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); -if (DstTy.getSizeInBits() == 1) { - const RegisterBank *DstBank = +const RegisterBank *DstBank = OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank; + +if (DstTy.getSizeInBits() == 1) { if (DstBank == &AMDGPU::VCCRegBank) break; @@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl( return; } +// 16-bit operations are VALU only, but can be promoted to 32-bit SALU. +// Packed 16-bit operations need to be scalarized and promoted. +if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) { + const LLT S32 = LLT::scalar(32); + MachineBasicBlock *MBB = MI.getParent(); + MachineFunction *MF = MBB->getParent(); + ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank); + LegalizerHelper Helper(*MF, ApplySALU, B); + // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening + // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1 + // as "not". 
+ if (MI.getOpcode() == AMDGPU::G_XOR && + mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) { +Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT); +Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT); +Helper.widenScalarDst(MI, S32); + } else { +if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized) + llvm_unreachable("widen scalar should have succeeded"); + } + return; +} + if (DstTy.getSizeInBits() != 64) break; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll index 1a94429b1b5a1..36359579ea442 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll @@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg %src0, i16 inreg %src1) { define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 inreg %src1) { ; GCN-LABEL: s_andn2_i16_multi_use: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s1, s3, -1 +; GCN-NEXT:s_not_b32 s1, s3 ; GCN-NEXT:s_andn2_b32 s0, s2, s3 ; GCN-NEXT:; return to shader part epilog ; ; GFX10-LABEL: s_andn2_i16_multi_use: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_andn2_b32 s0, s2, s3 -; GFX10-NEXT:s_xor_b32 s1, s3, -1 +; GFX10-NEXT:s_not_b32 s1, s3 ; GFX10-NEXT:; return to shader part epilog ; ; GFX11-LABEL: s_andn2_i16_multi_use: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3 -; GFX11-NEXT:s_xor_b32 s1, s3, -1 +; GFX11-NEXT:s_not_b32 s1, s3 ; GFX11-NEXT:; return to shader part epilog %not.src1 = xor i16 %src1, -1 %and = and i16 %src0, %not.src1 @@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, i16 %src1) { define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) { ; GCN-LABEL: v_andn2_i16_vs: ; GCN: ; %bb.0: -; GCN-NEXT:s_xor_b32 s0, s2, -1 +; GCN-NEXT:s_not_b32 s0, s2 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0 ; GCN-NEXT:; return to shader part epilog ; ; GFX10PLUS-LABEL: v_andn2_i16_vs: ; GFX10PLUS: ; %bb.0: -; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1 +; GFX10PLUS-NEXT:s_not_b32 s0, s2 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0 ; GFX10PLUS-NEXT:; return to shader part epilog diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll index e60739fd84059..3a52497bd6e91 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll @@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg, i32 in ; GFX8-NEXT:s_lshr_b32 s2, s2, s3 ; GF
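As a rough illustration of the `G_XOR x, -1` special case above (invented names, not from the patch's tests), a scalar s16 xor with all ones is widened so the constant stays -1 and the "not" pattern still matches:

```
; before applying the SGPR mapping:
%x:_(s16)    = G_TRUNC %0(s32)
%neg1:_(s16) = G_CONSTANT i16 -1
%r:_(s16)    = G_XOR %x, %neg1

; after the widening above, roughly:
%x32:_(s32)     = G_ANYEXT %x(s16)
%neg1_32:_(s32) = G_SEXT %neg1(s16)   ; sign-extending -1 keeps it all ones
%r32:_(s32)     = G_XOR %x32, %neg1_32
%r:_(s16)       = G_TRUNC %r32(s32)
; which is why the andn2/orn2 tests now select s_not_b32 instead of s_xor_b32 ..., -1.
```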
[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131311)
* **#131310**
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131311
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/131308 >From e6862b4528d1ed48bbca9e742dd9a96d8777545b Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 5 Mar 2025 13:41:04 +0100 Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result. --- .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 7 +- .../AMDGPU/GlobalISel/legalize-abs.mir| 8 +- .../AMDGPU/GlobalISel/legalize-ashr.mir | 20 +-- .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++--- .../AMDGPU/GlobalISel/legalize-sext.mir | 101 ++-- .../AMDGPU/GlobalISel/legalize-smax.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smin.mir | 33 +++- .../AMDGPU/GlobalISel/legalize-smulh.mir | 132 +++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 45 ++--- .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 130 ++- 11 files changed, 299 insertions(+), 368 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index b3a8183beeacf..6e611ebb4b625 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, // S64 is only legal on SALU, and needs to be broken into 32-bit elements in // RegBankSelect. auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG) -.legalFor({{S32}, {S64}}); +.legalFor({{S32}, {S64}}) +.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32)); if (ST.hasVOP3PInsts()) { SextInReg.lowerFor({{V2S16}}) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll index 493e8cef63890..f81d7f1c300b8 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll @@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) { ; GFX8-LABEL: v_ashr_i8: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8: @@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) { ; GFX8-LABEL: v_ashr_i8_7: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0 -; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0 +; GFX8-NEXT:v_mov_b32_e32 v1, 7 +; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_ashr_i8_7: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir index a9fe80eb47e76..2b911b2dce697 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir @@ -144,11 +144,9 @@ body: | ; VI: liveins: $vgpr0 ; VI-NEXT: {{ $}} ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 -; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16) -; VI-NEXT: 
[[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]] +; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8 +; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32) +; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]] ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16) ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32) ; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir index f4aaab745e03b..53905a2f49dd0 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir @@ -319,12 +319,10 @@ body: | ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]] -; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) -; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8 -; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16) -; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16) -; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [
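In short, the legalizer change trades the shift-pair lowering for widen-and-truncate. A rough before/after sketch (mirroring the legalize-abs.mir update above; names invented):

```
; previously, an s16 G_SEXT_INREG from 8 bits was lowered to a shift pair:
%c8:_(s16)   = G_CONSTANT i16 8
%shl:_(s16)  = G_SHL %x, %c8(s16)
%res:_(s16)  = G_ASHR %shl, %c8(s16)

; with this change it is widened instead: sext_inreg at 32 bits, then truncate:
%x32:_(s32)  = G_ANYEXT %x(s16)
%wide:_(s32) = G_SEXT_INREG %x32, 8
%res:_(s16)  = G_TRUNC %wide(s32)
```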
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131624 None >From e36f66595a582b6ba926186674b6da6b41236ff5 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:54:59 +0100 Subject: [PATCH] [GlobalISel] Combine redundant sext_inreg --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 3 + .../include/llvm/Target/GlobalISel/Combine.td | 9 +- .../GlobalISel/CombinerHelperCasts.cpp| 27 +++ .../combine-redundant-sext-inreg.mir | 164 ++ .../combine-sext-trunc-sextinreg.mir | 87 ++ .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 5 - 6 files changed, 289 insertions(+), 6 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index 9b78342c8fc39..5778377d125a8 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -994,6 +994,9 @@ class CombinerHelper { // overflow sub bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const; + // (sext_inreg (sext_inreg x, K0), K1) + void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const; + private: /// Checks for legality of an indexed variant of \p LdSt. bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 660b03080f92e..6a0ff683a4647 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes; def anyext_of_zext : ext_of_ext_opcodes; def anyext_of_sext : ext_of_ext_opcodes; +def sext_inreg_of_sext_inreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $x, $src, $a):$other, + (G_SEXT_INREG $dst, $x, $b):$root), + (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>; + // Push cast through build vector. class buildvector_of_opcode : GICombineRule < (defs root:$root, build_fn_matchinfo:$matchinfo), @@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[ sext_of_anyext, anyext_of_anyext, anyext_of_zext, - anyext_of_sext + anyext_of_sext, + sext_inreg_of_sext_inreg, ]>; def cast_combines: GICombineGroup<[ diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp index 182484754d091..ffc2384fc14fd 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp @@ -372,3 +372,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI, return false; } } + +void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root, + MachineInstr &Other) const { + assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG && + Other.getOpcode() == TargetOpcode::G_SEXT_INREG); + + unsigned RootWidth = Root.getOperand(2).getImm(); + unsigned OtherWidth = Other.getOperand(2).getImm(); + + Register Dst = Root.getOperand(0).getReg(); + Register OtherDst = Other.getOperand(0).getReg(); + Register Src = Other.getOperand(1).getReg(); + + if (RootWidth >= OtherWidth) { +// The root sext_inreg is entirely redundant because the other one +// is narrower. 
+Observer.changingAllUsesOfReg(MRI, Dst); +MRI.replaceRegWith(Dst, OtherDst); +Observer.finishedChangingAllUsesOfReg(); + } else { +// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the +// other G_SEXT_INREG. +Builder.buildSExtInReg(Dst, Src, RootWidth); + } + + Root.eraseFromParent(); +} diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir new file mode 100644 index 0..566ee8e6c338d --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir @@ -0,0 +1,164 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s + +--- +name: inreg8_inreg16 +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0 +; CHECK-LABEL: name: inreg8_inreg16 +; CHECK: liveins: $vgpr0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0 +; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8 +; CHECK-NEXT: $vgpr0 = COPY %inreg(s32) +%copy:_(s32) = COPY $vgpr0 +%inreg:_(s32) = G_SEXT_INREG %copy, 8 +%inreg1:_(s32) = G_SEXT_INREG %inreg, 16 +$vgpr0 = COPY %inreg1 +... +
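A minimal sketch of the two cases `applyRedundantSextInReg` handles (register names invented; the first case is what the inreg8_inreg16 test above checks):

```
; Case 1: outer width >= inner width, so the outer G_SEXT_INREG is redundant:
%a:_(s32) = G_SEXT_INREG %x, 8
%b:_(s32) = G_SEXT_INREG %a, 16   ; all uses of %b are replaced with %a

; Case 2: outer width < inner width, so the outer one is re-rooted on %x:
%c:_(s32) = G_SEXT_INREG %x, 16
%d:_(s32) = G_SEXT_INREG %c, 8    ; rewritten to G_SEXT_INREG %x, 8
```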
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/131623 None >From 3f2cbbd6addf4844c7c861a6de55be59a8c96c35 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 17 Mar 2025 13:22:25 +0100 Subject: [PATCH] [AMDGPU] Add sext_trunc in RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index a21505356274b..083ce48911689 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -181,5 +181,5 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines]> { + cast_of_cast_combines, sext_trunc]> { } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/131624 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131624**
* **#131623** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131623)
* **#131622**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131623
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131624** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131624)
* **#131623**
* **#131622**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/131624
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)
Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open.
> Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135340** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135340)
* **#135339**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/135340
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)
https://github.com/Pierre-vh ready_for_review https://github.com/llvm/llvm-project/pull/135340 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits