[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)

2024-08-01 Thread Pierre van Houtryve via llvm-branch-commits


@@ -322,4 +322,36 @@ define <2 x i16> @upgrade_amdgcn_global_atomic_fadd_v2bf16_p1(ptr addrspace(1) %
   ret <2 x i16> %result
 }
 
+declare <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr nocapture, <2 x half>) #0

Pierre-vh wrote:

nit: could we auto-generate this test? Maybe as a future patch or just 
precommit it directly.

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)

2024-08-01 Thread Pierre van Houtryve via llvm-branch-commits


@@ -75,6 +75,11 @@ Changes to the AArch64 Backend
 Changes to the AMDGPU Backend
 -
 
+* Removed ``llvm.amdgcn.flat.atomic.fadd`` and
+  ``llvm.amdgcn.global.atomic.fadd`` intrinsics. Users should use the
+  :ref:`atomicrmw <i_atomicrmw>` instruction with `fadd` and

Pierre-vh wrote:

Does `i_atomicrmw` work here? Did you try building the docs?
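
For illustration, a minimal sketch of the migration the release note describes (assuming a float global atomic; not taken verbatim from the patch):

  ; before (removed intrinsic):
  ;   %old = call float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %ptr, float %val)
  ; after (plain IR instruction):
  %new = atomicrmw fadd ptr addrspace(1) %ptr, float %val monotonic, align 4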

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)

2024-08-01 Thread Pierre van Houtryve via llvm-branch-commits


@@ -1017,29 +1015,6 @@ main_body:
   ret void
 }
 
-define amdgpu_kernel void @global_atomic_fadd_f64_noret(ptr addrspace(1) %ptr, double %data) {

Pierre-vh wrote:

Why are some tests deleted, and some others changed to use atomicrmw?

https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Remove global/flat atomic fadd intrinsics (PR #97051)

2024-08-01 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/97051
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-08-01 Thread Pierre van Houtryve via llvm-branch-commits


@@ -19273,9 +19269,14 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
-  AO = AtomicOrdering::SequentiallyConsistent;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");

Pierre-vh wrote:

What happens with system (the default)? I'm not sure I like using `agent` just 
to force the right expansion when there is no memory model motivation behind 
it. Do we have a precedent for this kind of thing?

Could codegen be fixed so you can just use `system`?
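
To make the two options concrete, an illustrative sketch (register names are placeholders, not from the patch):

  ; default: system scope; per the patch comment this currently gets expanded
  ; to a cmpxchg loop instead of the native instruction
  %r0 = atomicrmw fadd ptr addrspace(1) %p, float %v monotonic
  ; what the patch emits: agent scope, which consistently selects the native atomic
  %r1 = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") monotonic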

https://github.com/llvm/llvm-project/pull/96872
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (PR #102806)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.

Add [NFC] tag?

https://github.com/llvm/llvm-project/pull/102806
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits


@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = AMDGPUTargetMachine::EnableLateStructurizeCFG;

Pierre-vh wrote:

Does this function run yet, or is this just preparatory work/NFC?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits


@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = AMDGPUTargetMachine::EnableLateStructurizeCFG;
+  const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
+  const bool EnableStructurizerWorkarounds =
+  AMDGPUTargetMachine::EnableStructurizerWorkarounds;
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None)

Pierre-vh wrote:

tiny nit: put the opt level in a variable to avoid repeating the call?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)

2024-02-05 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/80695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [TableGen] Fix ReplaceRegAction RTTI Kind (PR #89790)

2024-04-24 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

We don't use RTTI of that class before #89736, so unless that's also being 
backported for some reason, it's not needed.


https://github.com/llvm/llvm-project/pull/89790
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Pierre van Houtryve via llvm-branch-commits


@@ -4371,8 +4375,10 @@ define amdgpu_kernel void @global_sextload_v64i16_to_v64i32(ptr addrspace(1) %ou
 ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:48
 ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[4:7], off, s[0:3], 0
; GCN-NOHSA-SI-NEXT:buffer_load_dword v0, off, s[12:15], 0 ; 4-byte Folded Reload
+; GCN-NOHSA-SI-NEXT:s_waitcnt vmcnt(0)

Pierre-vh wrote:

Why does this non-gfx12 test change?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Pierre van Houtryve via llvm-branch-commits


@@ -754,13 +754,21 @@ define amdgpu_kernel void @constant_load_v16i16_align2(ptr addrspace(4) %ptr0) #
 ; GFX12-NEXT:global_load_u16 v6, v8, s[0:1] offset:8
 ; GFX12-NEXT:global_load_u16 v5, v8, s[0:1] offset:4
 ; GFX12-NEXT:global_load_u16 v4, v8, s[0:1]
+; GFX12-NEXT:s_wait_loadcnt 0x7

Pierre-vh wrote:

I'm not sure i understand exactly what's happening here. Why do we need the 
extra `s_wait_loadcnt`? What happens when two `global_load_d16_hi_b16` execute 
back-to-back?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Pierre van Houtryve via llvm-branch-commits


@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",

Pierre-vh wrote:

Wouldn't it be easier to have a "VmemWriteVgprOutOfOrder" feature and just 
apply it to GFX12?

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Pierre van Houtryve via llvm-branch-commits


@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",

Pierre-vh wrote:

Right, I didn't see things that way. I agree conservative is better


https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] edaf6a0 - [AMDGPU][GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR

2022-10-19 Thread Pierre van Houtryve via llvm-branch-commits

Author: Pierre van Houtryve
Date: 2022-10-19T10:16:08Z
New Revision: edaf6a07a4aafd963ea958703890d03ab58ff2dd

URL: 
https://github.com/llvm/llvm-project/commit/edaf6a07a4aafd963ea958703890d03ab58ff2dd
DIFF: 
https://github.com/llvm/llvm-project/commit/edaf6a07a4aafd963ea958703890d03ab58ff2dd.diff

LOG: [AMDGPU][GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR

Depends on D134967

Differential Revision: https://reviews.llvm.org/D135145

Added: 

llvm/test/CodeGen/AMDGPU/GlobalISel/prelegalizer-combiner-insertvecelt-to-shufflevector.mir

Modified: 
llvm/lib/Target/AMDGPU/AMDGPUCombine.td
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td 
b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index 2415fdfecaae2..8b2ff164d3365 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -45,6 +45,12 @@ def cvt_f32_ubyteN : GICombineRule<
  [{ return PostLegalizerHelper.matchCvtF32UByteN(*${cvt_f32_ubyteN}, 
${matchinfo}); }]),
   (apply [{ PostLegalizerHelper.applyCvtF32UByteN(*${cvt_f32_ubyteN}, 
${matchinfo}); }])>;
 
+def insert_vec_elt_to_shuffle : GICombineRule<
+  (defs root:$insertelt, unsigned_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_INSERT_VECTOR_ELT):$insertelt,
+  [{ return 
PreLegalizerHelper.matchInsertVectorEltToShuffle(*${insertelt}, ${matchinfo}); 
}]),
+  (apply [{ PreLegalizerHelper.applyInsertVectorEltToShuffle(*${insertelt}, 
${matchinfo}); }])>;
+
 def clamp_i64_to_i16_matchdata : 
GIDefMatchData<"AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo">;
 
 def clamp_i64_to_i16 : GICombineRule<
@@ -109,7 +115,7 @@ def gfx6gfx7_combines : 
GICombineGroup<[fcmp_select_to_fmin_fmax_legacy]>;
 
 def AMDGPUPreLegalizerCombinerHelper: GICombinerHelper<
   "AMDGPUGenPreLegalizerCombinerHelper",
-  [all_combines, clamp_i64_to_i16, foldable_fneg]> {
+  [all_combines, clamp_i64_to_i16, foldable_fneg, insert_vec_elt_to_shuffle]> {
   let DisableRuleOption = "amdgpuprelegalizercombiner-disable-rule";
   let StateClass = "AMDGPUPreLegalizerCombinerHelperState";
   let AdditionalArguments = [];

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
index 6d6c69adaa658..08eefc6da4d31 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
@@ -55,6 +55,9 @@ class AMDGPUPreLegalizerCombinerHelper {
 
   void applyClampI64ToI16(MachineInstr &MI,
   const ClampI64ToI16MatchInfo &MatchInfo);
+
+  bool matchInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx);
+  void applyInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx);
 };
 
 bool AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(
@@ -154,6 +157,73 @@ void AMDGPUPreLegalizerCombinerHelper::applyClampI64ToI16(
   MI.eraseFromParent();
 }
 
+bool AMDGPUPreLegalizerCombinerHelper::matchInsertVectorEltToShuffle(
+MachineInstr &MI, unsigned &Idx) {
+  // Transforms a G_INSERT_VECTOR_ELT into an equivalent G_SHUFFLE_VECTOR if:
+  //- Scalar Pack insts are present (for <32 bits element types)
+  //- The vector has <= 4 elements.
+  // as this is a preferred canonical form of the operation.
+  //
+  // Note that both restrictions are arbitrary. Currently, it's mostly targeted
+  // towards 2x16 vectors. Restrictions could be relaxed or entirely removed in
+  // the future if codegen can handle it without causing regressions.
+
+  LLT VecTy = MRI.getType(MI.getOperand(0).getReg());
+  const unsigned EltSize = VecTy.getElementType().getSizeInBits();
+  if (EltSize < 32 &&
+  !MI.getMF()->getSubtarget<GCNSubtarget>().hasScalarPackInsts())
+return false;
+
+  if (VecTy.isScalable() || VecTy.getNumElements() > 4)
+return false;
+
+  Optional<ValueAndVReg> MaybeIdxVal =
+  getIConstantVRegValWithLookThrough(MI.getOperand(3).getReg(), MRI);
+  if (!MaybeIdxVal)
+return false;
+
+  Idx = MaybeIdxVal->Value.getZExtValue();
+  return true;
+}
+
+void AMDGPUPreLegalizerCombinerHelper::applyInsertVectorEltToShuffle(
+MachineInstr &MI, unsigned &Idx) {
+  B.setInstrAndDebugLoc(MI);
+
+  Register Ins = MI.getOperand(2).getReg();
+  Register Vec = MI.getOperand(1).getReg();
+  Register Dst = MI.getOperand(0).getReg();
+
+  LLT VecTy = MRI.getType(Dst);
+  LLT EltTy = VecTy.getElementType();
+  const unsigned NumElts = VecTy.getNumElements();
+
+  const auto Undef = MRI.createGenericVirtualRegister(EltTy);
+  B.buildUndef(Undef);
+
+  const auto OtherVec = MRI.createGenericVirtualRegister(VecTy);
+
+  SmallVector<Register> Srcs;
+  Srcs.push_back(Ins);
+  for (unsigned K = 1; K < NumElts; ++K)
+Srcs.push_back(Undef);
+
+  B.buildBuildVector(OtherVec, Srcs);
+
+  // NumElts == Ins in OtherVec
+  // 0...(NumElts-1) = Original elements
+  SmallVector<int> ShuffleMask;
+  for (unsig

[llvm-branch-commits] [llvm] 007ef6f - [AMDGPU][GISel] Constrain selected operands in selectG_BUILD_VECTOR

2022-10-19 Thread Pierre van Houtryve via llvm-branch-commits

Author: Pierre van Houtryve
Date: 2022-10-19T10:16:08Z
New Revision: 007ef6fa4d89f7e60a82af8c7cc004a6204fd72b

URL: 
https://github.com/llvm/llvm-project/commit/007ef6fa4d89f7e60a82af8c7cc004a6204fd72b
DIFF: 
https://github.com/llvm/llvm-project/commit/007ef6fa4d89f7e60a82af8c7cc004a6204fd72b.diff

LOG: [AMDGPU][GISel] Constrain selected operands in selectG_BUILD_VECTOR

Small bugfix. Currently harmless but a case in D134354 triggers it.

Differential Revision: https://reviews.llvm.org/D136235

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 7f41e8593692..0a6896693510 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -686,13 +686,19 @@ bool 
AMDGPUInstructionSelector::selectG_BUILD_VECTOR(MachineInstr &MI) const {
   // TODO: Can be improved?
   if (IsVector) {
 Register TmpReg = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
-BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_AND_B32_e32), TmpReg)
-.addImm(0xffff)
-.addReg(Src0);
-BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_LSHL_OR_B32_e64), Dst)
-.addReg(Src1)
-.addImm(16)
-.addReg(TmpReg);
+auto MIB = BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_AND_B32_e32), TmpReg)
+   .addImm(0xffff)
+   .addReg(Src0);
+if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI))
+  return false;
+
+MIB = BuildMI(*BB, MI, DL, TII.get(AMDGPU::V_LSHL_OR_B32_e64), Dst)
+  .addReg(Src1)
+  .addImm(16)
+  .addReg(TmpReg);
+if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI))
+  return false;
+
 MI.eraseFromParent();
 return true;
   }



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] e07c05b - [AMDGPU] Clear bodies of function with incompatible features

2022-11-30 Thread Pierre van Houtryve via llvm-branch-commits

Author: Pierre van Houtryve
Date: 2022-11-30T06:14:35-05:00
New Revision: e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2

URL: 
https://github.com/llvm/llvm-project/commit/e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2
DIFF: 
https://github.com/llvm/llvm-project/commit/e07c05bc91ae1dfb625b7b0d93a83e5c6039fcb2.diff

LOG: [AMDGPU] Clear bodies of function with incompatible features

Adds a new pass that replaces the body of a function with trap+unreachable
if it uses features that are not supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();`
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic, so I did not see a need to generalize it yet. It can
(and probably should) be generalized later and moved to a TargetInfo-like class
or helper file.
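
As a rough sketch of the pass's effect (illustrative only; the callee below is a hypothetical placeholder, see the added test for the real cases):

  ; before: the body uses something only selectable on newer GPUs
  declare void @gfx11.only.op()      ; hypothetical placeholder
  define void @needs_newer_gpu() {
    call void @gfx11.only.op()
    ret void
  }

  ; after the pass, when the compiled-for GPU lacks the required feature:
  declare void @llvm.trap()
  define void @needs_newer_gpu() {
    call void @llvm.trap()
    unreachable
  }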

Added: 
llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
llvm/test/CodeGen/AMDGPU/clear-incompatible-functions.ll

Modified: 
llvm/lib/Target/AMDGPU/AMDGPU.h
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/lib/Target/AMDGPU/CMakeLists.txt
llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 355aa0ba465b4..6a9ac1d165724 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -47,6 +47,7 @@ FunctionPass *createSIFormMemoryClausesPass();
 FunctionPass *createSIPostRABundlerPass();
 FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *);
 FunctionPass *createAMDGPUUseNativeCallsPass();
+FunctionPass *createAMDGPUClearIncompatibleFunctionsPass(const TargetMachine *);
 FunctionPass *createAMDGPUCodeGenPreparePass();
 FunctionPass *createAMDGPULateCodeGenPreparePass();
 FunctionPass *createAMDGPUMachineCFGStructurizerPass();
@@ -287,6 +288,9 @@ extern char &AMDGPUAnnotateUniformValuesPassID;
 void initializeAMDGPUCodeGenPreparePass(PassRegistry&);
 extern char &AMDGPUCodeGenPrepareID;
 
+void initializeAMDGPUClearIncompatibleFunctionsPass(PassRegistry &);
+extern char &AMDGPUClearIncompatibleFunctionsID;
+
 void initializeAMDGPULateCodeGenPreparePass(PassRegistry &);
 extern char &AMDGPULateCodeGenPrepareID;
 

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
new file mode 100644
index 0..e0ea3aac5b7f5
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
@@ -0,0 +1,120 @@
+//===-- AMDGPUClearIncompatibleFunctions.cpp 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+/// \file
+/// This pass replaces the bodies of functions that have attributes 
incompatible
+/// with the current target with trap/unreachable.
+//
+//===--===//
+
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/Target/TargetMachine.h"
+#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/Pass.h"
+
+#define DEBUG_TYPE "amdgpu-clear-incompatible-functions"
+
+using namespace llvm;
+
+namespace llvm {
+extern const SubtargetFeatureKV 
AMDGPUFeatureKV[AMDGPU::NumSubtargetFeatures-1];
+}
+
+namespace {
+
+using Generation = AMDGPUSubtarget::Generation;
+
+class AMDGPUClearIncompatibleFunctions : public FunctionPass {
+public:
+  static char ID;
+
+  AMDGPUClearIncompatibleFunctions(const TargetMachine *TM = nullptr) : 
FunctionPass(ID), TM(TM) {
+assert(TM && "No TargetMachine!");
+  }
+
+  StringRef getPassName() const override {
+return "AMDGPU Clear Incompatible Functions Bodies";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+// If changes are made, no analyses are preserved.
+  }
+
+  bool runOnFunction(Function &F) override;
+
+private:
+  const TargetMachine *TM = nullptr;
+};
+
+// List of features alongside the minimum GPU generation needed to support 
them.
constexpr std::array<std::pair<unsigned, Generation>, 6> FeatureAndMinGen = {{
+  { AMDGPU::FeatureGFX11Insts, Generation::GFX11 },
+  { AMDGPU::FeatureGFX10Insts, Genera

[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)

2024-10-02 Thread Pierre van Houtryve via llvm-branch-commits


@@ -43,7 +43,7 @@ define i64 @test_flat_atomicrmw_sub_0_i64_agent(ptr %ptr) {
 ; ALL:   [[ATOMICRMW_PRIVATE]]:
 ; ALL-NEXT:[[TMP1:%.*]] = addrspacecast ptr [[PTR]] to ptr addrspace(5)
 ; ALL-NEXT:[[LOADED_PRIVATE:%.*]] = load i64, ptr addrspace(5) [[TMP1]], 
align 8
-; ALL-NEXT:[[NEW:%.*]] = sub i64 [[LOADED_PRIVATE]], 0
+; ALL-NEXT:[[NEW:%.*]] = add i64 [[LOADED_PRIVATE]], 0

Pierre-vh wrote:

Why does this transform happen more often now?

https://github.com/llvm/llvm-project/pull/109410
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)

2024-10-02 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/109410
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131310).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312** (https://app.graphite.dev/github/pr/llvm/llvm-project/131312)
* **#131311** (https://app.graphite.dev/github/pr/llvm/llvm-project/131311)
* **#131310** (https://app.graphite.dev/github/pr/llvm/llvm-project/131310) 👈 (this PR)
* **#131309** (https://app.graphite.dev/github/pr/llvm/llvm-project/131309)
* **#131308** (https://app.graphite.dev/github/pr/llvm/llvm-project/131308)
* **#131307** (https://app.graphite.dev/github/pr/llvm/llvm-project/131307)
* **#131306** (https://app.graphite.dev/github/pr/llvm/llvm-project/131306)
* **#131305** (https://app.graphite.dev/github/pr/llvm/llvm-project/131305)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> We can fold the clamp of the shift amount into the shift instruction during 
> selection as we know the instruction ignores the high bits. We do that in the 
> DAG path already. I think it special cases the and & (bitwidth - 1) pattern, 
> which should form canonically. In principle it could do a general simplify 
> demand bits

Where and how should that be implemented? I struggled with that. I tried 
adding a new special case in TableGen but I just couldn't find the right way to 
do it.
Do I just add it in C++ InstructionSelector before it checks the patterns?
Or should it be some kind of post-processing step after the shift has been 
selected, but before the G_ZEXT is selected?
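
For reference, an IR-level sketch of the clamp pattern under discussion (the actual matching happens during instruction selection on the shift's amount operand):

  ; the shift amount is clamped with an 'and' against (bitwidth - 1)
  %amt = and i32 %raw_amt, 31
  %res = shl i32 %x, %amt
  ; since the hardware shift ignores the high bits of the amount, selection can
  ; drop the 'and' (the DAG path does this via its unneeded-shift-mask check)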


https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-21 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From 16cbcc2c44bfe74ba54f00c5be634c54ff43a5cf Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index d6675f225cdfc..cc014fbd32466 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S32, S64} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index b46fc7d9c752a..1c9d67826186f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-21 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> > Where and how should that be implemented ? I struggled with that. I tried 
> > adding a new special case in TableGen but I just couldn't find the right 
> > way to do it. Do I just add it in C++ InstructionSelector before it checks 
> > the patterns? Or should it be some kind of post-processing step after the 
> > shift has been selected, but before the G_ZEXT is selected?
> 
> It already exists as a complex pattern, isUnneededShiftMask. The combiners 
> should be trying to get the clamping code into this form which expects the and

I tried it but the DAG immediately transforms `(and x, 0xFF)` into a zext and 
it seems pretty stubborn about it as it's a basic transform.
I don't mind trying to make it work a bit longer, but I could also just bring 
this back. What do you think?

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-18 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

Ah, this doesn't do anything at this stage. It's only helpful once we disable 
widening of i16 ops to i32 in CGP. Then this pattern can appear and it'll fold 
it.

This combine is tested in AArch64. Should I copy over a few simple test cases 
in the AMDGPU folder just to show the combine works in RegBankCombiner?
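
For readers unfamiliar with the combine, an IR-level sketch of the pattern it targets (the combine itself operates on the generic MIR equivalents):

  ; sext of a trunc of the same value:
  %t = trunc i32 %x to i16
  %s = sext i16 %t to i32
  ; once i16 ops are no longer widened to i32 beforehand, this pair can appear
  ; in the RegBankCombiner and be folded by sext_trunc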

https://github.com/llvm/llvm-project/pull/131623
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131309

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.

>From ee917df6c6e996135d1b08f924b6645649eafa0d Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 6e611ebb4b625..23dd20b51e8e7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S32, S64} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 27b86723ce474..ed0d52f6b2441 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb

[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131308
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-18 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131623

>From 4feac2fc42257cac9a1ca0070ec199f93a901b0d Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:22:25 +0100
Subject: [PATCH] [AMDGPU] Add sext_trunc in RegBankCombiner

---
 llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td 
b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index a21505356274b..083ce48911689 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -181,5 +181,5 @@ def AMDGPURegBankCombiner : GICombiner<
zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
identity_combines, redundant_and, constant_fold_cast_op,
-   cast_of_cast_combines]> {
+   cast_of_cast_combines, sext_trunc]> {
 }

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-18 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

Test changes were in the previous diff in the stack; it should be fixed now.

https://github.com/llvm/llvm-project/pull/131623
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-18 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From 8aa7f8b8f1c73d8fec55a229ea8dff020fc4c906 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index d6675f225cdfc..cc014fbd32466 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2068,10 +2068,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S32, S64} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index b46fc7d9c752a..1c9d67826186f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh


[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From cdfba0ea7ab0fcb60d632a25433b18b421022c25 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32 and then trunc the result.
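
For illustration only (not part of the patch): a standalone C++ sketch of the
equivalence the widening relies on. Sign-extending the low bits of an i16
directly, or any-extending to i32, doing the i32 G_SEXT_INREG and truncating
back, give the same 16-bit value. The helper names are invented for the
example and are not LLVM code.

```
// Standalone sketch, not LLVM code.
#include <cassert>
#include <cstdint>

// Direct i16 lowering: sign-extend the low Width bits of a 16-bit value.
static int16_t sextInReg16(uint16_t Value, unsigned Width) {
  uint16_t Mask = uint16_t((1u << Width) - 1);
  uint16_t Sign = uint16_t(1u << (Width - 1));
  return int16_t(((Value & Mask) ^ Sign) - Sign);
}

// Widened lowering: G_ANYEXT to i32, i32 G_SEXT_INREG, G_TRUNC back to s16.
static int16_t sextInRegViaI32(uint16_t Value, unsigned Width) {
  uint32_t Wide = Value; // high bits are don't-care
  uint32_t Mask = (1u << Width) - 1;
  uint32_t Sign = 1u << (Width - 1);
  int32_t Res = int32_t((Wide & Mask) ^ Sign) - int32_t(Sign);
  return int16_t(Res);
}

int main() {
  for (unsigned V = 0; V <= 0xFFFF; ++V)
    for (unsigned W = 1; W <= 16; ++W)
      assert(sextInReg16(uint16_t(V), W) == sextInRegViaI32(uint16_t(V), W));
  return 0;
}
```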
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From 2dc7126ab1abb6aa49aaf263a0591759130ddca5 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
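
For illustration only (not part of the patch): a standalone C++ model of what
G_UBFX computes and of the shift-then-mask shape the BFX-formation combines
look for. The helpers are invented for the example and are not the real
combiner code.

```
// Standalone model, not the LLVM implementation.
#include <cassert>
#include <cstdint>

// G_UBFX dst, src, offset, width: unsigned extract of Width bits at Offset.
static uint16_t ubfx16(uint16_t Src, unsigned Offset, unsigned Width) {
  return uint16_t((Src >> Offset) & ((1u << Width) - 1));
}

// The combine recognizes an AND immediate of the form (1 << Width) - 1 on
// the result of a logical shift right, and recovers Width from it.
static bool matchLowBitMask(uint16_t MaskImm, unsigned &Width) {
  if (MaskImm == 0 || (uint16_t(MaskImm + 1) & MaskImm) != 0)
    return false; // not of the form (1 << Width) - 1
  Width = 0;
  while (MaskImm >> Width)
    ++Width;
  return true;
}

int main() {
  unsigned Width = 0;
  assert(matchLowBitMask(0x00FF, Width) && Width == 8); // (x >> off) & 0xff
  assert(!matchLowBitMask(0x00F0, Width));              // not a low-bit mask
  assert(ubfx16(0xABCD, 4, 8) == 0xBC);                 // the formed extract
  return 0;
}
```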
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index cfb5c3b3006f0..ab900157d2095 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S32, S64} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index b46fc7d9c752a..1c9d67826186f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From d4b257d1b34b51018f51546974bffdc2ea56433d Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131311

>From 17e13825f173be8fd67494f13f002f35d93e357f Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
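
For illustration only (not part of the patch): a standalone C++ sketch of the
invariant the fold relies on. A 32-bit shift reads only the low 5 bits of its
amount operand, so an AND whose immediate keeps those bits (such as 0xffff)
cannot change the result and can be looked through. Names are invented for
the example.

```
// Standalone sketch, not LLVM code.
#include <cassert>
#include <cstdint>

static uint32_t shl32(uint32_t Src, uint32_t Amount) {
  return Src << (Amount & 31); // the hardware reads 5 bits of the amount
}

int main() {
  const uint32_t Src = 0x12345678u;
  for (uint32_t Amt = 0; Amt < (1u << 20); Amt += 97) {
    uint32_t MaskedAmt = Amt & 0xFFFFu; // the AND this patch removes
    assert(shl32(Src, MaskedAmt) == shl32(Src, Amt));
  }
  return 0;
}
```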
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index cc15dd7cb495c..5f666e10b5cb7 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperands

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131312

>From 9fabf931105e1cf86cf69f90bd5c62068846c3e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
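
For illustration only (not part of the patch): a standalone C++ sketch of the
equivalence the combine relies on. After `sext_inreg x, 8`, truncating to s16
(16 >= 8) and sign-extending back to s32 reproduces the same value, so the
trunc/sext pair can be dropped. The helper name is invented; the sketch
assumes the usual two's-complement narrowing and arithmetic right shift.

```
// Standalone sketch, not LLVM code.
#include <cassert>
#include <cstdint>

static int32_t sextInReg32(int32_t X, unsigned N) {
  unsigned Shift = 32 - N;
  return int32_t(uint32_t(X) << Shift) >> Shift; // arithmetic shift right
}

int main() {
  for (int64_t V = INT32_MIN; V <= INT32_MAX; V += 99991) {
    int32_t InReg = sextInReg32(int32_t(V), 8); // %inreg = G_SEXT_INREG %x, 8
    int16_t Trunc = int16_t(InReg);             // %trunc = G_TRUNC %inreg
    int32_t Sext = Trunc;                       // %sext  = G_SEXT %trunc
    assert(Sext == InReg);                      // the pair is redundant
  }
  return 0;
}
```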
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: mismatching_types
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s16

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131311

>From 520757cf40d285b58eb0539840be2bf282c0a0af Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
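
For illustration only (not part of the patch): a standalone C++ sketch of the
redundancy test itself, mirroring what `isAndBitMaskRedundant` checks. The
AND immediate must preserve every bit the shift actually reads (low 5 bits
for 32-bit shifts, low 6 bits for 64-bit shifts), otherwise the AND stays.

```
// Standalone sketch, not LLVM code.
#include <cassert>
#include <cstdint>

static bool maskIsRedundant(uint64_t MaskImm, unsigned NumBitsRead) {
  uint64_t ReadBits = (1ull << NumBitsRead) - 1;
  return (MaskImm & ReadBits) == ReadBits;
}

int main() {
  assert(maskIsRedundant(0xFFFF, 5));  // covers the bits a 32-bit shift reads
  assert(maskIsRedundant(0xFFFF, 6));  // also covers a 64-bit shift
  assert(!maskIsRedundant(0xF, 5));    // clears bit 4: keep the AND
  return 0;
}
```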
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 91df516b80857..a279a0a973e75 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperands

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131312

>From 4751d38d86886106c00e9140bf0bb3a3459950cb Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
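
For illustration only (not part of the patch): a standalone C++ sketch of why
the rule's predicate requires the trunc type to be at least as wide as the
sext_inreg width, matching the `mismatching_types` test below. A narrower
trunc drops bits, so the rewrite would change the value. The helper name is
invented for the example.

```
// Standalone sketch, not LLVM code.
#include <cassert>
#include <cstdint>

static int32_t sextInReg32(int32_t X, unsigned N) {
  unsigned Shift = 32 - N;
  return int32_t(uint32_t(X) << Shift) >> Shift;
}

int main() {
  int32_t InReg = sextInReg32(0x1234, 16);  // %inreg = G_SEXT_INREG %copy, 16
  int32_t ViaS8 = int32_t(int8_t(InReg));   // G_TRUNC to s8, then G_SEXT
  assert(ViaS8 != InReg); // 8 < 16: the combine must not replace the pair
  return 0;
}
```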
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: mismatching_types
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s16

[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131306

>From 1af83464f02df212384bd97848b0073d41053234 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 10:46:01 +0100
Subject: [PATCH 1/2] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to
 i32

See #64591
---
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  28 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll  |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 519 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 286 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll   |  10 +-
 5 files changed, 403 insertions(+), 450 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c19ee14ab1574..27b86723ce474 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
 Register DstReg = MI.getOperand(0).getReg();
 LLT DstTy = MRI.getType(DstReg);
 
-if (DstTy.getSizeInBits() == 1) {
-  const RegisterBank *DstBank =
+const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
+
+if (DstTy.getSizeInBits() == 1) {
   if (DstBank == &AMDGPU::VCCRegBank)
 break;
 
@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.
+if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
+  const LLT S32 = LLT::scalar(32);
+  MachineBasicBlock *MBB = MI.getParent();
+  MachineFunction *MF = MBB->getParent();
+  ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
+  LegalizerHelper Helper(*MF, ApplySALU, B);
+  // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
+  // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
+  // as "not".
+  if (MI.getOpcode() == AMDGPU::G_XOR &&
+  mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) 
{
+Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
+Helper.widenScalarDst(MI, S32);
+  } else {
+if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
+  llvm_unreachable("widen scalar should have succeeded");
+  }
+  return;
+}
+
 if (DstTy.getSizeInBits() != 64)
   break;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 1a94429b1b5a1..36359579ea442 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg 
%src0, i16 inreg %src1) {
 define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 
inreg %src1) {
 ; GCN-LABEL: s_andn2_i16_multi_use:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s1, s3, -1
+; GCN-NEXT:s_not_b32 s1, s3
 ; GCN-NEXT:s_andn2_b32 s0, s2, s3
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: s_andn2_i16_multi_use:
 ; GFX10:   ; %bb.0:
 ; GFX10-NEXT:s_andn2_b32 s0, s2, s3
-; GFX10-NEXT:s_xor_b32 s1, s3, -1
+; GFX10-NEXT:s_not_b32 s1, s3
 ; GFX10-NEXT:; return to shader part epilog
 ;
 ; GFX11-LABEL: s_andn2_i16_multi_use:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3
-; GFX11-NEXT:s_xor_b32 s1, s3, -1
+; GFX11-NEXT:s_not_b32 s1, s3
 ; GFX11-NEXT:; return to shader part epilog
   %not.src1 = xor i16 %src1, -1
   %and = and i16 %src0, %not.src1
@@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, 
i16 %src1) {
 define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) {
 ; GCN-LABEL: v_andn2_i16_vs:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s0, s2, -1
+; GCN-NEXT:s_not_b32 s0, s2
 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0
 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10PLUS-LABEL: v_andn2_i16_vs:
 ; GFX10PLUS:   ; %bb.0:
-; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1
+; GFX10PLUS-NEXT:s_not_b32 s0, s2
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GFX10PLUS-NEXT:; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index e60739fd84059..3a52497bd6e91 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
@@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, 
i32 inreg %rhs.arg, i32 in
 ; GFX8-NEXT:s_lshr_b32 s2, s2, s3
 ; GFX8-NEXT

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From 6db5fe8cc5ff82cc7dc8751ac584870ddbf1b537 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From 090fa3eb8b5ebb595a6ec4b78ec337af71466a73 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index cfb5c3b3006f0..ab900157d2095 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S32, S64} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index a7df9a0edd21a..844251be24c42 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131312

>From 4751d38d86886106c00e9140bf0bb3a3459950cb Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
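
In generic MIR the pattern looks roughly like this (a minimal sketch with
illustrative register names; the combine requires the trunc size to be at
least the sext_inreg width):

```
; before
%inreg:_(s32) = G_SEXT_INREG %copy, 8
%trunc:_(s16) = G_TRUNC %inreg(s32)
%sext:_(s32) = G_SEXT %trunc(s16)

; after the combine, all uses of %sext just read %inreg
%inreg:_(s32) = G_SEXT_INREG %copy, 8
```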
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: mismatching_types
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s16

[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From be5c76eeb981e94017cc2a504f35079d47d7ce5c Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32, then trunc the result.
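
A minimal sketch of the difference, assuming an s16 G_SEXT_INREG from 8 bits
(names are illustrative, not taken from the patch):

```
; old lowering: shift pair on s16
%c:_(s16) = G_CONSTANT i16 8
%shl:_(s16) = G_SHL %x, %c(s16)
%res:_(s16) = G_ASHR %shl, %c(s16)

; new widening: do the sign-extension in 32 bits, then truncate
%ext:_(s32) = G_ANYEXT %x(s16)
%inreg:_(s32) = G_SEXT_INREG %ext, 8
%res:_(s16) = G_TRUNC %inreg(s32)
```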
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131311

>From 520757cf40d285b58eb0539840be2bf282c0a0af Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the shift amount operand is masked and the mask covers all of the bits the
shift reads, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
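
A sketch of the intended fold on a VALU shift (vreg names are illustrative;
the same idea applies to the SALU forms):

```
; %mask only clears bits the shift never reads (it uses the low 5 bits)
%mask:vgpr_32 = V_AND_B32_e32 65535, %amt, implicit $exec
%res:vgpr_32 = V_LSHLREV_B32_e64 %mask, %src, implicit $exec

; after the fold the shift reads %amt directly; the V_AND becomes dead
%res:vgpr_32 = V_LSHLREV_B32_e64 %amt, %src, implicit $exec
```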
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 91df516b80857..a279a0a973e75 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperands

[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

### Merge activity

* **Mar 17, 4:51 AM EDT**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131307).


https://github.com/llvm/llvm-project/pull/131307
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

### Merge activity

* **Mar 17, 4:51 AM EDT**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131306).


https://github.com/llvm/llvm-project/pull/131306
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From be5c76eeb981e94017cc2a504f35079d47d7ce5c Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32, then trunc the result.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits


@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.

Pierre-vh wrote:

It was copy-pasted from below and I forgot to remove it; it's irrelevant here.

https://github.com/llvm/llvm-project/pull/131306
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/131311
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From d65db023bfae0c9a5eaeb5bebac39d75723c27d6 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index cfb5c3b3006f0..ab900157d2095 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S64, S32} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index b46fc7d9c752a..1c9d67826186f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131311

>From f3fddad8dca1e8ed327d7cc7cfee7a465032dcc4 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the shift amount operand is masked and the mask covers all of the bits the
shift reads, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index cc15dd7cb495c..5f666e10b5cb7 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperands

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From 65d5012c30366cc713b793a30ab5119ddf8a77af Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131312

>From 782153a9a47d4a0fdb897e811033179fa67c5060 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: mismatching_types
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s16

[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-03-18 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131624

>From 3f3c67934d0c9ea34c11cbd24becc24541baf567 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:54:59 +0100
Subject: [PATCH 1/3] [GlobalISel] Combine redundant sext_inreg

---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +
 .../include/llvm/Target/GlobalISel/Combine.td |   9 +-
 .../GlobalISel/CombinerHelperCasts.cpp|  27 +++
 .../combine-redundant-sext-inreg.mir  | 164 ++
 .../combine-sext-trunc-sextinreg.mir  |  87 ++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |   5 -
 6 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 9b78342c8fc39..5778377d125a8 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -994,6 +994,9 @@ class CombinerHelper {
   // overflow sub
   bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  // (sext_inreg (sext_inreg x, K0), K1)
+  void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const;
+
 private:
   /// Checks for legality of an indexed variant of \p LdSt.
   bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 660b03080f92e..6a0ff683a4647 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes;
 def anyext_of_zext : ext_of_ext_opcodes;
 def anyext_of_sext : ext_of_ext_opcodes;
 
+def sext_inreg_of_sext_inreg : GICombineRule<
+   (defs root:$dst),
+   (match (G_SEXT_INREG $x, $src, $a):$other,
+  (G_SEXT_INREG $dst, $x, $b):$root),
+   (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>;
+
 // Push cast through build vector.
 class buildvector_of_opcode : GICombineRule <
   (defs root:$root, build_fn_matchinfo:$matchinfo),
@@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+  sext_inreg_of_sext_inreg,
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 576fd5fd81703..883a62c308232 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr 
&CastMI,
 return false;
   }
 }
+
+void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root,
+ MachineInstr &Other) const {
+  assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG &&
+ Other.getOpcode() == TargetOpcode::G_SEXT_INREG);
+
+  unsigned RootWidth = Root.getOperand(2).getImm();
+  unsigned OtherWidth = Other.getOperand(2).getImm();
+
+  Register Dst = Root.getOperand(0).getReg();
+  Register OtherDst = Other.getOperand(0).getReg();
+  Register Src = Other.getOperand(1).getReg();
+
+  if (RootWidth >= OtherWidth) {
+// The root sext_inreg is entirely redundant because the other one
+// is narrower.
+Observer.changingAllUsesOfReg(MRI, Dst);
+MRI.replaceRegWith(Dst, OtherDst);
+Observer.finishedChangingAllUsesOfReg();
+  } else {
+// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the
+// other G_SEXT_INREG.
+Builder.buildSExtInReg(Dst, Src, RootWidth);
+  }
+
+  Root.eraseFromParent();
+}
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
new file mode 100644
index 0..566ee8e6c338d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
@@ -0,0 +1,164 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: inreg8_inreg16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: inreg8_inreg16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%inreg1:_(s32) = G_SEXT_INREG %inreg, 16
+$vgpr0 = COPY %inreg1
+...
+
+
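
The new test above covers the case where the outer G_SEXT_INREG is wider and
simply goes away. For the other case handled by applyRedundantSextInReg (outer
narrower than the inner), the rewrite amounts to the following sketch
(illustrative names, not part of the patch):

```
; before: inner extends from 16 bits, outer from 8
%a:_(s32) = G_SEXT_INREG %copy, 16
%b:_(s32) = G_SEXT_INREG %a, 8

; after: the outer sext_inreg reads the original source directly
%b:_(s32) = G_SEXT_INREG %copy, 8
```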

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)

2025-04-04 Thread Pierre van Houtryve via llvm-branch-commits


@@ -489,22 +489,61 @@ RegBankLegalizeRules::RegBankLegalizeRules(const 
GCNSubtarget &_ST,
   .Uni(B32, {{SgprB32}, {Sgpr32AExtBoolInReg, SgprB32, SgprB32}});
 
   addRulesForGOpcs({G_ANYEXT})
+  .Any({{UniS16, S1}, {{None}, {None}}}) // should be combined away
   .Any({{UniS32, S1}, {{None}, {None}}}) // should be combined away
-  .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}});
+  .Any({{UniS64, S1}, {{None}, {None}}}) // should be combined away
+  .Any({{{DivS16, S1}}, {{Vgpr16}, {Vcc}, VccExtToSel}})
+  .Any({{{DivS32, S1}}, {{Vgpr32}, {Vcc}, VccExtToSel}})
+  .Any({{{DivS64, S1}}, {{Vgpr64}, {Vcc}, VccExtToSel}})
+  .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})

Pierre-vh wrote:

unrelated to the patch: these rule definitions should be better documented,
otherwise it's very hard to tell what's actually happening here. I had to go
find two different struct signatures before getting an idea of what these
lines do.

A small comment on top of `RegBankLegalizeRules` that explains how many braces
are needed and how the arguments are laid out would go a long way.

I also feel like we could eliminate one or even two sets of braces by just
making them plain arguments, which would further help readability. It could be
an overload that's preferred when manually writing the rules, while keeping
the current signature for cases where rules are pushed from a loop or similar.



https://github.com/llvm/llvm-project/pull/132383
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-04-05 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131624

>From 3f3c67934d0c9ea34c11cbd24becc24541baf567 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:54:59 +0100
Subject: [PATCH 1/2] [GlobalISel] Combine redundant sext_inreg

---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +
 .../include/llvm/Target/GlobalISel/Combine.td |   9 +-
 .../GlobalISel/CombinerHelperCasts.cpp|  27 +++
 .../combine-redundant-sext-inreg.mir  | 164 ++
 .../combine-sext-trunc-sextinreg.mir  |  87 ++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |   5 -
 6 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 9b78342c8fc39..5778377d125a8 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -994,6 +994,9 @@ class CombinerHelper {
   // overflow sub
   bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  // (sext_inreg (sext_inreg x, K0), K1)
+  void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const;
+
 private:
   /// Checks for legality of an indexed variant of \p LdSt.
   bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 660b03080f92e..6a0ff683a4647 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes;
 def anyext_of_zext : ext_of_ext_opcodes;
 def anyext_of_sext : ext_of_ext_opcodes;
 
+def sext_inreg_of_sext_inreg : GICombineRule<
+   (defs root:$dst),
+   (match (G_SEXT_INREG $x, $src, $a):$other,
+  (G_SEXT_INREG $dst, $x, $b):$root),
+   (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>;
+
 // Push cast through build vector.
 class buildvector_of_opcode : GICombineRule <
   (defs root:$root, build_fn_matchinfo:$matchinfo),
@@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+  sext_inreg_of_sext_inreg,
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 576fd5fd81703..883a62c308232 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr 
&CastMI,
 return false;
   }
 }
+
+void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root,
+ MachineInstr &Other) const {
+  assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG &&
+ Other.getOpcode() == TargetOpcode::G_SEXT_INREG);
+
+  unsigned RootWidth = Root.getOperand(2).getImm();
+  unsigned OtherWidth = Other.getOperand(2).getImm();
+
+  Register Dst = Root.getOperand(0).getReg();
+  Register OtherDst = Other.getOperand(0).getReg();
+  Register Src = Other.getOperand(1).getReg();
+
+  if (RootWidth >= OtherWidth) {
+// The root sext_inreg is entirely redundant because the other one
+// is narrower.
+Observer.changingAllUsesOfReg(MRI, Dst);
+MRI.replaceRegWith(Dst, OtherDst);
+Observer.finishedChangingAllUsesOfReg();
+  } else {
+// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the
+// other G_SEXT_INREG.
+Builder.buildSExtInReg(Dst, Src, RootWidth);
+  }
+
+  Root.eraseFromParent();
+}
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
new file mode 100644
index 0..566ee8e6c338d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
@@ -0,0 +1,164 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: inreg8_inreg16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: inreg8_inreg16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%inreg1:_(s32) = G_SEXT_INREG %inreg, 16
+$vgpr0 = COPY %inreg1
+...
+
+

[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-04-05 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131624

>From 3f3c67934d0c9ea34c11cbd24becc24541baf567 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:54:59 +0100
Subject: [PATCH 1/2] [GlobalISel] Combine redundant sext_inreg

---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +
 .../include/llvm/Target/GlobalISel/Combine.td |   9 +-
 .../GlobalISel/CombinerHelperCasts.cpp|  27 +++
 .../combine-redundant-sext-inreg.mir  | 164 ++
 .../combine-sext-trunc-sextinreg.mir  |  87 ++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |   5 -
 6 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 9b78342c8fc39..5778377d125a8 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -994,6 +994,9 @@ class CombinerHelper {
   // overflow sub
   bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  // (sext_inreg (sext_inreg x, K0), K1)
+  void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const;
+
 private:
   /// Checks for legality of an indexed variant of \p LdSt.
   bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 660b03080f92e..6a0ff683a4647 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes;
 def anyext_of_zext : ext_of_ext_opcodes;
 def anyext_of_sext : ext_of_ext_opcodes;
 
+def sext_inreg_of_sext_inreg : GICombineRule<
+   (defs root:$dst),
+   (match (G_SEXT_INREG $x, $src, $a):$other,
+  (G_SEXT_INREG $dst, $x, $b):$root),
+   (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>;
+
 // Push cast through build vector.
 class buildvector_of_opcode : GICombineRule <
   (defs root:$root, build_fn_matchinfo:$matchinfo),
@@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+  sext_inreg_of_sext_inreg,
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 576fd5fd81703..883a62c308232 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr 
&CastMI,
 return false;
   }
 }
+
+void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root,
+ MachineInstr &Other) const {
+  assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG &&
+ Other.getOpcode() == TargetOpcode::G_SEXT_INREG);
+
+  unsigned RootWidth = Root.getOperand(2).getImm();
+  unsigned OtherWidth = Other.getOperand(2).getImm();
+
+  Register Dst = Root.getOperand(0).getReg();
+  Register OtherDst = Other.getOperand(0).getReg();
+  Register Src = Other.getOperand(1).getReg();
+
+  if (RootWidth >= OtherWidth) {
+// The root sext_inreg is entirely redundant because the other one
+// is narrower.
+Observer.changingAllUsesOfReg(MRI, Dst);
+MRI.replaceRegWith(Dst, OtherDst);
+Observer.finishedChangingAllUsesOfReg();
+  } else {
+// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the
+// other G_SEXT_INREG.
+Builder.buildSExtInReg(Dst, Src, RootWidth);
+  }
+
+  Root.eraseFromParent();
+}
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
new file mode 100644
index 0..566ee8e6c338d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
@@ -0,0 +1,164 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: inreg8_inreg16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: inreg8_inreg16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%inreg1:_(s32) = G_SEXT_INREG %inreg, 16
+$vgpr0 = COPY %inreg1
+...
+
+

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-04-04 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-03-26 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131624

>From f4c801437460aef9b9c2e5f49d1e98ec90fadb16 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:54:59 +0100
Subject: [PATCH 1/4] [GlobalISel] Combine redundant sext_inreg

---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +
 .../include/llvm/Target/GlobalISel/Combine.td |   9 +-
 .../GlobalISel/CombinerHelperCasts.cpp|  27 +++
 .../combine-redundant-sext-inreg.mir  | 164 ++
 .../combine-sext-trunc-sextinreg.mir  |  87 ++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |   5 -
 6 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 9b78342c8fc39..5778377d125a8 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -994,6 +994,9 @@ class CombinerHelper {
   // overflow sub
   bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  // (sext_inreg (sext_inreg x, K0), K1)
+  void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const;
+
 private:
   /// Checks for legality of an indexed variant of \p LdSt.
   bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 660b03080f92e..6a0ff683a4647 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1849,6 +1849,12 @@ def anyext_of_anyext : ext_of_ext_opcodes;
 def anyext_of_zext : ext_of_ext_opcodes;
 def anyext_of_sext : ext_of_ext_opcodes;
 
+def sext_inreg_of_sext_inreg : GICombineRule<
+   (defs root:$dst),
+   (match (G_SEXT_INREG $x, $src, $a):$other,
+  (G_SEXT_INREG $dst, $x, $b):$root),
+   (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>;
+
 // Push cast through build vector.
 class buildvector_of_opcode : GICombineRule <
   (defs root:$root, build_fn_matchinfo:$matchinfo),
@@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+  sext_inreg_of_sext_inreg,
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 576fd5fd81703..883a62c308232 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -378,3 +378,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr 
&CastMI,
 return false;
   }
 }
+
+void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root,
+ MachineInstr &Other) const {
+  assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG &&
+ Other.getOpcode() == TargetOpcode::G_SEXT_INREG);
+
+  unsigned RootWidth = Root.getOperand(2).getImm();
+  unsigned OtherWidth = Other.getOperand(2).getImm();
+
+  Register Dst = Root.getOperand(0).getReg();
+  Register OtherDst = Other.getOperand(0).getReg();
+  Register Src = Other.getOperand(1).getReg();
+
+  if (RootWidth >= OtherWidth) {
+// The root sext_inreg is entirely redundant because the other one
+// is narrower.
+Observer.changingAllUsesOfReg(MRI, Dst);
+MRI.replaceRegWith(Dst, OtherDst);
+Observer.finishedChangingAllUsesOfReg();
+  } else {
+// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the
+// other G_SEXT_INREG.
+Builder.buildSExtInReg(Dst, Src, RootWidth);
+  }
+
+  Root.eraseFromParent();
+}
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
new file mode 100644
index 0..566ee8e6c338d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
@@ -0,0 +1,164 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: inreg8_inreg16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: inreg8_inreg16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%inreg1:_(s32) = G_SEXT_INREG %inreg, 16
+$vgpr0 = COPY %inreg1
+...
+
+

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131312

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
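
For illustration only, a minimal generic-MIR sketch of the shape the new rule
matches (register names here are made up; the committed .mir test below is the
authoritative coverage):
```
%inreg:_(s32) = G_SEXT_INREG %copy, 8   ; value already sign-extended from 8 bits
%trunc:_(s16) = G_TRUNC %inreg          ; 16 >= 8, so no sign bits are lost
%sext:_(s32) = G_SEXT %trunc            ; re-extending adds nothing
```
Because the trunc type is at least as wide as the sext_inreg width, %sext can
simply be replaced by %inreg.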

>From 3289b2373ce2ec850a9bebb597168243d36608a6 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body:

[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131308

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32, then trunc the result.
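
Roughly, for a sign extension from 8 bits on an s16 value, the difference looks
like this (an illustrative sketch based on the updated legalize-abs.mir checks;
names are made up):
```
; before: s16 G_SEXT_INREG is lowered into a shift pair
%t:_(s16) = G_TRUNC %copy(s32)
%c8:_(s16) = G_CONSTANT i16 8
%shl:_(s16) = G_SHL %t, %c8(s16)
%lowered:_(s16) = G_ASHR %shl, %c8(s16)

; after: widen to s32, sign-extend in-register, then truncate the result
%ext:_(s32) = G_SEXT_INREG %copy, 8
%widened:_(s16) = G_TRUNC %ext(s32)
```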

>From 815595b1ca20b613b5b4b08cafedda93e397cf92 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32, then trunc the result.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131310

None

>From b87a9db3b8ab29db3f1bb668a4d3bf312add817b Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift

[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309**
* **#131308**
* **#131307** 👈 (view in Graphite:
  https://app.graphite.dev/github/pr/llvm/llvm-project/131307)
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131307
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131311

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd, and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
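
As a rough sketch (mirroring the precommitted si-fold-bitmasks.mir tests), the
fold sees a pattern like:
```
%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
```
and rewrites the shift to read %shift directly, since V_ASHR_I32 only uses the
low 5 bits of the shift amount and the 0xffff mask already covers them; the
leftover V_AND becomes dead and can be cleaned up separately.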

>From f46e24f0f5f98e5deb7bd13d737ed8c674da75e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 91df516b80857..a279a0a973e75 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND

[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309**
* **#131308**
* **#131307**
* **#131306** 👈 (view in Graphite:
  https://app.graphite.dev/github/pr/llvm/llvm-project/131306)
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131306
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131306
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131307
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131309
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131312

>From b9bf3f2f53fcf7cbd133e57d4c7f64a8f06763b2 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:34:51 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
---
 .../include/llvm/Target/GlobalISel/Combine.td | 12 ++-
 .../combine-sext-trunc-sextinreg.mir  | 86 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll | 78 -
 3 files changed, 113 insertions(+), 63 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | 
FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_8
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s16_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s16) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: trunc_s8_inreg_16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 16
+%trunc:_(s8) = G_TRUNC %inreg
+%sext:_(s32) = G_SEXT %trunc
+$vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how 
useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: mismatching_types
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+; CHECK-NEXT: %sext:_(s16

[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131306

>From 1af83464f02df212384bd97848b0073d41053234 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 10:46:01 +0100
Subject: [PATCH] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32

See #64591
---
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  28 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll  |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 519 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 286 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll   |  10 +-
 5 files changed, 403 insertions(+), 450 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c19ee14ab1574..27b86723ce474 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
 Register DstReg = MI.getOperand(0).getReg();
 LLT DstTy = MRI.getType(DstReg);
 
-if (DstTy.getSizeInBits() == 1) {
-  const RegisterBank *DstBank =
+const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
+
+if (DstTy.getSizeInBits() == 1) {
   if (DstBank == &AMDGPU::VCCRegBank)
 break;
 
@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.
+if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
+  const LLT S32 = LLT::scalar(32);
+  MachineBasicBlock *MBB = MI.getParent();
+  MachineFunction *MF = MBB->getParent();
+  ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
+  LegalizerHelper Helper(*MF, ApplySALU, B);
+  // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
+  // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
+  // as "not".
+  if (MI.getOpcode() == AMDGPU::G_XOR &&
+  mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) 
{
+Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
+Helper.widenScalarDst(MI, S32);
+  } else {
+if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
+  llvm_unreachable("widen scalar should have succeeded");
+  }
+  return;
+}
+
 if (DstTy.getSizeInBits() != 64)
   break;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 1a94429b1b5a1..36359579ea442 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg 
%src0, i16 inreg %src1) {
 define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 
inreg %src1) {
 ; GCN-LABEL: s_andn2_i16_multi_use:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s1, s3, -1
+; GCN-NEXT:s_not_b32 s1, s3
 ; GCN-NEXT:s_andn2_b32 s0, s2, s3
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: s_andn2_i16_multi_use:
 ; GFX10:   ; %bb.0:
 ; GFX10-NEXT:s_andn2_b32 s0, s2, s3
-; GFX10-NEXT:s_xor_b32 s1, s3, -1
+; GFX10-NEXT:s_not_b32 s1, s3
 ; GFX10-NEXT:; return to shader part epilog
 ;
 ; GFX11-LABEL: s_andn2_i16_multi_use:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3
-; GFX11-NEXT:s_xor_b32 s1, s3, -1
+; GFX11-NEXT:s_not_b32 s1, s3
 ; GFX11-NEXT:; return to shader part epilog
   %not.src1 = xor i16 %src1, -1
   %and = and i16 %src0, %not.src1
@@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, 
i16 %src1) {
 define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) {
 ; GCN-LABEL: v_andn2_i16_vs:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s0, s2, -1
+; GCN-NEXT:s_not_b32 s0, s2
 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0
 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10PLUS-LABEL: v_andn2_i16_vs:
 ; GFX10PLUS:   ; %bb.0:
-; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1
+; GFX10PLUS-NEXT:s_not_b32 s0, s2
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GFX10PLUS-NEXT:; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index e60739fd84059..3a52497bd6e91 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
@@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, 
i32 inreg %rhs.arg, i32 in
 ; GFX8-NEXT:s_lshr_b32 s2, s2, s3
 ; GFX8-NEXT:

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131311

>From d6e5dc03ae8bb46972b7bcffd35e60babbfbc678 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:05:19 +0100
Subject: [PATCH 1/2] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks

Instructions like shifts only read some of the bits of the shift amount 
operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.

Effects are minimal right now but this will kick in more once we disable 
uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd, and without this we'd end up with
more `s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this but it's proving difficult to get the patterns 
right, and after a few hours of trying I just decided to go with this as it's 
simple enough and it "just works" for this purpose.
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  97 +++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 201 -
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 207 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |   8 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll |   1 -
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir |  26 +--
 8 files changed, 303 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 91df516b80857..a279a0a973e75 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr 
&MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", 
NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperands

[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From e6862b4528d1ed48bbca9e742dd9a96d8777545b Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL
pair. With this change we just extend to i32, then trunc the result.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From fcd5623ccd18100197817f7f4d5a500ca433f8dc Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131309

>From c30cc50e3650137bdb8acc9674c312f6c088983f Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 12 Mar 2025 09:43:15 +0100
Subject: [PATCH] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX
 pre-regbankselect

Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   9 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  33 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 645 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 380 ---
 .../AMDGPU/GlobalISel/legalize-sbfx.mir   |  26 +-
 .../AMDGPU/GlobalISel/legalize-ubfx.mir   |  27 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   |  27 +-
 7 files changed, 503 insertions(+), 644 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index cfb5c3b3006f0..ab900157d2095 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2069,10 +2069,13 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   .minScalar(0, S32)
   .lower();
 
+  // Only {S32, S32} or {S64, S32} should ever reach codegen.
+  // We allow S/UBFX for S16 so the combiner can form them before
+  // RegBankSelect, and RegBankSelect will then legalize them correctly.
   getActionDefinitionsBuilder({G_SBFX, G_UBFX})
-  .legalFor({{S32, S32}, {S64, S32}})
-  .clampScalar(1, S32, S32)
-  .clampScalar(0, S32, S64)
+  .legalFor({{S16, S16}, {S32, S32}, {S64, S32}})
+  .clampScalar(1, S16, S32)
+  .clampScalar(0, S16, S64)
   .widenScalarToNextPow2(0)
   .scalarize(0);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 27b86723ce474..ed0d52f6b2441 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -1485,7 +1485,9 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   Register DstReg = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(DstReg);
 
+  const LLT S64 = LLT::scalar(64);
   const LLT S32 = LLT::scalar(32);
+  const LLT S16 = LLT::scalar(16);
 
   unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
   Register SrcReg = MI.getOperand(FirstOpnd).getReg();
@@ -1495,6 +1497,18 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
   const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
   if (DstBank == &AMDGPU::VGPRRegBank) {
+if (Ty == S16) {
+  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
+  B.setInsertPt(B.getMBB(), MI);
+  LegalizerHelper Helper(B.getMF(), ApplyBank, B);
+
+  Helper.widenScalarDst(MI, S32);
+  Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+  Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_ZEXT);
+  Helper.widenScalarSrc(MI, S32, 3, AMDGPU::G_ZEXT);
+  return true;
+}
+
 if (Ty == S32)
   return true;
 
@@ -1554,6 +1568,11 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
 
+  if (Ty == S16) {
+OffsetReg = B.buildAnyExtOrTrunc(S32, OffsetReg).getReg(0);
+WidthReg = B.buildAnyExtOrTrunc(S32, WidthReg).getReg(0);
+  }
+
   // Ensure the high bits are clear to insert the offset.
   auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
   auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
@@ -1568,13 +1587,21 @@ bool 
AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
 
   // TODO: It might be worth using a pseudo here to avoid scc clobber and
   // register class constraints.
-  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
- (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
+  unsigned Opc = (Ty != S64) ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32)
+ : (Signed ? AMDGPU::S_BFE_I64 : 
AMDGPU::S_BFE_U64);
 
-  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
+  Register BFEDst = DstReg;
+  if (Ty == S16) {
+BFEDst = MRI.createGenericVirtualRegister(S32);
+MRI.setRegBank(BFEDst, AMDGPU::SGPRRegBank);
+  }
+  auto MIB = B.buildInstr(Opc, {BFEDst}, {SrcReg, MergedInputs});
   if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
 llvm_unreachable("failed to constrain BFE");
 
+  if (BFEDst != DstReg)
+B.buildZExtOrTrunc(DstReg, BFEDst);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 07fcb02d98649..d2b600b04f9fc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fsh

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> > GlobalISel unfortunately needs it. We can end up with things like a 
> > `G_LSHR` with the shift amount being zext'd, and they're both lowered 
> > independently so we have a `s_and_b32` of the shift amount.
> 
> It should always be post legalize / post regbankselect combinable. Things are 
> strictly more difficult after selection

The main issue I was having was with code that had <32 bit arguments in 
registers.
We'd have
```
%0(s32) = COPY $sgpr0
%1(s16) = G_TRUNC %0
%2(s32) = G_ZEXT %1
```
Then %2 is used as the shift amount. We can't eliminate the zext/trunc
because the generic shift opcode says nothing about only reading the lower
bits of the shift amount, AFAIK. I experimented with multiple approaches but
didn't find anything better than doing the fold in SIFoldOperands.
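
For illustration, here is a rough MIR sketch (hypothetical, not taken from the
patch) of the kind of selected code this targets and of the fold SIFoldOperands
would be expected to perform, given that the shift instruction only reads the
low bits of its shift-amount operand:

```
; Before: the zext of the shift amount survives selection as a mask.
%src:vgpr_32 = COPY $vgpr0
%shift:vgpr_32 = COPY $vgpr1
%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec

; After the fold: the mask is bypassed, %shift is used directly, since the
; hardware only reads the low 5 bits of the shift amount anyway.
%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shift, implicit $exec
```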

https://github.com/llvm/llvm-project/pull/131310
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits


@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+ (G_TRUNC $trunc, $sir),
+ (G_SEXT $dst, $trunc),
+ [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= 
${width}.getImm()); }]),

Pierre-vh wrote:

Apply isn't allowed to fail. It's just that the presence of `GIReplaceReg` 
triggers emission of a `canReplaceReg` call during the matching portion of the 
match table rule.

> On a related note, couldn't you split this whole combine into two 
> independently useful parts:

Good idea, I can try that.
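
For reference, a rough MIR sketch (illustrative only, not the final patch) of
how two independently useful pieces could compose here, assuming the split ends
up as a "redundant sext_inreg" combine plus a sext-of-trunc style fold:

```
%x:_(s32) = COPY $vgpr0

; Part 1: an outer sext_inreg is redundant when its input is already
; sign-extended from a narrower (or equal) width.
%a:_(s32) = G_SEXT_INREG %x, 8
%b:_(s32) = G_SEXT_INREG %a, 16    ; can be replaced by %a

; Part 2: sext(trunc(x)) of an already sign-extended value is a no-op as
; long as the trunc keeps at least the sign-extended width.
%t:_(s16) = G_TRUNC %a
%d:_(s32) = G_SEXT %t              ; can be replaced by %a
```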


https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From fcd5623ccd18100197817f7f4d5a500ca433f8dc Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir 
b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands 
-verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit 
$exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131309).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309** 👈 (View in Graphite)
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131309
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131311
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131308).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310**
* **#131309**
* **#131308** 👈 (View in Graphite)
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131308
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131312).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312** 👈 (View in Graphite)
* **#131311**
* **#131310**
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131312
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131306

See #64591

>From a9f0563665a6d2b69fdee0d826cb52d6651c3dc4 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 10:46:01 +0100
Subject: [PATCH] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32

See #64591
---
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  28 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll  |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll   | 519 --
 llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll   | 286 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll   |  10 +-
 5 files changed, 403 insertions(+), 450 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c19ee14ab1574..27b86723ce474 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
 Register DstReg = MI.getOperand(0).getReg();
 LLT DstTy = MRI.getType(DstReg);
 
-if (DstTy.getSizeInBits() == 1) {
-  const RegisterBank *DstBank =
+const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
+
+if (DstTy.getSizeInBits() == 1) {
   if (DstBank == &AMDGPU::VCCRegBank)
 break;
 
@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.
+if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
+  const LLT S32 = LLT::scalar(32);
+  MachineBasicBlock *MBB = MI.getParent();
+  MachineFunction *MF = MBB->getParent();
+  ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
+  LegalizerHelper Helper(*MF, ApplySALU, B);
+  // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
+  // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
+  // as "not".
+  if (MI.getOpcode() == AMDGPU::G_XOR &&
+  mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) 
{
+Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
+Helper.widenScalarDst(MI, S32);
+  } else {
+if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
+  llvm_unreachable("widen scalar should have succeeded");
+  }
+  return;
+}
+
 if (DstTy.getSizeInBits() != 64)
   break;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 1a94429b1b5a1..36359579ea442 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg 
%src0, i16 inreg %src1) {
 define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 
inreg %src1) {
 ; GCN-LABEL: s_andn2_i16_multi_use:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s1, s3, -1
+; GCN-NEXT:s_not_b32 s1, s3
 ; GCN-NEXT:s_andn2_b32 s0, s2, s3
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: s_andn2_i16_multi_use:
 ; GFX10:   ; %bb.0:
 ; GFX10-NEXT:s_andn2_b32 s0, s2, s3
-; GFX10-NEXT:s_xor_b32 s1, s3, -1
+; GFX10-NEXT:s_not_b32 s1, s3
 ; GFX10-NEXT:; return to shader part epilog
 ;
 ; GFX11-LABEL: s_andn2_i16_multi_use:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3
-; GFX11-NEXT:s_xor_b32 s1, s3, -1
+; GFX11-NEXT:s_not_b32 s1, s3
 ; GFX11-NEXT:; return to shader part epilog
   %not.src1 = xor i16 %src1, -1
   %and = and i16 %src0, %not.src1
@@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, 
i16 %src1) {
 define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) {
 ; GCN-LABEL: v_andn2_i16_vs:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s0, s2, -1
+; GCN-NEXT:s_not_b32 s0, s2
 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0
 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10PLUS-LABEL: v_andn2_i16_vs:
 ; GFX10PLUS:   ; %bb.0:
-; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1
+; GFX10PLUS-NEXT:s_not_b32 s0, s2
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GFX10PLUS-NEXT:; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index e60739fd84059..3a52497bd6e91 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
@@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, 
i32 inreg %rhs.arg, i32 in
 ; GFX8-NEXT:s_lshr_b32 s2, s2, s3
 ; GF

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-14 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131311).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311** 👈 (View in Graphite)
* **#131310**
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131311
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From e6862b4528d1ed48bbca9e742dd9a96d8777545b Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL.
With this change we just extend to i32 and then trunc the result.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const 
GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD 
dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131624

None

>From e36f66595a582b6ba926186674b6da6b41236ff5 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:54:59 +0100
Subject: [PATCH] [GlobalISel] Combine redundant sext_inreg

---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +
 .../include/llvm/Target/GlobalISel/Combine.td |   9 +-
 .../GlobalISel/CombinerHelperCasts.cpp|  27 +++
 .../combine-redundant-sext-inreg.mir  | 164 ++
 .../combine-sext-trunc-sextinreg.mir  |  87 ++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |   5 -
 6 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 9b78342c8fc39..5778377d125a8 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -994,6 +994,9 @@ class CombinerHelper {
   // overflow sub
   bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  // (sext_inreg (sext_inreg x, K0), K1)
+  void applyRedundantSextInReg(MachineInstr &Root, MachineInstr &Other) const;
+
 private:
   /// Checks for legality of an indexed variant of \p LdSt.
   bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 660b03080f92e..6a0ff683a4647 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
 def anyext_of_anyext : ext_of_ext_opcodes<G_ANYEXT, G_ANYEXT>;
 def anyext_of_zext : ext_of_ext_opcodes<G_ANYEXT, G_ZEXT>;
 def anyext_of_sext : ext_of_ext_opcodes<G_ANYEXT, G_SEXT>;
 
+def sext_inreg_of_sext_inreg : GICombineRule<
+   (defs root:$dst),
+   (match (G_SEXT_INREG $x, $src, $a):$other,
+  (G_SEXT_INREG $dst, $x, $b):$root),
+   (apply [{ Helper.applyRedundantSextInReg(*${root}, *${other}); }])>;
+
 // Push cast through build vector.
 class buildvector_of_opcode : GICombineRule <
   (defs root:$root, build_fn_matchinfo:$matchinfo),
@@ -1896,7 +1902,8 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+  sext_inreg_of_sext_inreg,
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 182484754d091..ffc2384fc14fd 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -372,3 +372,30 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr 
&CastMI,
 return false;
   }
 }
+
+void CombinerHelper::applyRedundantSextInReg(MachineInstr &Root,
+ MachineInstr &Other) const {
+  assert(Root.getOpcode() == TargetOpcode::G_SEXT_INREG &&
+ Other.getOpcode() == TargetOpcode::G_SEXT_INREG);
+
+  unsigned RootWidth = Root.getOperand(2).getImm();
+  unsigned OtherWidth = Other.getOperand(2).getImm();
+
+  Register Dst = Root.getOperand(0).getReg();
+  Register OtherDst = Other.getOperand(0).getReg();
+  Register Src = Other.getOperand(1).getReg();
+
+  if (RootWidth >= OtherWidth) {
+// The root sext_inreg is entirely redundant because the other one
+// is narrower.
+Observer.changingAllUsesOfReg(MRI, Dst);
+MRI.replaceRegWith(Dst, OtherDst);
+Observer.finishedChangingAllUsesOfReg();
+  } else {
+// RootWidth < OtherWidth, rewrite this G_SEXT_INREG with the source of the
+// other G_SEXT_INREG.
+Builder.buildSExtInReg(Dst, Src, RootWidth);
+  }
+
+  Root.eraseFromParent();
+}
diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
new file mode 100644
index 0..566ee8e6c338d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-redundant-sext-inreg.mir
@@ -0,0 +1,164 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 
-run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: inreg8_inreg16
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0
+; CHECK-LABEL: name: inreg8_inreg16
+; CHECK: liveins: $vgpr0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+%copy:_(s32) = COPY $vgpr0
+%inreg:_(s32) = G_SEXT_INREG %copy, 8
+%inreg1:_(s32) = G_SEXT_INREG %inreg, 16
+$vgpr0 = COPY %inreg1
+...
+

[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/131623

None

>From 3f2cbbd6addf4844c7c861a6de55be59a8c96c35 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Mar 2025 13:22:25 +0100
Subject: [PATCH] [AMDGPU] Add sext_trunc in RegBankCombiner

---
 llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td 
b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index a21505356274b..083ce48911689 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -181,5 +181,5 @@ def AMDGPURegBankCombiner : GICombiner<
zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
identity_combines, redundant_and, constant_fold_cast_op,
-   cast_of_cast_combines]> {
+   cast_of_cast_combines, sext_trunc]> {
 }

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131623
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131624
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add sext_trunc in RegBankCombiner (PR #131623)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131623).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131624**
* **#131623** 👈 (View in Graphite)
* **#131622**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131623
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Combine redundant sext_inreg (PR #131624)

2025-03-17 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131624).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131624** 👈 (View in Graphite)
* **#131623**
* **#131622**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/131624
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-11 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/135340).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135340** 👈 (View in Graphite)
* **#135339**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/135340
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-11 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/135340
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

