[llvm-branch-commits] [llvm] f5bd5bf - Revert "Revert "LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and s…"

2025-11-30 Thread via llvm-branch-commits

Author: Matt Arsenault
Date: 2025-11-30T21:23:53-05:00
New Revision: f5bd5bf4484a7e424a7046957571351d8c60294b

URL: 
https://github.com/llvm/llvm-project/commit/f5bd5bf4484a7e424a7046957571351d8c60294b
DIFF: 
https://github.com/llvm/llvm-project/commit/f5bd5bf4484a7e424a7046957571351d8c60294b.diff

LOG: Revert "Revert "LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and s…"

This reverts commit 75aa01b89553bf4213a3b0e83829b6d0689941b9.

Added: 


Modified: 
llvm/docs/LangRef.rst
llvm/include/llvm/CodeGen/ISDOpcodes.h

Removed: 




diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a57351f9598e2..02865f8a29c67 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -17298,9 +17298,8 @@ LLVM Implementation:
 
 
 LLVM implements all ISO C flavors as listed in this table, except in the
-default floating-point environment exceptions are ignored and return value
-is non-deterministic if one or both inputs are sNaN. The constrained
-versions of the intrinsics respect the exception behavior and sNaN.
+default floating-point environment exceptions are ignored. The constrained
+versions of the intrinsics respect the exception behavior.
 
 .. list-table::
:header-rows: 1
@@ -17332,7 +17331,7 @@ versions of the intrinsics respect the exception behavior and sNaN.
  - qNaN, invalid exception
 
* - ``+0.0 vs -0.0``
- - either one
+ - +0.0(max)/-0.0(min)
  - +0.0(max)/-0.0(min)
  - +0.0(max)/-0.0(min)
 
@@ -17376,22 +17375,30 @@ type.
 
 Semantics:
 ""
+Follows the semantics of minNum in IEEE-754-2008, except that -0.0 < +0.0 for the purposes
+of this intrinsic. As for signaling NaNs, per the minNum semantics, if either operand is sNaN,
+the result is qNaN. This matches the recommended behavior for the libm
+function ``fmin``, although not all implementations have implemented these recommended behaviors.
+
+If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are
+NaN or if either operand is sNaN. Note that arithmetic on an sNaN doesn't consistently produce a qNaN,
+so arithmetic feeding into a minnum can produce inconsistent results. For example,
+``minnum(fadd(sNaN, -0.0), 1.0)`` can produce qNaN or 1.0 depending on whether ``fadd`` is folded.
 
-Follows the IEEE-754-2008 semantics for minNum, except for handling of
-signaling NaNs. This matches the behavior of libm's fmin.
+IEEE-754-2008 defines minNum, and it was removed in IEEE-754-2019. As the replacement, IEEE-754-2019
+defines :ref:`minimumNumber `.
 
-If either operand is a NaN, returns the other non-NaN operand. Returns
-NaN only if both operands are NaN. If the operands compare equal,
-returns either one of the operands. For example, this means that
-fmin(+0.0, -0.0) non-deterministically returns either operand (-0.0
-or 0.0).
+If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C
+and IEEE-754-2008: the result of ``minnum(-0.0, +0.0)`` may be either -0.0 or +0.0.
 
-Unlike the IEEE-754-2008 behavior, this does not distinguish between
-signaling and quiet NaN inputs. If a target's implementation follows
-the standard and returns a quiet NaN if either input is a signaling
-NaN, the intrinsic lowering is responsible for quieting the inputs to
-correctly return the non-NaN input (e.g. by using the equivalent of
-``llvm.canonicalize``).
+Some architectures, such as ARMv8 (FMINNM), LoongArch (fmin), MIPSr6 (min.fmt), PowerPC/VSX (xsmindp),
+have instructions that match these semantics exactly; thus it is quite simple for these architectures.
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MINPS``,
+which implements the semantics of C code ``a`.
 
-Unlike the IEEE-754-2008 behavior, this does not distinguish between
-signaling and quiet NaN inputs. If a target's implementation follows
-the standard and returns a quiet NaN if either input is a signaling
-NaN, the intrinsic lowering is responsible for quieting the inputs to
-correctly return the non-NaN input (e.g. by using the equivalent of
-``llvm.canonicalize``).
+If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C
+and IEEE-754-2008: the result of maxnum(-0.0, +0.0) may be either -0.0 or +0.0.
+
+Some architectures, such as ARMv8 (FMAXNM), LoongArch (fmax), MIPSr6 (max.fmt), PowerPC/VSX (xsmaxdp),
+have instructions that match these semantics exactly; thus it is quite simple for these architectures.
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MAXPS``,
+which implements the semantics of C code ``a>b?a:b``: NUM vs qNaN always return qNaN. ``MAXPS`` can be used
+if ``nsz`` and ``nnan`` are given.
+
+For existing libc implementations, the behaviors of fmin may be quite different
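
The minnum rules described in the LangRef hunk above can be sketched as a small reference model. This is only an illustration, not LLVM's implementation: the names `isSignalingNaN`, `minnumModel`, and `checkModel` are hypothetical, and the sNaN test assumes IEEE-754 binary64 with the common "quiet bit set means qNaN" encoding (as on x86-64 and AArch64).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: detect a signaling NaN by inspecting the quiet bit.
// Assumes binary64 where bit 51 set means "quiet" (not true on every target).
static bool isSignalingNaN(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  const std::uint64_t QuietBit = 1ULL << 51;
  return std::isnan(X) && (Bits & QuietBit) == 0;
}

// Hypothetical model of the documented semantics (without the nsz flag):
//  - any sNaN operand        -> qNaN
//  - exactly one qNaN operand -> the other operand
//  - -0.0 is treated as less than +0.0
static double minnumModel(double A, double B) {
  if (isSignalingNaN(A) || isSignalingNaN(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(A))
    return B; // B may itself be qNaN, giving NaN when both are NaN
  if (std::isnan(B))
    return A;
  if (A == B) // covers +0.0 vs -0.0: prefer the operand with the sign bit set
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}

// Small self-check of the ordered-zero rule.
static void checkModel() {
  assert(std::signbit(minnumModel(+0.0, -0.0)));
  assert(std::signbit(minnumModel(-0.0, +0.0)));
}
```

With the `nsz` flag the zero case would be allowed to return either operand, so a model of that variant could drop the `A == B` branch.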

[llvm-branch-commits] [clang] [flang] [llvm] [openmp] [Flang] Move builtin .mod generation into runtimes (Reapply #137828) (PR #169638)

2025-11-30 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur ready_for_review 
https://github.com/llvm/llvm-project/pull/169638
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [flang] [llvm] [openmp] [Flang] Move builtin .mod generation into runtimes (Reapply #137828) (PR #169638)

2025-11-30 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/169638


[llvm-branch-commits] [mlir] [mlir][arith] Add support for `negf` to `ArithToAPFloat` (PR #169759)

2025-11-30 Thread Jakub Kuderski via llvm-branch-commits


@@ -482,6 +525,7 @@ void ArithToAPFloatConversionPass::runOnOperation() {
   patterns.add>(context, getOperation(),
/*isUnsigned=*/true);
   patterns.add(context, getOperation());
+  patterns.add(context, getOperation());

kuhar wrote:

Add these together in one call?

https://github.com/llvm/llvm-project/pull/169759


[llvm-branch-commits] [mlir] [mlir][arith] Add support for `negf` to `ArithToAPFloat` (PR #169759)

2025-11-30 Thread Jakub Kuderski via llvm-branch-commits

https://github.com/kuhar approved this pull request.


https://github.com/llvm/llvm-project/pull/169759


[llvm-branch-commits] [mlir] [mlir][arith] Add support for min/max to `ArithToAPFloat` (PR #169760)

2025-11-30 Thread Jakub Kuderski via llvm-branch-commits

https://github.com/kuhar approved this pull request.


https://github.com/llvm/llvm-project/pull/169760


[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)

2025-11-30 Thread Vikram Hegde via llvm-branch-commits

https://github.com/vikramRH updated 
https://github.com/llvm/llvm-project/pull/168832

>From 2136cfff87b421ecf44aa5cd9ff45374b385b6eb Mon Sep 17 00:00:00 2001
From: vikhegde 
Date: Tue, 18 Nov 2025 11:13:37 +0530
Subject: [PATCH] [NPM] Schedule PhysicalRegisterUsageAnalysis before
 RegUsageInfoCollectorPass

---
 llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 +++-
 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll  | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
 
   derived().addPreEmitPass(addPass);
 
-  if (TM.Options.EnableIPRA)
+  if (TM.Options.EnableIPRA) {
 // Collect register usage information and produce a register mask of
 // clobbered registers, to be used to optimize call sites.
+addPass(RequireAnalysisPass());
 addPass(RegUsageInfoCollectorPass());
+  }
 
   addPass(FuncletLayoutPass());
 
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
 ; RUN:   | FileCheck -check-prefix=GCN-O3 %s
 
 
-; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
-; GCN-O2: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-l

[llvm-branch-commits] [llvm] [AMDGPU] Make SIShrinkInstructions pass return valid changed state (PR #168833)

2025-11-30 Thread Vikram Hegde via llvm-branch-commits

https://github.com/vikramRH updated 
https://github.com/llvm/llvm-project/pull/168833

>From 7b56e305b5631a4c8fdc21524fd33fc2a2b46b8b Mon Sep 17 00:00:00 2001
From: vikhegde 
Date: Thu, 20 Nov 2025 11:18:25 +0530
Subject: [PATCH 1/2] [AMDGPU] Make SIShrinkInstructions pass return valid
 changed state

---
 .../Target/AMDGPU/SIShrinkInstructions.cpp| 95 ---
 1 file changed, 60 insertions(+), 35 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index 1b78f67e76d07..0c25092e38ccd 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -41,10 +41,10 @@ class SIShrinkInstructions {
   bool isKUImmOperand(const MachineOperand &Src) const;
   bool isKImmOrKUImmOperand(const MachineOperand &Src, bool &IsUnsigned) const;
   void copyExtraImplicitOps(MachineInstr &NewMI, MachineInstr &MI) const;
-  void shrinkScalarCompare(MachineInstr &MI) const;
-  void shrinkMIMG(MachineInstr &MI) const;
-  void shrinkMadFma(MachineInstr &MI) const;
-  bool shrinkScalarLogicOp(MachineInstr &MI) const;
+  bool shrinkScalarCompare(MachineInstr &MI) const;
+  bool shrinkMIMG(MachineInstr &MI) const;
+  bool shrinkMadFma(MachineInstr &MI) const;
+  bool shrinkScalarLogicOp(MachineInstr &MI, bool &MoveIterator) const;
   bool tryReplaceDeadSDST(MachineInstr &MI) const;
   bool instAccessReg(iterator_range &&R,
  Register Reg, unsigned SubReg) const;
@@ -241,27 +241,30 @@ void SIShrinkInstructions::copyExtraImplicitOps(MachineInstr &NewMI,
   }
 }
 
-void SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
   if (!ST->hasSCmpK())
-return;
+return false;
 
   // cmpk instructions do scc = dst <cc> imm16, so commute the instruction to
   // get constants on the RHS.
-  if (!MI.getOperand(0).isReg())
-TII->commuteInstruction(MI, false, 0, 1);
+  bool Changed = false;
+  if (!MI.getOperand(0).isReg()) {
+if (TII->commuteInstruction(MI, false, 0, 1))
+  Changed = true;
+  }
 
   // cmpk requires src0 to be a register
   const MachineOperand &Src0 = MI.getOperand(0);
   if (!Src0.isReg())
-return;
+return Changed;
 
   MachineOperand &Src1 = MI.getOperand(1);
   if (!Src1.isImm())
-return;
+return Changed;
 
   int SOPKOpc = AMDGPU::getSOPKOp(MI.getOpcode());
   if (SOPKOpc == -1)
-return;
+return Changed;
 
   // eq/ne is special because the imm16 can be treated as signed or unsigned,
   // and initially selected to the unsigned versions.
@@ -275,9 +278,10 @@ void SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
   }
 
   MI.setDesc(TII->get(SOPKOpc));
+  Changed = true;
 }
 
-return;
+return Changed;
   }
 
   const MCInstrDesc &NewDesc = TII->get(SOPKOpc);
@@ -287,14 +291,16 @@ void SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
 if (!SIInstrInfo::sopkIsZext(SOPKOpc))
   Src1.setImm(SignExtend64(Src1.getImm(), 32));
 MI.setDesc(NewDesc);
+Changed = true;
   }
+  return Changed;
 }
 
 // Shrink NSA encoded instructions with contiguous VGPRs to non-NSA encoding.
-void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode());
   if (!Info)
-return;
+return false;
 
   uint8_t NewEncoding;
   switch (Info->MIMGEncoding) {
@@ -305,7 +311,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
 NewEncoding = AMDGPU::MIMGEncGfx11Default;
 break;
   default:
-return;
+return false;
   }
 
   int VAddr0Idx =
@@ -359,7 +365,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
 } else if (Vgpr == NextVgpr) {
   NextVgpr = Vgpr + Dwords;
 } else {
-  return;
+  return false;
 }
 
 if (!Op.isUndef())
@@ -369,7 +375,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
   }
 
   if (VgprBase + NewAddrDwords > 256)
-return;
+return false;
 
   // Further check for implicit tied operands - this may be present if TFE is
   // enabled
@@ -408,21 +414,22 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
 AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata),
 ToUntie - (EndVAddr - 1));
   }
+  return true;
 }
 
 // Shrink MAD to MADAK/MADMK and FMA to FMAAK/FMAMK.
-void SIShrinkInstructions::shrinkMadFma(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkMadFma(MachineInstr &MI) const {
   // Pre-GFX10 VOP3 instructions like MAD/FMA cannot take a literal operand so
   // there is no reason to try to shrink them.
   if (!ST->hasVOP3Literal())
-return;
+return false;
 
   // There is no advantage to doing this pre-RA.
   if (!IsPostRA)
-return;
+return false;
 
   
