[llvm-branch-commits] [llvm] f5bd5bf - Revert "Revert "LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and s…"
Author: Matt Arsenault
Date: 2025-11-30T21:23:53-05:00
New Revision: f5bd5bf4484a7e424a7046957571351d8c60294b

URL: https://github.com/llvm/llvm-project/commit/f5bd5bf4484a7e424a7046957571351d8c60294b
DIFF: https://github.com/llvm/llvm-project/commit/f5bd5bf4484a7e424a7046957571351d8c60294b.diff

LOG: Revert "Revert "LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and s…"

This reverts commit 75aa01b89553bf4213a3b0e83829b6d0689941b9.

Added: 

Modified: 
    llvm/docs/LangRef.rst
    llvm/include/llvm/CodeGen/ISDOpcodes.h

Removed: 

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a57351f9598e2..02865f8a29c67 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -17298,9 +17298,8 @@ LLVM Implementation:
 LLVM implements all ISO C flavors as listed in this table, except in the
-default floating-point environment exceptions are ignored and return value
-is non-deterministic if one or both inputs are sNaN. The constrained
-versions of the intrinsics respect the exception behavior and sNaN.
+default floating-point environment exceptions are ignored. The constrained
+versions of the intrinsics respect the exception behavior.
 
 .. list-table::
    :header-rows: 1
@@ -17332,7 +17331,7 @@ versions of the intrinsics respect the exception behavior and sNaN.
      - qNaN, invalid exception
    * - ``+0.0 vs -0.0``
-     - either one
+     - +0.0(max)/-0.0(min)
      - +0.0(max)/-0.0(min)
      - +0.0(max)/-0.0(min)
@@ -17376,22 +17375,30 @@ type.
 Semantics:
 """"""""""
 
+Follows the semantics of minNum in IEEE-754-2008, except that -0.0 < +0.0 for the purposes
+of this intrinsic. As for signaling NaNs, per the minNum semantics, if either operand is sNaN,
+the result is qNaN. This matches the recommended behavior for the libm
+function ``fmin``, although not all implementations have implemented these recommended behaviors.
+
+If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are
+NaN or if either operand is sNaN.
+Note that arithmetic on an sNaN doesn't consistently produce a qNaN,
+so arithmetic feeding into a minnum can produce inconsistent results. For example,
+``minnum(fadd(sNaN, -0.0), 1.0)`` can produce qNaN or 1.0 depending on whether ``fadd`` is folded.
 
-Follows the IEEE-754-2008 semantics for minNum, except for handling of
-signaling NaNs. This matches the behavior of libm's fmin.
+IEEE-754-2008 defines minNum, and it was removed in IEEE-754-2019. As the replacement, IEEE-754-2019
+defines :ref:`minimumNumber `.
 
-If either operand is a NaN, returns the other non-NaN operand. Returns
-NaN only if both operands are NaN. If the operands compare equal,
-returns either one of the operands. For example, this means that
-fmin(+0.0, -0.0) non-deterministically returns either operand (-0.0
-or 0.0).
+If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C
+and IEEE-754-2008: the result of ``minnum(-0.0, +0.0)`` may be either -0.0 or +0.0.
 
-Unlike the IEEE-754-2008 behavior, this does not distinguish between
-signaling and quiet NaN inputs. If a target's implementation follows
-the standard and returns a quiet NaN if either input is a signaling
-NaN, the intrinsic lowering is responsible for quieting the inputs to
-correctly return the non-NaN input (e.g. by using the equivalent of
-``llvm.canonicalize``).
+Some architectures, such as ARMv8 (FMINNM), LoongArch (fmin), MIPSr6 (min.fmt), PowerPC/VSX (xsmindp),
+have instructions that match these semantics exactly; thus it is quite simple for these architectures.
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MINPS``,
+which implements the semantics of C code ``a``.
-Unlike the IEEE-754-2008 behavior, this does not distinguish between
-signaling and quiet NaN inputs. If a target's implementation follows
-the standard and returns a quiet NaN if either input is a signaling
-NaN, the intrinsic lowering is responsible for quieting the inputs to
-correctly return the non-NaN input (e.g. by using the equivalent of
-``llvm.canonicalize``).
+If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C
+and IEEE-754-2008: the result of maxnum(-0.0, +0.0) may be either -0.0 or +0.0.
+
+Some architectures, such as ARMv8 (FMAXNM), LoongArch (fmax), MIPSr6 (max.fmt), PowerPC/VSX (xsmaxdp),
+have instructions that match these semantics exactly; thus it is quite simple for these architectures.
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MAXPS``,
+which implements the semantics of C code ``a>b?a:b``: NUM vs qNaN always return qNaN. ``MAXPS`` can be used
+if ``nsz`` and ``nnan`` are given.
+
+For existing libc implementations, the behaviors of fmin may be quite different
[llvm-branch-commits] [clang] [flang] [llvm] [openmp] [Flang] Move builtin .mod generation into runtimes (Reapply #137828) (PR #169638)
https://github.com/Meinersbur ready_for_review https://github.com/llvm/llvm-project/pull/169638 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [flang] [llvm] [openmp] [Flang] Move builtin .mod generation into runtimes (Reapply #137828) (PR #169638)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/169638
[llvm-branch-commits] [mlir] [mlir][arith] Add support for `negf` to `ArithToAPFloat` (PR #169759)
@@ -482,6 +525,7 @@ void ArithToAPFloatConversionPass::runOnOperation() {
patterns.add>(context, getOperation(),
/*isUnsigned=*/true);
patterns.add(context, getOperation());
+ patterns.add(context, getOperation());
kuhar wrote:
Add these together in one call?
https://github.com/llvm/llvm-project/pull/169759
[llvm-branch-commits] [mlir] [mlir][arith] Add support for `negf` to `ArithToAPFloat` (PR #169759)
https://github.com/kuhar approved this pull request. https://github.com/llvm/llvm-project/pull/169759
[llvm-branch-commits] [mlir] [mlir][arith] Add support for min/max to `ArithToAPFloat` (PR #169760)
https://github.com/kuhar approved this pull request. https://github.com/llvm/llvm-project/pull/169760
[llvm-branch-commits] [clang] [flang] [llvm] [openmp] [Flang] Move builtin .mod generation into runtimes (Reapply #137828) (PR #169638)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/169638
[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)
https://github.com/vikramRH updated
https://github.com/llvm/llvm-project/pull/168832
>From 2136cfff87b421ecf44aa5cd9ff45374b385b6eb Mon Sep 17 00:00:00 2001
From: vikhegde
Date: Tue, 18 Nov 2025 11:13:37 +0530
Subject: [PATCH] [NPM] Schedule PhysicalRegisterUsageAnalysis before
RegUsageInfoCollectorPass
---
llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 +++-
llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
derived().addPreEmitPass(addPass);
- if (TM.Options.EnableIPRA)
+ if (TM.Options.EnableIPRA) {
// Collect register usage information and produce a register mask of
// clobbered registers, to be used to optimize call sites.
+addPass(RequireAnalysisPass());
addPass(RegUsageInfoCollectorPass());
+ }
addPass(FuncletLayoutPass());
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
; RUN: | FileCheck -check-prefix=GCN-O3 %s
-; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
-; GCN-O2:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-l
[llvm-branch-commits] [llvm] [AMDGPU] Make SIShrinkInstructions pass return valid changed state (PR #168833)
https://github.com/vikramRH updated
https://github.com/llvm/llvm-project/pull/168833
>From 7b56e305b5631a4c8fdc21524fd33fc2a2b46b8b Mon Sep 17 00:00:00 2001
From: vikhegde
Date: Thu, 20 Nov 2025 11:18:25 +0530
Subject: [PATCH 1/2] [AMDGPU] Make SIShrinkInstructions pass return valid
changed state
---
.../Target/AMDGPU/SIShrinkInstructions.cpp| 95 ---
1 file changed, 60 insertions(+), 35 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index 1b78f67e76d07..0c25092e38ccd 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -41,10 +41,10 @@ class SIShrinkInstructions {
bool isKUImmOperand(const MachineOperand &Src) const;
bool isKImmOrKUImmOperand(const MachineOperand &Src, bool &IsUnsigned) const;
void copyExtraImplicitOps(MachineInstr &NewMI, MachineInstr &MI) const;
- void shrinkScalarCompare(MachineInstr &MI) const;
- void shrinkMIMG(MachineInstr &MI) const;
- void shrinkMadFma(MachineInstr &MI) const;
- bool shrinkScalarLogicOp(MachineInstr &MI) const;
+ bool shrinkScalarCompare(MachineInstr &MI) const;
+ bool shrinkMIMG(MachineInstr &MI) const;
+ bool shrinkMadFma(MachineInstr &MI) const;
+ bool shrinkScalarLogicOp(MachineInstr &MI, bool &MoveIterator) const;
bool tryReplaceDeadSDST(MachineInstr &MI) const;
bool instAccessReg(iterator_range &&R,
Register Reg, unsigned SubReg) const;
@@ -241,27 +241,30 @@ void
SIShrinkInstructions::copyExtraImplicitOps(MachineInstr &NewMI,
}
}
-void SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
if (!ST->hasSCmpK())
-return;
+return false;
// cmpk instructions do scc = dst imm16, so commute the instruction
to
// get constants on the RHS.
- if (!MI.getOperand(0).isReg())
-TII->commuteInstruction(MI, false, 0, 1);
+ bool Changed = false;
+ if (!MI.getOperand(0).isReg()) {
+if (TII->commuteInstruction(MI, false, 0, 1))
+ Changed = true;
+ }
// cmpk requires src0 to be a register
const MachineOperand &Src0 = MI.getOperand(0);
if (!Src0.isReg())
-return;
+return Changed;
MachineOperand &Src1 = MI.getOperand(1);
if (!Src1.isImm())
-return;
+return Changed;
int SOPKOpc = AMDGPU::getSOPKOp(MI.getOpcode());
if (SOPKOpc == -1)
-return;
+return Changed;
// eq/ne is special because the imm16 can be treated as signed or unsigned,
// and initially selected to the unsigned versions.
@@ -275,9 +278,10 @@ void
SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
}
MI.setDesc(TII->get(SOPKOpc));
+ Changed = true;
}
-return;
+return Changed;
}
const MCInstrDesc &NewDesc = TII->get(SOPKOpc);
@@ -287,14 +291,16 @@ void
SIShrinkInstructions::shrinkScalarCompare(MachineInstr &MI) const {
if (!SIInstrInfo::sopkIsZext(SOPKOpc))
Src1.setImm(SignExtend64(Src1.getImm(), 32));
MI.setDesc(NewDesc);
+Changed = true;
}
+ return Changed;
}
// Shrink NSA encoded instructions with contiguous VGPRs to non-NSA encoding.
-void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkMIMG(MachineInstr &MI) const {
const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode());
if (!Info)
-return;
+return false;
uint8_t NewEncoding;
switch (Info->MIMGEncoding) {
@@ -305,7 +311,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI)
const {
NewEncoding = AMDGPU::MIMGEncGfx11Default;
break;
default:
-return;
+return false;
}
int VAddr0Idx =
@@ -359,7 +365,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI)
const {
} else if (Vgpr == NextVgpr) {
NextVgpr = Vgpr + Dwords;
} else {
- return;
+ return false;
}
if (!Op.isUndef())
@@ -369,7 +375,7 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI)
const {
}
if (VgprBase + NewAddrDwords > 256)
-return;
+return false;
// Further check for implicit tied operands - this may be present if TFE is
// enabled
@@ -408,21 +414,22 @@ void SIShrinkInstructions::shrinkMIMG(MachineInstr &MI)
const {
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata),
ToUntie - (EndVAddr - 1));
}
+ return true;
}
// Shrink MAD to MADAK/MADMK and FMA to FMAAK/FMAMK.
-void SIShrinkInstructions::shrinkMadFma(MachineInstr &MI) const {
+bool SIShrinkInstructions::shrinkMadFma(MachineInstr &MI) const {
// Pre-GFX10 VOP3 instructions like MAD/FMA cannot take a literal operand so
// there is no reason to try to shrink them.
if (!ST->hasVOP3Literal())
-return;
+return false;
// There is no advantage to doing this pre-RA.
if (!IsPostRA)
-return;
+return false;
