date:20251024

[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread Florian Mayer via llvm-branch-commits


https://github.com/fmayer approved this pull request.


https://github.com/llvm/llvm-project/pull/164543
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [FlowSensitive] Allow callback to initialize storagelocations (PR #164675)

2025-10-24 Thread Florian Mayer via llvm-branch-commits


https://github.com/fmayer converted this pull request to a draft.
https://github.com/llvm/llvm-project/pull/164675

[llvm-branch-commits] [clang] [FlowSensitive] [StatusOr] [8/N] Support value ctor and assignment (PR #163894)

2025-10-24 Thread Florian Mayer via llvm-branch-commits


https://github.com/fmayer updated 
https://github.com/llvm/llvm-project/pull/163894

>From a410f9239726cb16960f047c67054b183035a361 Mon Sep 17 00:00:00 2001
From: Florian Mayer 
Date: Thu, 16 Oct 2025 17:27:24 -0700
Subject: [PATCH] fix test

Created using spr 1.3.7
---
 .../FlowSensitive/Models/UncheckedOptionalAccessModel.cpp   | 6 +++---
 .../FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp   | 6 ++
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
index bb703eff4baff..0fa333eedcfdd 100644
--- a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
@@ -241,9 +241,9 @@ auto nulloptTypeDecl() {
 auto hasNulloptType() { return hasType(nulloptTypeDecl()); }
 
 auto inPlaceClass() {
-  return recordDecl(hasAnyName("std::in_place_t", "absl::in_place_t",
-   "base::in_place_t", "folly::in_place_t",
-   "bsl::in_place_t"));
+  return namedDecl(hasAnyName("std::in_place_t", "absl::in_place_t",
+  "base::in_place_t", "folly::in_place_t",
+  "bsl::in_place_t"));
 }
 
 auto isOptionalNulloptConstructor() {
diff --git a/clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp b/clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
index c1d9e8d202f3d..542c35433d3de 100644
--- a/clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
@@ -177,10 +177,8 @@ static auto isStatusOrValueConstructor() {
   hasArgument(0,
   anyOf(hasType(hasCanonicalType(type(equalsBoundNode("T",
 nullPointerConstant(),
-hasType(namedDecl(hasName("absl::in_place_t"))),
-hasType(namedDecl(hasName("std::in_place_t")))
-
-)));
+hasType(namedDecl(hasAnyName("absl::in_place_t",
+ "std::in_place_t"));
 }
 
 static auto


[llvm-branch-commits] [clang] [FlowSensitive] [StatusOr] [8/N] Support value ctor and assignment (PR #163894)

2025-10-24 Thread Florian Mayer via llvm-branch-commits



@@ -177,6 +177,27 @@ static auto isPointerComparisonOperatorCall(std::string operator_name) {
 pointee(anyOf(statusOrType(), statusType(;
 }
 
+static auto isStatusOrValueAssignmentCall() {
+  using namespace ::clang::ast_matchers; // NOLINT: Too many names
+  return cxxOperatorCallExpr(
+  hasOverloadedOperatorName("="),
+  callee(cxxMethodDecl(ofClass(statusOrClass(,
+  hasArgument(1, anyOf(hasType(hasUnqualifiedDesugaredType(
+   type(equalsBoundNode("T",
+   nullPointerConstant(;

fmayer wrote:

Added a comment for now. We can revisit improving this later.

https://github.com/llvm/llvm-project/pull/163894

[llvm-branch-commits] [llvm] X86: Make sure compiler-rt div calls are not added for msvc (PR #164591)

2025-10-24 Thread Reid Kleckner via llvm-branch-commits


https://github.com/rnk approved this pull request.

We should just ship the clang_rt.builtins everywhere at this point. It's in the 
issue tracker somewhere, but I haven't been able to make it a priority.

https://github.com/llvm/llvm-project/pull/164591

[llvm-branch-commits] [llvm] [AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (PR #165035)

2025-10-24 Thread Stanislav Mekhanoshin via llvm-branch-commits


https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/165035

>From 6bcab853c394f67ba4f0b317ac7c24030c20dd4d Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 24 Oct 2025 13:06:11 -0700
Subject: [PATCH] [AMDGPU] Record old VGPR MSBs in the high bits of
 s_set_vgpr_msb

Fixes: SWDEV-562450
---
 .../Target/AMDGPU/AMDGPULowerVGPREncoding.cpp |  16 +-
 llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp  |   2 +-
 .../MCTargetDesc/AMDGPUMCTargetDesc.cpp   |   2 +-
 .../AMDGPU/vgpr-lowering-gfx1250-t16.mir  |   4 +-
 .../CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir  | 137 +-
 .../CodeGen/AMDGPU/whole-wave-functions.ll|  36 ++---
 6 files changed, 104 insertions(+), 93 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
index 9b932273b2216..d7d0292083e1c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
@@ -58,6 +58,8 @@ class AMDGPULowerVGPREncoding {
   static constexpr unsigned BitsPerField = 2;
   static constexpr unsigned NumFields = 4;
   static constexpr unsigned FieldMask = (1 << BitsPerField) - 1;
+  static constexpr unsigned ModeWidth = NumFields * BitsPerField;
+  static constexpr unsigned ModeMask = (1 << ModeWidth) - 1;
   using ModeType = PackedVector>;
 
@@ -152,13 +154,21 @@ bool AMDGPULowerVGPREncoding::setMode(ModeTy NewMode, ModeTy Mask,
 CurrentMode |= NewMode;
 CurrentMask |= Mask;
 
-MostRecentModeSet->getOperand(0).setImm(CurrentMode);
+MachineOperand &Op = MostRecentModeSet->getOperand(0);
+
+// Carry old mode bits from the existing instruction.
+int64_t OldModeBits = Op.getImm() & (ModeMask << ModeWidth);
+
+Op.setImm(CurrentMode | OldModeBits);
 return true;
   }
 
+  // Record previous mode into high 8 bits of the immediate.
+  int64_t OldModeBits = CurrentMode << ModeWidth;
+
   I = handleClause(I);
-  MostRecentModeSet =
-  BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB)).addImm(NewMode);
+  MostRecentModeSet = BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB))
+  .addImm(NewMode | OldModeBits);
 
   CurrentMode = NewMode;
   CurrentMask = Mask;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 680e7eb3de6be..844649ebb9ae6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -412,7 +412,7 @@ void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
  *OutStreamer);
 
 if (isVerbose() && MI->getOpcode() == AMDGPU::S_SET_VGPR_MSB) {
-  unsigned V = MI->getOperand(0).getImm();
+  unsigned V = MI->getOperand(0).getImm() & 0xff;
   OutStreamer->AddComment(
   " msbs: dst=" + Twine(V >> 6) + " src0=" + Twine(V & 3) +
   " src1=" + Twine((V >> 2) & 3) + " src2=" + Twine((V >> 4) & 3));
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index 013cfeb364048..28b4da8ab9ebb 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -168,7 +168,7 @@ bool AMDGPUMCInstrAnalysis::evaluateBranch(const MCInst &Inst, uint64_t Addr,
 
 void AMDGPUMCInstrAnalysis::updateState(const MCInst &Inst, uint64_t Addr) {
   if (Inst.getOpcode() == AMDGPU::S_SET_VGPR_MSB_gfx12)
-VgprMSBs = Inst.getOperand(0).getImm();
+VgprMSBs = Inst.getOperand(0).getImm() & 0xff;
   else if (isTerminator(Inst))
 VgprMSBs = 0;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
index 8a70a8acd28d3..32cc398740d62 100644
--- a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
@@ -36,7 +36,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v384.l*/, v129.l /*v385.l*/, v130.l /*v386.l*/
 $vgpr384_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr385_lo16, 0, undef $vgpr386_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0x8a
+; GCN-NEXT: s_set_vgpr_msb 0x458a
 ; ASM-SAME: ;  msbs: dst=2 src0=2 src1=2 src2=0
 ; GCN-NEXT: v_add_f16_e64 v0.h /*v512.h*/, v1.h /*v513.h*/, v2.h /*v514.h*/
 $vgpr512_hi16 = V_ADD_F16_t16_e64 0, undef $vgpr513_hi16, 0, undef $vgpr514_hi16, 0, 0, 0, implicit $exec, implicit $mode
@@ -50,7 +50,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v640.l*/, v129.l /*v641.l*/, v130.l /*v642.l*/
 $vgpr640_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr641_lo16, 0, undef $vgpr642_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0xcf
+; GCN-NEXT: s_set_vgpr_msb 0x8acf
 ; ASM-SAME:
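
The immediate layout this patch introduces can be sketched as below: the low 8 bits hold the new mode and the high 8 bits record the previous one. This is only an illustration under stated assumptions. The helper names packModes, decodeCurrent, previousMode and the Msbs struct are hypothetical and do not appear in the patch; the constants and the field order (src0 in bits 1:0, src1 in bits 3:2, src2 in bits 5:4, dst in bits 7:6) are taken from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Constants mirroring the ones in AMDGPULowerVGPREncoding.cpp.
constexpr unsigned BitsPerField = 2;
constexpr unsigned NumFields = 4;
constexpr unsigned ModeWidth = NumFields * BitsPerField; // 8 bits
constexpr unsigned ModeMask = (1u << ModeWidth) - 1;     // 0xff

// Hypothetical helper: new mode in the low byte, previous mode in the high byte.
constexpr int64_t packModes(unsigned NewMode, unsigned OldMode) {
  return int64_t(NewMode & ModeMask) |
         (int64_t(OldMode & ModeMask) << ModeWidth);
}

// Hypothetical holder for the four 2-bit MSB fields of the current mode.
struct Msbs {
  unsigned Dst, Src0, Src1, Src2;
};

// Hypothetical helper: keep only the low byte, as the asm printer now does.
constexpr unsigned lowByte(int64_t Imm) { return unsigned(Imm & 0xff); }

// Decode the current mode using the same shifts as the verbose asm comment.
constexpr Msbs decodeCurrent(int64_t Imm) {
  return Msbs{lowByte(Imm) >> 6, lowByte(Imm) & 3, (lowByte(Imm) >> 2) & 3,
              (lowByte(Imm) >> 4) & 3};
}

// Hypothetical helper: recover the previous mode from the high byte.
constexpr unsigned previousMode(int64_t Imm) {
  return unsigned((Imm >> ModeWidth) & ModeMask);
}
```

With these definitions, packModes(0x8a, 0x45) yields 0x458a, which is the first updated test value, and decoding it gives dst=2 src0=2 src1=2 src2=0 as in the ASM-SAME line. The second updated value, 0x8acf, records 0x8a as the previous mode.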

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits



@@ -0,0 +1,58 @@
+; RUN: split-file %s %t
+; RUN: opt -S --dxil-translate-metadata %t/not-distinct.ll 2>&1 | FileCheck %t/not-distinct.ll
+; RUN: opt -S --dxil-translate-metadata %t/not-md.ll 2>&1 | FileCheck %t/not-md.ll
+
+; Test that DXIL incompatible loop metadata is stripped
+
+;--- not-distinct.ll
+
+; Ensure it is stripped because it is not provided a distinct loop parent
+; CHECK-NOT: {!"llvm.loop.unroll.disable"}
+
+target triple = "dxilv1.0-unknown-shadermodel6.0-library"
+
+define void @example_loop(i32 %n) {
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i32 [ 0, %entry ], [ %i.next, %loop.body ]
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %loop.body, label %exit
+
+loop.body:
+  %i.next = add nsw i32 %i, 1
+  br label %loop.header, !llvm.loop !1
+
+exit:
+  ret void
+}
+
+!1 = !{!"llvm.loop.unroll.disable"} ; first node must be a distinct self-reference
+
+
+;--- not-md.ll
+
+target triple = "dxilv1.0-unknown-shadermodel6.0-library"
+
+define void @example_loop(i32 %n) {
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i32 [ 0, %entry ], [ %i.next, %loop.body ]
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %loop.body, label %exit
+
+loop.body:
+  %i.next = add nsw i32 %i, 1
+  ; CHECK: br label %loop.header, !llvm.loop ![[#LOOP_MD:]]
+  br label %loop.header, !llvm.loop !1
+
+exit:
+  ret void
+}
+
+; CHECK: ![[#LOOP_MD:]] = distinct !{![[#LOOP_MD]]}
+
+!1 = !{!1, i32 0} ; not a metadata node

hekota wrote:

```suggestion
!1 = !{!1, i32 0} ; second operand is not a metadata node
```

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits



@@ -314,16 +327,106 @@ static void translateBranchMetadata(Module &M, Instruction *BBTerminatorInst) {
   BBTerminatorInst->setMetadata("hlsl.controlflow.hint", nullptr);
 }
 
-static std::array getCompatibleInstructionMDs(llvm::Module &M) {
+// Determines if the metadata node will be compatible with DXIL's loop metadata
+// representation.
+//
+// Reports an error for compatible metadata that is ill-formed.
+static bool isLoopMDCompatible(Module &M, Metadata *MD) {
+  // DXIL only accepts the following loop hints:
+  std::array ValidHintNames = {"llvm.loop.unroll.count",
+ "llvm.loop.unroll.disable",
+ "llvm.loop.unroll.full"};
+
+  MDNode *HintMD = dyn_cast(MD);
+  if (!HintMD || HintMD->getNumOperands() == 0)
+return false;
+
+  auto *HintStr = dyn_cast(HintMD->getOperand(0));
+  if (!HintStr)
+return false;
+
+  if (!llvm::is_contained(ValidHintNames, HintStr->getString()))
+return false;
+
+  auto ValidCountNode = [](MDNode *CountMD) -> bool {
+if (CountMD->getNumOperands() == 2)
+  if (auto *Count = dyn_cast(CountMD->getOperand(1)))
+if (isa(Count->getValue()))
+  return true;
+return false;
+  };
+
+  if (HintStr->getString() == "llvm.loop.unroll.count") {
+if (!ValidCountNode(HintMD))
+  return reportLoopError(M, "Second operand of \"llvm.loop.unroll.count\" "
+"must be a constant integer");
+  } else if (HintMD->getNumOperands() != 1)
+return reportLoopError(
+M, "\"llvm.loop.unroll.disable\" and \"llvm.loop.unroll.disable\" "
+   "must be provided as a single operand");
+
+  return true;
+}
+
+static void translateLoopMetadata(Module &M, Instruction *I, MDNode *BaseMD) {
+  // A distinct node has the self-referential form: !0 = !{ !0, ... }
+  auto IsDistinctNode = [](MDNode *Node) -> bool {
+return Node && Node->getNumOperands() != 0 && Node == Node->getOperand(0);
+  };
+
+  // Strip empty metadata or a non-distinct node
+  if (BaseMD->getNumOperands() == 0 || !IsDistinctNode(BaseMD))
+return I->setMetadata("llvm.loop", nullptr);
+
+  // It is valid to have a chain of self-refential loop metadata nodes, as
+  // below. We will collapse these into just one when we reconstruct the
+  // metadata.
+  //
+  // Eg:
+  // !0 = !{!0, !1}
+  // !1 = !{!1, !2}
+  // !2 = !{!"llvm.loop.unroll.disable"}
+  //
+  // So, traverse down a potential self-referential chain
+  while (1 < BaseMD->getNumOperands() &&
+ IsDistinctNode(dyn_cast(BaseMD->getOperand(1
+BaseMD = dyn_cast(BaseMD->getOperand(1));
+
+  // To reconstruct a distinct node we create a temporary node that we will
+  // then update to create a self-reference.
+  llvm::TempMDTuple TempNode = llvm::MDNode::getTemporary(M.getContext(), {});
+  SmallVector CompatibleOperands = {TempNode.get()};
+
+  // Iterate and reconstruct the metadata nodes that contains any hints,
+  // stripping any unrecognized metadata.
+  ArrayRef Operands = BaseMD->operands();
+  for (auto &Op : Operands.drop_front())
+if (isLoopMDCompatible(M, Op.get()))
+  CompatibleOperands.push_back(Op.get());

hekota wrote:

Please add a test case with unrecognized loop metadata hint that gets stripped.
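
The stripping behaviour such a test would exercise can be sketched as below. This is a simplified model, not the code from the patch: hints are represented as plain strings instead of metadata nodes, and the names isValidHint and stripUnrecognized are hypothetical. Only the three accepted hint names come from the diff above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if Name is one of the loop hints DXIL accepts.
inline bool isValidHint(const std::string &Name) {
  static const std::vector<std::string> ValidHintNames = {
      "llvm.loop.unroll.count", "llvm.loop.unroll.disable",
      "llvm.loop.unroll.full"};
  return std::find(ValidHintNames.begin(), ValidHintNames.end(), Name) !=
         ValidHintNames.end();
}

// Hypothetical helper: keep recognized hints in order, drop everything else.
inline std::vector<std::string>
stripUnrecognized(const std::vector<std::string> &Hints) {
  std::vector<std::string> Kept;
  for (const std::string &H : Hints)
    if (isValidHint(H))
      Kept.push_back(H);
  return Kept;
}
```

For example, passing "llvm.loop.unroll.disable" together with an unrelated hint such as "llvm.loop.vectorize.enable" would keep only the first entry.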

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits



@@ -314,16 +327,106 @@ static void translateBranchMetadata(Module &M, Instruction *BBTerminatorInst) {
   BBTerminatorInst->setMetadata("hlsl.controlflow.hint", nullptr);
 }
 
-static std::array getCompatibleInstructionMDs(llvm::Module &M) {
+// Determines if the metadata node will be compatible with DXIL's loop metadata
+// representation.
+//
+// Reports an error for compatible metadata that is ill-formed.
+static bool isLoopMDCompatible(Module &M, Metadata *MD) {
+  // DXIL only accepts the following loop hints:
+  std::array ValidHintNames = {"llvm.loop.unroll.count",
+ "llvm.loop.unroll.disable",
+ "llvm.loop.unroll.full"};
+
+  MDNode *HintMD = dyn_cast(MD);
+  if (!HintMD || HintMD->getNumOperands() == 0)
+return false;
+
+  auto *HintStr = dyn_cast(HintMD->getOperand(0));
+  if (!HintStr)
+return false;
+
+  if (!llvm::is_contained(ValidHintNames, HintStr->getString()))
+return false;
+
+  auto ValidCountNode = [](MDNode *CountMD) -> bool {
+if (CountMD->getNumOperands() == 2)
+  if (auto *Count = dyn_cast(CountMD->getOperand(1)))
+if (isa(Count->getValue()))
+  return true;
+return false;
+  };
+
+  if (HintStr->getString() == "llvm.loop.unroll.count") {
+if (!ValidCountNode(HintMD))
+  return reportLoopError(M, "Second operand of \"llvm.loop.unroll.count\" "
+"must be a constant integer");
+  } else if (HintMD->getNumOperands() != 1)
+return reportLoopError(
+M, "\"llvm.loop.unroll.disable\" and \"llvm.loop.unroll.disable\" "
+   "must be provided as a single operand");
+
+  return true;

hekota wrote:

It would be cleaner if `reportLoopError` just reported the error and did not return 
`true`/`false`. It should be up to the `isLoopMDCompatible` routine to do that. 
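
One possible shape of that refactoring is sketched below, under several assumptions. Diagnostics and Hint are hypothetical stand-ins for the module's error reporting and for a metadata node, and treating an ill-formed hint as incompatible (returning false) is a guess: the quoted diff does not show which value the current helper returns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for wherever the module collects diagnostics.
struct Diagnostics {
  std::vector<std::string> Errors;
};

// Only reports the error; it makes no compatibility decision.
inline void reportLoopError(Diagnostics &D, const std::string &Msg) {
  D.Errors.push_back(Msg);
}

// Hypothetical flattened view of one loop hint node.
struct Hint {
  std::string Name;
  unsigned NumOperands;
  bool CountIsConstInt; // only meaningful for llvm.loop.unroll.count
};

// The caller decides the result explicitly after reporting.
inline bool isLoopMDCompatible(Diagnostics &D, const Hint &H) {
  if (H.Name == "llvm.loop.unroll.count") {
    if (H.NumOperands != 2 || !H.CountIsConstInt) {
      reportLoopError(D, "second operand of llvm.loop.unroll.count must be "
                         "a constant integer");
      return false; // assumption: ill-formed hints are not kept
    }
    return true;
  }
  if (H.Name == "llvm.loop.unroll.disable" ||
      H.Name == "llvm.loop.unroll.full") {
    if (H.NumOperands != 1) {
      reportLoopError(D, H.Name + " must be provided as a single operand");
      return false; // assumption: ill-formed hints are not kept
    }
    return true;
  }
  return false; // unrecognized hint: stripped silently, no error
}
```

Separating the two responsibilities keeps every return value of isLoopMDCompatible visible at the call site, so a reader does not need to look up what the reporting helper returns.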

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] a3d3339 - Merge branch 'main' into revert-164551-re-land

2025-10-24 Thread via llvm-branch-commits


Author: SahilPatidar
Date: 2025-10-25T09:35:02+05:30
New Revision: a3d3339c96708a26132ec050ee23531e9b54204f

URL: https://github.com/llvm/llvm-project/commit/a3d3339c96708a26132ec050ee23531e9b54204f
DIFF: https://github.com/llvm/llvm-project/commit/a3d3339c96708a26132ec050ee23531e9b54204f.diff

LOG: Merge branch 'main' into revert-164551-re-land

Added: 


Modified: 
llvm/include/llvm/ADT/RadixTree.h

Removed: 




diff --git a/llvm/include/llvm/ADT/RadixTree.h b/llvm/include/llvm/ADT/RadixTree.h
index d3c44e4e6345c..a65acddf186b7 100644
--- a/llvm/include/llvm/ADT/RadixTree.h
+++ b/llvm/include/llvm/ADT/RadixTree.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 namespace llvm {
 




[llvm-branch-commits] [llvm] a7c38b8 - [ADT][NFC] Add missing #include (#165068)

2025-10-24 Thread via llvm-branch-commits


Author: Jordan Rupprecht
Date: 2025-10-25T03:58:46Z
New Revision: a7c38b8a9c7feb94dc7f500e62c4197b8089da05

URL: https://github.com/llvm/llvm-project/commit/a7c38b8a9c7feb94dc7f500e62c4197b8089da05
DIFF: https://github.com/llvm/llvm-project/commit/a7c38b8a9c7feb94dc7f500e62c4197b8089da05.diff

LOG: [ADT][NFC] Add missing #include  (#165068)

Added in #164524. Fails when using libc++ in a mode that prunes
transitive headers.

Added: 


Modified: 
llvm/include/llvm/ADT/RadixTree.h

Removed: 




diff --git a/llvm/include/llvm/ADT/RadixTree.h b/llvm/include/llvm/ADT/RadixTree.h
index d3c44e4e6345c..a65acddf186b7 100644
--- a/llvm/include/llvm/ADT/RadixTree.h
+++ b/llvm/include/llvm/ADT/RadixTree.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 namespace llvm {
 




[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits



@@ -314,16 +327,106 @@ static void translateBranchMetadata(Module &M, Instruction *BBTerminatorInst) {
   BBTerminatorInst->setMetadata("hlsl.controlflow.hint", nullptr);
 }
 
-static std::array getCompatibleInstructionMDs(llvm::Module &M) {
+// Determines if the metadata node will be compatible with DXIL's loop metadata
+// representation.
+//
+// Reports an error for compatible metadata that is ill-formed.
+static bool isLoopMDCompatible(Module &M, Metadata *MD) {
+  // DXIL only accepts the following loop hints:
+  std::array ValidHintNames = {"llvm.loop.unroll.count",
+ "llvm.loop.unroll.disable",
+ "llvm.loop.unroll.full"};
+
+  MDNode *HintMD = dyn_cast(MD);
+  if (!HintMD || HintMD->getNumOperands() == 0)
+return false;
+
+  auto *HintStr = dyn_cast(HintMD->getOperand(0));
+  if (!HintStr)
+return false;
+
+  if (!llvm::is_contained(ValidHintNames, HintStr->getString()))
+return false;
+
+  auto ValidCountNode = [](MDNode *CountMD) -> bool {
+if (CountMD->getNumOperands() == 2)
+  if (auto *Count = dyn_cast(CountMD->getOperand(1)))
+if (isa(Count->getValue()))
+  return true;
+return false;
+  };
+
+  if (HintStr->getString() == "llvm.loop.unroll.count") {
+if (!ValidCountNode(HintMD))
+  return reportLoopError(M, "Second operand of \"llvm.loop.unroll.count\" "
+"must be a constant integer");
+  } else if (HintMD->getNumOperands() != 1)
+return reportLoopError(
+M, "\"llvm.loop.unroll.disable\" and \"llvm.loop.unroll.disable\" "
+   "must be provided as a single operand");
+
+  return true;
+}
+
+static void translateLoopMetadata(Module &M, Instruction *I, MDNode *BaseMD) {
+  // A distinct node has the self-referential form: !0 = !{ !0, ... }
+  auto IsDistinctNode = [](MDNode *Node) -> bool {
+return Node && Node->getNumOperands() != 0 && Node == Node->getOperand(0);
+  };
+
+  // Strip empty metadata or a non-distinct node
+  if (BaseMD->getNumOperands() == 0 || !IsDistinctNode(BaseMD))
+return I->setMetadata("llvm.loop", nullptr);
+
+  // It is valid to have a chain of self-refential loop metadata nodes, as
+  // below. We will collapse these into just one when we reconstruct the
+  // metadata.
+  //
+  // Eg:
+  // !0 = !{!0, !1}
+  // !1 = !{!1, !2}
+  // !2 = !{!"llvm.loop.unroll.disable"}
+  //
+  // So, traverse down a potential self-referential chain
+  while (1 < BaseMD->getNumOperands() &&
+ IsDistinctNode(dyn_cast(BaseMD->getOperand(1
+BaseMD = dyn_cast(BaseMD->getOperand(1));
+
+  // To reconstruct a distinct node we create a temporary node that we will
+  // then update to create a self-reference.
+  llvm::TempMDTuple TempNode = llvm::MDNode::getTemporary(M.getContext(), {});
+  SmallVector CompatibleOperands = {TempNode.get()};
+
+  // Iterate and reconstruct the metadata nodes that contains any hints,
+  // stripping any unrecognized metadata.
+  ArrayRef Operands = BaseMD->operands();
+  for (auto &Op : Operands.drop_front())
+if (isLoopMDCompatible(M, Op.get()))
+  CompatibleOperands.push_back(Op.get());

hekota wrote:

Oh, I see you have one, never mind!

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (PR #165035)

2025-10-24 Thread Stanislav Mekhanoshin via llvm-branch-commits


https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/165035

Fixes: SWDEV-562450

>From 80fd780fb149c2561ffa164f66f2f97bc5dc90b3 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 24 Oct 2025 13:06:11 -0700
Subject: [PATCH] [AMDGPU] Record old VGPR MSBs in the high bits of
 s_set_vgpr_msb

Fixes: SWDEV-562450
---
 .../Target/AMDGPU/AMDGPULowerVGPREncoding.cpp |  16 ++-
 llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp  |   2 +-
 .../MCTargetDesc/AMDGPUMCTargetDesc.cpp   |   2 +-
 .../AMDGPU/vgpr-lowering-gfx1250-t16.mir  |   4 +-
 .../CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir  | 135 +-
 .../CodeGen/AMDGPU/whole-wave-functions.ll|  36 ++---
 6 files changed, 103 insertions(+), 92 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
index 0be1dd0817605..f9f0bc619d9f7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
@@ -58,6 +58,8 @@ class AMDGPULowerVGPREncoding {
   static constexpr unsigned BitsPerField = 2;
   static constexpr unsigned NumFields = 4;
   static constexpr unsigned FieldMask = (1 << BitsPerField) - 1;
+  static constexpr unsigned ModeWidth = NumFields * BitsPerField;
+  static constexpr unsigned ModeMask = (1 << ModeWidth) - 1;
   using ModeType = PackedVector>;
 
@@ -152,13 +154,21 @@ bool AMDGPULowerVGPREncoding::setMode(ModeTy NewMode, ModeTy Mask,
 CurrentMode |= NewMode;
 CurrentMask |= Mask;
 
-MostRecentModeSet->getOperand(0).setImm(CurrentMode);
+MachineOperand &Op = MostRecentModeSet->getOperand(0);
+
+// Carry old mode bits from the existing instruction.
+int64_t OldModeBits = Op.getImm() & (ModeMask << ModeWidth);
+
+Op.setImm(CurrentMode | OldModeBits);
 return true;
   }
 
+  // Record previous mode into high 8 bits of the immediate.
+  int64_t OldModeBits = CurrentMode << ModeWidth;
+
   I = handleClause(I);
-  MostRecentModeSet =
-  BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB)).addImm(NewMode);
+  MostRecentModeSet = BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB))
+  .addImm(NewMode | OldModeBits);
 
   CurrentMode = NewMode;
   CurrentMask = Mask;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 680e7eb3de6be..844649ebb9ae6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -412,7 +412,7 @@ void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
  *OutStreamer);
 
 if (isVerbose() && MI->getOpcode() == AMDGPU::S_SET_VGPR_MSB) {
-  unsigned V = MI->getOperand(0).getImm();
+  unsigned V = MI->getOperand(0).getImm() & 0xff;
   OutStreamer->AddComment(
   " msbs: dst=" + Twine(V >> 6) + " src0=" + Twine(V & 3) +
   " src1=" + Twine((V >> 2) & 3) + " src2=" + Twine((V >> 4) & 3));
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index 013cfeb364048..28b4da8ab9ebb 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -168,7 +168,7 @@ bool AMDGPUMCInstrAnalysis::evaluateBranch(const MCInst 
&Inst, uint64_t Addr,
 
 void AMDGPUMCInstrAnalysis::updateState(const MCInst &Inst, uint64_t Addr) {
   if (Inst.getOpcode() == AMDGPU::S_SET_VGPR_MSB_gfx12)
-VgprMSBs = Inst.getOperand(0).getImm();
+VgprMSBs = Inst.getOperand(0).getImm() & 0xff;
   else if (isTerminator(Inst))
 VgprMSBs = 0;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir 
b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
index 8a70a8acd28d3..32cc398740d62 100644
--- a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
@@ -36,7 +36,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v384.l*/, v129.l /*v385.l*/, v130.l 
/*v386.l*/
 $vgpr384_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr385_lo16, 0, undef 
$vgpr386_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0x8a
+; GCN-NEXT: s_set_vgpr_msb 0x458a
 ; ASM-SAME: ;  msbs: dst=2 src0=2 
src1=2 src2=0
 ; GCN-NEXT: v_add_f16_e64 v0.h /*v512.h*/, v1.h /*v513.h*/, v2.h /*v514.h*/
 $vgpr512_hi16 = V_ADD_F16_t16_e64 0, undef $vgpr513_hi16, 0, undef 
$vgpr514_hi16, 0, 0, 0, implicit $exec, implicit $mode
@@ -50,7 +50,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v640.l*/, v129.l /*v641.l*/, v130.l 
/*v642.l*/
 $vgpr640_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr641_lo16, 0, undef 
$vgpr642_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0xcf
+; GCN-NEXT: s_set_vgpr_msb 0x8acf
 ; ASM-
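
The immediate layout this patch introduces can be sketched in isolation. This is a minimal standalone sketch, not the pass itself: the helper names (`packModeImm`, `currentMode`, `previousMode`, `decodeMsbs`) and the `Msbs` struct are hypothetical, while the constants and bit positions are taken from the hunks above and the sample values from the updated test expectations (`0x458a`, `0x8acf`).

```cpp
#include <cstdint>

// Constants as defined in the diff above.
constexpr unsigned BitsPerField = 2;
constexpr unsigned NumFields = 4;
constexpr unsigned ModeWidth = NumFields * BitsPerField; // 8 bits
constexpr unsigned ModeMask = (1u << ModeWidth) - 1;     // 0xff

// Hypothetical helper: place the new mode in bits [7:0] and the previous
// mode in bits [15:8] of the immediate.
constexpr int64_t packModeImm(unsigned NewMode, unsigned OldMode) {
  return static_cast<int64_t>((NewMode & ModeMask) |
                              ((OldMode & ModeMask) << ModeWidth));
}

// Hypothetical helpers: split an immediate back into its two halves.
constexpr unsigned currentMode(int64_t Imm) {
  return static_cast<unsigned>(Imm) & ModeMask;
}
constexpr unsigned previousMode(int64_t Imm) {
  return (static_cast<unsigned>(Imm) >> ModeWidth) & ModeMask;
}

// Field decoding as printed in the "msbs:" comment by the asm printer hunk.
struct Msbs {
  unsigned Dst, Src0, Src1, Src2;
};
constexpr Msbs decodeMsbs(int64_t Imm) {
  return Msbs{currentMode(Imm) >> 6, currentMode(Imm) & 3,
              (currentMode(Imm) >> 2) & 3, (currentMode(Imm) >> 4) & 3};
}

// Values taken from the updated test expectations.
static_assert(packModeImm(0x8a, 0x45) == 0x458a, "new=0x8a, old=0x45");
static_assert(packModeImm(0xcf, 0x8a) == 0x8acf, "new=0xcf, old=0x8a");
static_assert(decodeMsbs(0x458a).Dst == 2 && decodeMsbs(0x458a).Src0 == 2 &&
                  decodeMsbs(0x458a).Src1 == 2 && decodeMsbs(0x458a).Src2 == 0,
              "matches 'dst=2 src0=2 src1=2 src2=0'");
```

Masking with `0xff` before decoding, as the two one-line hunks do, keeps the verbose comment and the MC analysis state unaffected by the recorded previous mode.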

[llvm-branch-commits] [llvm] [AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (PR #165035)

2025-10-24 Thread Stanislav Mekhanoshin via llvm-branch-commits


rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite.

* **#165035** 👈
* **#164901**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/165035
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (PR #165035)

2025-10-24 Thread Stanislav Mekhanoshin via llvm-branch-commits


https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/165035

[llvm-branch-commits] [llvm] [AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (PR #165035)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes

Fixes: SWDEV-562450

---

Patch is 44.31 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/165035.diff


6 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp (+13-3) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp (+1-1) 
- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir (+68-67) 
- (modified) llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll (+18-18) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
index 0be1dd0817605..f9f0bc619d9f7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
@@ -58,6 +58,8 @@ class AMDGPULowerVGPREncoding {
   static constexpr unsigned BitsPerField = 2;
   static constexpr unsigned NumFields = 4;
   static constexpr unsigned FieldMask = (1 << BitsPerField) - 1;
+  static constexpr unsigned ModeWidth = NumFields * BitsPerField;
+  static constexpr unsigned ModeMask = (1 << ModeWidth) - 1;
   using ModeType = PackedVector>;
 
@@ -152,13 +154,21 @@ bool AMDGPULowerVGPREncoding::setMode(ModeTy NewMode, 
ModeTy Mask,
 CurrentMode |= NewMode;
 CurrentMask |= Mask;
 
-MostRecentModeSet->getOperand(0).setImm(CurrentMode);
+MachineOperand &Op = MostRecentModeSet->getOperand(0);
+
+// Carry old mode bits from the existing instruction.
+int64_t OldModeBits = Op.getImm() & (ModeMask << ModeWidth);
+
+Op.setImm(CurrentMode | OldModeBits);
 return true;
   }
 
+  // Record previous mode into high 8 bits of the immediate.
+  int64_t OldModeBits = CurrentMode << ModeWidth;
+
   I = handleClause(I);
-  MostRecentModeSet =
-  BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB)).addImm(NewMode);
+  MostRecentModeSet = BuildMI(*MBB, I, {}, TII->get(AMDGPU::S_SET_VGPR_MSB))
+  .addImm(NewMode | OldModeBits);
 
   CurrentMode = NewMode;
   CurrentMask = Mask;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 680e7eb3de6be..844649ebb9ae6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -412,7 +412,7 @@ void AMDGPUAsmPrinter::emitInstruction(const MachineInstr 
*MI) {
  *OutStreamer);
 
 if (isVerbose() && MI->getOpcode() == AMDGPU::S_SET_VGPR_MSB) {
-  unsigned V = MI->getOperand(0).getImm();
+  unsigned V = MI->getOperand(0).getImm() & 0xff;
   OutStreamer->AddComment(
   " msbs: dst=" + Twine(V >> 6) + " src0=" + Twine(V & 3) +
   " src1=" + Twine((V >> 2) & 3) + " src2=" + Twine((V >> 4) & 3));
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index 013cfeb364048..28b4da8ab9ebb 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -168,7 +168,7 @@ bool AMDGPUMCInstrAnalysis::evaluateBranch(const MCInst 
&Inst, uint64_t Addr,
 
 void AMDGPUMCInstrAnalysis::updateState(const MCInst &Inst, uint64_t Addr) {
   if (Inst.getOpcode() == AMDGPU::S_SET_VGPR_MSB_gfx12)
-VgprMSBs = Inst.getOperand(0).getImm();
+VgprMSBs = Inst.getOperand(0).getImm() & 0xff;
   else if (isTerminator(Inst))
 VgprMSBs = 0;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir 
b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
index 8a70a8acd28d3..32cc398740d62 100644
--- a/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250-t16.mir
@@ -36,7 +36,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v384.l*/, v129.l /*v385.l*/, v130.l 
/*v386.l*/
 $vgpr384_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr385_lo16, 0, undef 
$vgpr386_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0x8a
+; GCN-NEXT: s_set_vgpr_msb 0x458a
 ; ASM-SAME: ;  msbs: dst=2 src0=2 
src1=2 src2=0
 ; GCN-NEXT: v_add_f16_e64 v0.h /*v512.h*/, v1.h /*v513.h*/, v2.h /*v514.h*/
 $vgpr512_hi16 = V_ADD_F16_t16_e64 0, undef $vgpr513_hi16, 0, undef 
$vgpr514_hi16, 0, 0, 0, implicit $exec, implicit $mode
@@ -50,7 +50,7 @@ body: |
 ; GCN-NEXT: v_add_f16_e64 v128.l /*v640.l*/, v129.l /*v641.l*/, v130.l 
/*v642.l*/
 $vgpr640_lo16 = V_ADD_F16_t16_e64 0, undef $vgpr641_lo16, 0, undef 
$vgpr642_lo16, 0, 0, 0, implicit $exec, implicit $mode
 
-; GCN-NEXT: s_set_vgpr_msb 0xcf
+; GCN-NEXT: s_set_vgpr_msb 0x8acf
 ; ASM-SAME:

[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w` (PR #164945)

2025-10-24 Thread via llvm-branch-commits


https://github.com/zhaoqi5 updated 
https://github.com/llvm/llvm-project/pull/164945

>From f149131d41903bda9b79b61fc9991ebf009a905c Mon Sep 17 00:00:00 2001
From: Qi Zhao 
Date: Fri, 24 Oct 2025 17:00:29 +0800
Subject: [PATCH 1/2] [LoongArch] Custom legalize vector_shuffle to
 `[x]vpermi.w`

---
 .../LoongArch/LoongArchISelLowering.cpp   | 133 --
 .../Target/LoongArch/LoongArchISelLowering.h  |   1 +
 .../LoongArch/LoongArchLASXInstrInfo.td   |   6 +-
 .../Target/LoongArch/LoongArchLSXInstrInfo.td |   7 +
 .../lasx/ir-instruction/shuffle-as-xvpermi.ll |  10 +-
 .../lasx/ir-instruction/shuffle-as-xvshuf.ll  |   9 +-
 .../lsx/ir-instruction/shuffle-as-vpermi.ll   |  10 +-
 .../lsx/ir-instruction/shuffle-as-vshuf.ll|  10 +-
 .../LoongArch/lsx/widen-shuffle-mask.ll   |   6 +-
 9 files changed, 146 insertions(+), 46 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index ca4a655f06587..1215427e142ce 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -1948,6 +1948,85 @@ static SDValue lowerVECTOR_SHUFFLE_VPICKOD(const SDLoc 
&DL, ArrayRef Mask,
   return DAG.getNode(LoongArchISD::VPICKOD, DL, VT, V2, V1);
 }
 
+// Check the Mask and then build SrcVec and MaskImm infos which will
+// be used to build LoongArchISD nodes for VPERMI_W or XVPERMI_W.
+// On success, return true. Otherwise, return false.
+static bool buildVPERMIInfo(ArrayRef Mask, SDValue V1, SDValue V2,
+SmallVectorImpl &SrcVec,
+unsigned &MaskImm) {
+  unsigned MaskSize = Mask.size();
+
+  auto isValid = [&](int M, int Off) {
+return (M == -1) || (M >= Off && M < Off + 4);
+  };
+
+  auto buildImm = [&](int MLo, int MHi, unsigned Off, unsigned I) {
+auto immPart = [&](int M, unsigned Off) {
+  return (M == -1 ? 0 : (M - Off)) & 0x3;
+};
+MaskImm |= immPart(MLo, Off) << (I * 2);
+MaskImm |= immPart(MHi, Off) << ((I + 1) * 2);
+  };
+
+  for (unsigned i = 0; i < 4; i += 2) {
+int MLo = Mask[i];
+int MHi = Mask[i + 1];
+
+if (MaskSize == 8) { // Only v8i32/v8f32 need this check.
+  int M2Lo = Mask[i + 4];
+  int M2Hi = Mask[i + 5];
+  if (M2Lo != MLo + 4 || M2Hi != MHi + 4)
+return false;
+}
+
+if (isValid(MLo, 0) && isValid(MHi, 0)) {
+  SrcVec.push_back(V1);
+  buildImm(MLo, MHi, 0, i);
+} else if (isValid(MLo, MaskSize) && isValid(MHi, MaskSize)) {
+  SrcVec.push_back(V2);
+  buildImm(MLo, MHi, MaskSize, i);
+} else {
+  return false;
+}
+  }
+
+  return true;
+}
+
+/// Lower VECTOR_SHUFFLE into VPERMI (if possible).
+///
+/// VPERMI selects two elements from each of the two vectors based on the
+/// mask and places them in the corresponding positions of the result vector
+/// in order. Only v4i32 and v4f32 types are allowed.
+///
+/// It is possible to lower into VPERMI when the mask consists of two of the
+/// following forms concatenated:
+///   
+///   
+/// where i,j are in [0,4) and u,v are in [4, 8).
+/// For example:
+///   <2, 3, 4, 5>
+///   <5, 7, 0, 2>
+///
+/// When undef's appear in the mask they are treated as if they were whatever
+/// value is necessary in order to fit the above forms.
+static SDValue lowerVECTOR_SHUFFLE_VPERMI(const SDLoc &DL, ArrayRef Mask,
+  MVT VT, SDValue V1, SDValue V2,
+  SelectionDAG &DAG,
+  const LoongArchSubtarget &Subtarget) 
{
+  if ((VT != MVT::v4i32 && VT != MVT::v4f32) ||
+  Mask.size() != VT.getVectorNumElements())
+return SDValue();
+
+  SmallVector SrcVec;
+  unsigned MaskImm = 0;
+  if (!buildVPERMIInfo(Mask, V1, V2, SrcVec, MaskImm))
+return SDValue();
+
+  return DAG.getNode(LoongArchISD::VPERMI, DL, VT, SrcVec[0], SrcVec[1],
+ DAG.getConstant(MaskImm, DL, Subtarget.getGRLenVT()));
+}
+
 /// Lower VECTOR_SHUFFLE into VSHUF.
 ///
 /// This mostly consists of converting the shuffle mask into a BUILD_VECTOR and
@@ -2028,12 +2107,15 @@ static SDValue lower128BitShuffle(const SDLoc &DL, 
ArrayRef Mask, MVT VT,
   (Result =
lowerVECTOR_SHUFFLE_VSHUF4I(DL, Mask, VT, V1, V2, DAG, Subtarget)))
 return Result;
-  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
- Zeroable)))
-return Result;
   if ((Result = lowerVECTOR_SHUFFLEAsShift(DL, Mask, VT, V1, V2, DAG, 
Subtarget,
Zeroable)))
 return Result;
+  if ((Result =
+   lowerVECTOR_SHUFFLE_VPERMI(DL, Mask, VT, V1, V2, DAG, Subtarget)))
+return Result;
+  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
+ Zeroable)))
+retu
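
The mask check in `buildVPERMIInfo` can be followed on plain integers. Below is a minimal sketch of the 4-element case only, assuming a standalone setting without any LLVM types: `VPermiInfo` and `buildVPermiInfo4` are hypothetical names, and sources are reported as `1` (for `V1`) or `2` (for `V2`) instead of `SDValue`s. The validity test and the immediate encoding follow the lambdas in the hunk above.

```cpp
#include <array>
#include <optional>

// Hypothetical result type: which source (1 = V1, 2 = V2) feeds each pair of
// result elements, plus the 8-bit selector immediate.
struct VPermiInfo {
  std::array<int, 2> Src;
  unsigned Imm;
};

// Standalone restatement of the 4-element path of buildVPERMIInfo.
inline std::optional<VPermiInfo>
buildVPermiInfo4(const std::array<int, 4> &Mask) {
  constexpr int NumElts = 4;
  // A mask entry fits a source if it is undef (-1) or lies in [Off, Off + 4).
  auto isValid = [](int M, int Off) {
    return M == -1 || (M >= Off && M < Off + 4);
  };
  // Undef entries are encoded as selector 0.
  auto immPart = [](int M, int Off) {
    return static_cast<unsigned>(M == -1 ? 0 : M - Off) & 0x3u;
  };

  VPermiInfo Info{{0, 0}, 0};
  for (unsigned I = 0; I < 4; I += 2) {
    int MLo = Mask[I];
    int MHi = Mask[I + 1];
    int Off = 0;
    if (isValid(MLo, 0) && isValid(MHi, 0)) {
      Info.Src[I / 2] = 1; // both elements come from V1
    } else if (isValid(MLo, NumElts) && isValid(MHi, NumElts)) {
      Info.Src[I / 2] = 2; // both elements come from V2
      Off = NumElts;
    } else {
      return std::nullopt; // the pair mixes sources: not a VPERMI pattern
    }
    Info.Imm |= immPart(MLo, Off) << (I * 2);
    Info.Imm |= immPart(MHi, Off) << ((I + 1) * 2);
  }
  return Info;
}
```

For the two example masks in the doc comment, this sketch yields immediate `0x4E` with sources (V1, V2) for `<2, 3, 4, 5>` and `0x8D` with sources (V2, V1) for `<5, 7, 0, 2>`. A mask such as `<0, 4, 1, 5>` is rejected because each pair draws from both inputs.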

[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w` (PR #164945)

2025-10-24 Thread via llvm-branch-commits


https://github.com/zhaoqi5 ready_for_review 
https://github.com/llvm/llvm-project/pull/164945

[llvm-branch-commits] [clang] [llvm] [AArch64][llvm] Relax mandatory features for Armv9.6-A (PR #164950)

2025-10-24 Thread Jonathan Thackray via llvm-branch-commits


jthackray wrote:

> I had to go and look at what was going on here, but yes this appears to be 
> valid. LGTM

Thanks. It's a backport of 0e8781100. The relaxation of mandatory features will 
also be implemented in gcc shortly.

https://github.com/llvm/llvm-project/pull/164950

[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint with fp->int->fp optimizations (PR #164503)

2025-10-24 Thread Guy David via llvm-branch-commits


https://github.com/guy-david updated 
https://github.com/llvm/llvm-project/pull/164503

>From 7f65dea126ac725b2f7cde88784845a7eb518de5 Mon Sep 17 00:00:00 2001
From: Guy David 
Date: Wed, 22 Oct 2025 00:07:57 +0300
Subject: [PATCH] [DAGCombiner] Relax nsz constraint with fp->int->fp
 optimizations

---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   4 +
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |  20 ++-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |  29 
 llvm/test/CodeGen/AArch64/fp-to-int-to-fp.ll  | 104 +
 .../AMDGPU/select-fabs-fneg-extract.f16.ll|  64 +++-
 .../AMDGPU/select-fabs-fneg-extract.v2f16.ll  |  50 +++---
 llvm/test/CodeGen/X86/setoeq.ll   | 142 +++---
 7 files changed, 208 insertions(+), 205 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index df6ce0fe1b037..a4ab3ef1de30c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -2322,6 +2322,10 @@ class SelectionDAG {
   /// +nan are considered positive, -0.0, -inf and -nan are not.
   LLVM_ABI bool cannotBeOrderedNegativeFP(SDValue Op) const;
 
+  /// Check if all uses of a floating-point value are insensitive to signed
+  /// zeros.
+  LLVM_ABI bool allUsesSignedZeroInsensitive(SDValue Op) const;
+
   /// Test whether two SDValues are known to compare equal. This
   /// is true if they are the same value, or if one is negative zero and the
   /// other positive zero.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 2372d7dfe7c3c..73aed33fe0838 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18891,12 +18891,13 @@ static SDValue foldFPToIntToFP(SDNode *N, const SDLoc 
&DL, SelectionDAG &DAG,
   bool IsSigned = N->getOpcode() == ISD::SINT_TO_FP;
   assert(IsSigned || IsUnsigned);
 
-  bool IsSignedZeroSafe = DAG.getTarget().Options.NoSignedZerosFPMath;
+  bool IsSignedZeroSafe = DAG.getTarget().Options.NoSignedZerosFPMath ||
+  DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
   // For signed conversions: The optimization changes signed zero behavior.
   if (IsSigned && !IsSignedZeroSafe)
 return SDValue();
   // For unsigned conversions, we need FABS to canonicalize -0.0 to +0.0
-  // (unless NoSignedZerosFPMath is set).
+  // (unless outputting a signed zero is OK).
   if (IsUnsigned && !IsSignedZeroSafe && !TLI.isFAbsFree(VT))
 return SDValue();
 
@@ -19375,10 +19376,17 @@ SDValue DAGCombiner::visitFNEG(SDNode *N) {
   // FIXME: This is duplicated in getNegatibleCost, but getNegatibleCost 
doesn't
   // know it was called from a context with a nsz flag if the input fsub does
   // not.
-  if (N0.getOpcode() == ISD::FSUB && N->getFlags().hasNoSignedZeros() &&
-  N0.hasOneUse()) {
-return DAG.getNode(ISD::FSUB, SDLoc(N), VT, N0.getOperand(1),
-   N0.getOperand(0));
+  if (N0.getOpcode() == ISD::FSUB && N0.hasOneUse()) {
+SDValue X = N0.getOperand(0);
+SDValue Y = N0.getOperand(1);
+
+// Safe if NoSignedZeros, or if we can prove X != Y (avoiding the -0.0 vs
+// +0.0 issue) For now, we use a conservative check: if either operand is
+// known never zero, then X - Y can't produce a signed zero from X == Y.
+if (N->getFlags().hasNoSignedZeros() || DAG.isKnownNeverZeroFloat(X) ||
+DAG.isKnownNeverZeroFloat(Y)) {
+  return DAG.getNode(ISD::FSUB, SDLoc(N), VT, Y, X);
+}
   }
 
   if (SimplifyDemandedBits(SDValue(N, 0)))
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 379242ec5a157..61b70ffd26e2f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -6075,6 +6075,35 @@ bool SelectionDAG::isKnownNeverZeroFloat(SDValue Op) 
const {
   Op, [](ConstantFPSDNode *C) { return !C->isZero(); });
 }
 
+bool SelectionDAG::allUsesSignedZeroInsensitive(SDValue Op) const {
+  assert(Op.getValueType().isFloatingPoint());
+  return all_of(Op->uses(), [&](SDUse &Use) {
+SDNode *User = Use.getUser();
+unsigned OperandNo = Use.getOperandNo();
+
+// Check if this use is insensitive to the sign of zero
+switch (User->getOpcode()) {
+case ISD::SETCC:
+  // Comparisons: IEEE-754 specifies +0.0 == -0.0.
+case ISD::FABS:
+  // fabs always produces +0.0.
+  return true;
+case ISD::FCOPYSIGN:
+  // copysign overwrites the sign bit of the first operand.
+  return OperandNo == 0;
+case ISD::FADD:
+case ISD::FSUB: {
+  // Arithmetic with non-zero constants fixes the uncertainty around the
+  // sign bit.
+  SDValue Other = User->getOperand(1 - OperandNo);
+  return isKnownNeverZeroFloat(Other);
+}
+default:
+  return false;
+}
+  });
+}
+
 bool SelectionDAG::isKnow
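
Why the sign of zero matters here can be shown with ordinary floating-point code. The following is a minimal sketch, assuming the fold in question replaces the fp->int->fp round trip with a truncation (which the comments in `foldFPToIntToFP` suggest); the helper names `roundTripViaInt`, `foldedTrunc` and `sameValueAndSign` are hypothetical and not part of LLVM.

```cpp
#include <cmath>

// The source-level pattern: convert to a signed integer and back.
inline float roundTripViaInt(float X) {
  return static_cast<float>(static_cast<int>(X));
}

// The assumed folded form: a plain truncation toward zero.
inline float foldedTrunc(float X) { return std::trunc(X); }

// Hypothetical helper: equal as values and equal in sign bit.
inline bool sameValueAndSign(float A, float B) {
  return A == B && std::signbit(A) == std::signbit(B);
}
```

For `X = -0.5f` the round trip produces `+0.0` while the truncation produces `-0.0`, so `sameValueAndSign` reports a difference. The difference disappears for the kinds of users that `allUsesSignedZeroInsensitive` accepts: an equality comparison treats the two zeros as equal, `std::fabs` maps both to `+0.0`, and adding a non-zero constant such as `1.0f` gives the same result for either zero.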

[llvm-branch-commits] [clang] [llvm] [AArch64][llvm] Relax mandatory features for Armv9.6-A (PR #164950)

2025-10-24 Thread David Green via llvm-branch-commits


https://github.com/davemgreen approved this pull request.

I had to go and look at what was going on here, but yes this appears to be 
valid. LGTM

https://github.com/llvm/llvm-project/pull/164950

[llvm-branch-commits] [llvm] RuntimeLibcalls: Avoid reporting __stack_chk_guard as available for msvc (PR #164133)

2025-10-24 Thread Martin Storsjö via llvm-branch-commits


https://github.com/mstorsjo approved this pull request.

Looks ok I think.

https://github.com/llvm/llvm-project/pull/164133

[llvm-branch-commits] [clang] [llvm] [AArch64][llvm] Relax mandatory features for Armv9.6-A (PR #164950)

2025-10-24 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray milestoned 
https://github.com/llvm/llvm-project/pull/164950

[llvm-branch-commits] [clang] [llvm] [AArch64][llvm] Relax mandatory features for Armv9.6-A (PR #164950)

2025-10-24 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray created 
https://github.com/llvm/llvm-project/pull/164950

`FEAT_FPRCVT` is removed from being mandatory in Armv9.6-A
`FEAT_SVE2p2` is removed from being mandatory in Armv9.6-A

>From e9da86d042ca5a9888c00426a7422ef53ddc2444 Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Fri, 24 Oct 2025 00:13:53 +0100
Subject: [PATCH] [AArch64][llvm] Relax mandatory features for Armv9.6-A
 (#163973)

`FEAT_FPRCVT` is removed from being mandatory in Armv9.6-A
`FEAT_SVE2p2` is removed from being mandatory in Armv9.6-A
---
 clang/test/Driver/aarch64-v96a.c   | 4 ++--
 llvm/lib/Target/AArch64/AArch64Features.td | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/clang/test/Driver/aarch64-v96a.c b/clang/test/Driver/aarch64-v96a.c
index de7890140ebd3..e0081bbbdabfe 100644
--- a/clang/test/Driver/aarch64-v96a.c
+++ b/clang/test/Driver/aarch64-v96a.c
@@ -6,7 +6,7 @@
 // RUN: %clang -target aarch64 -mlittle-endian -march=armv9.6-a -### -c %s 
2>&1 | FileCheck -check-prefix=GENERICV96A %s
 // RUN: %clang -target aarch64_be -mlittle-endian -march=armv9.6a -### -c %s 
2>&1 | FileCheck -check-prefix=GENERICV96A %s
 // RUN: %clang -target aarch64_be -mlittle-endian -march=armv9.6-a -### -c %s 
2>&1 | FileCheck -check-prefix=GENERICV96A %s
-// GENERICV96A: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic" 
"-target-feature" "+v9.6a"{{.*}} "-target-feature" "+cmpbr"{{.*}} 
"-target-feature" "+fprcvt"{{.*}} "-target-feature" "+sve2p2"
+// GENERICV96A: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic" 
"-target-feature" "+v9.6a"{{.*}} "-target-feature" "+cmpbr"{{.*}}
 
 // RUN: %clang -target aarch64_be -march=armv9.6a -### -c %s 2>&1 | FileCheck 
-check-prefix=GENERICV96A-BE %s
 // RUN: %clang -target aarch64_be -march=armv9.6-a -### -c %s 2>&1 | FileCheck 
-check-prefix=GENERICV96A-BE %s
@@ -14,7 +14,7 @@
 // RUN: %clang -target aarch64 -mbig-endian -march=armv9.6-a -### -c %s 2>&1 | 
FileCheck -check-prefix=GENERICV96A-BE %s
 // RUN: %clang -target aarch64_be -mbig-endian -march=armv9.6a -### -c %s 2>&1 
| FileCheck -check-prefix=GENERICV96A-BE %s
 // RUN: %clang -target aarch64_be -mbig-endian -march=armv9.6-a -### -c %s 
2>&1 | FileCheck -check-prefix=GENERICV96A-BE %s
-// GENERICV96A-BE: "-cc1"{{.*}} "-triple" "aarch64_be{{.*}}" "-target-cpu" 
"generic" "-target-feature" "+v9.6a"{{.*}} "-target-feature" "+cmpbr"{{.*}} 
"-target-feature" "+fprcvt"{{.*}} "-target-feature" "+sve2p2"
+// GENERICV96A-BE: "-cc1"{{.*}} "-triple" "aarch64_be{{.*}}" "-target-cpu" 
"generic" "-target-feature" "+v9.6a"{{.*}} "-target-feature" "+cmpbr"{{.*}}
 
 // = Features supported on aarch64 =
 
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td 
b/llvm/lib/Target/AArch64/AArch64Features.td
index 9973df865ea17..12159a9519737 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -923,8 +923,8 @@ def HasV9_5aOps : Architecture64<9, 5, "a", "v9.5a",
   [HasV9_4aOps, FeatureCPA],
   !listconcat(HasV9_4aOps.DefaultExts, [FeatureCPA,  FeatureLUT, 
FeatureFAMINMAX])>;
 def HasV9_6aOps : Architecture64<9, 6, "a", "v9.6a",
-  [HasV9_5aOps, FeatureCMPBR, FeatureFPRCVT, FeatureSVE2p2, FeatureLSUI, 
FeatureOCCMO],
-  !listconcat(HasV9_5aOps.DefaultExts, [FeatureCMPBR, FeatureFPRCVT, 
FeatureSVE2p2,
+  [HasV9_5aOps, FeatureCMPBR, FeatureLSUI, FeatureOCCMO],
+  !listconcat(HasV9_5aOps.DefaultExts, [FeatureCMPBR,
 FeatureLSUI, FeatureOCCMO])>;
 def HasV8_0rOps : Architecture64<8, 0, "r", "v8r",
   [ //v8.1


[llvm-branch-commits] [clang] [llvm] [AArch64][llvm] Relax mandatory features for Armv9.6-A (PR #164950)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)


Changes

`FEAT_FPRCVT` is removed from being mandatory in Armv9.6-A
`FEAT_SVE2p2` is removed from being mandatory in Armv9.6-A

---
Full diff: https://github.com/llvm/llvm-project/pull/164950.diff


2 Files Affected:

- (modified) clang/test/Driver/aarch64-v96a.c (+2-2) 
- (modified) llvm/lib/Target/AArch64/AArch64Features.td (+2-2) 


https://github.com/llvm/llvm-project/pull/164950

[llvm-branch-commits] [llvm] [LoongArch] Add patterns to support vector type average instructions generation (PR #161079)

2025-10-24 Thread Zhaoxin Yang via llvm-branch-commits


https://github.com/ylzsx approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/161079

[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w` (PR #164945)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-loongarch

Author: ZhaoQi (zhaoqi5)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/164945.diff


9 Files Affected:

- (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+119-14) 
- (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+1) 
- (modified) llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td (+5-1) 
- (modified) llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (+7) 
- (modified) 
llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvpermi.ll (+3-7) 
- (modified) 
llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvshuf.ll (+3-6) 
- (modified) 
llvm/test/CodeGen/LoongArch/lsx/ir-instruction/shuffle-as-vpermi.ll (+3-7) 
- (modified) llvm/test/CodeGen/LoongArch/lsx/ir-instruction/shuffle-as-vshuf.ll 
(+2-8) 
- (modified) llvm/test/CodeGen/LoongArch/lsx/widen-shuffle-mask.ll (+3-3) 


``diff
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index ca4a655f06587..1215427e142ce 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -1948,6 +1948,85 @@ static SDValue lowerVECTOR_SHUFFLE_VPICKOD(const SDLoc 
&DL, ArrayRef Mask,
   return DAG.getNode(LoongArchISD::VPICKOD, DL, VT, V2, V1);
 }
 
+// Check the Mask and then build SrcVec and MaskImm infos which will
+// be used to build LoongArchISD nodes for VPERMI_W or XVPERMI_W.
+// On success, return true. Otherwise, return false.
+static bool buildVPERMIInfo(ArrayRef Mask, SDValue V1, SDValue V2,
+SmallVectorImpl &SrcVec,
+unsigned &MaskImm) {
+  unsigned MaskSize = Mask.size();
+
+  auto isValid = [&](int M, int Off) {
+return (M == -1) || (M >= Off && M < Off + 4);
+  };
+
+  auto buildImm = [&](int MLo, int MHi, unsigned Off, unsigned I) {
+auto immPart = [&](int M, unsigned Off) {
+  return (M == -1 ? 0 : (M - Off)) & 0x3;
+};
+MaskImm |= immPart(MLo, Off) << (I * 2);
+MaskImm |= immPart(MHi, Off) << ((I + 1) * 2);
+  };
+
+  for (unsigned i = 0; i < 4; i += 2) {
+int MLo = Mask[i];
+int MHi = Mask[i + 1];
+
+if (MaskSize == 8) { // Only v8i32/v8f32 need this check.
+  int M2Lo = Mask[i + 4];
+  int M2Hi = Mask[i + 5];
+  if (M2Lo != MLo + 4 || M2Hi != MHi + 4)
+return false;
+}
+
+if (isValid(MLo, 0) && isValid(MHi, 0)) {
+  SrcVec.push_back(V1);
+  buildImm(MLo, MHi, 0, i);
+} else if (isValid(MLo, MaskSize) && isValid(MHi, MaskSize)) {
+  SrcVec.push_back(V2);
+  buildImm(MLo, MHi, MaskSize, i);
+} else {
+  return false;
+}
+  }
+
+  return true;
+}
+
+/// Lower VECTOR_SHUFFLE into VPERMI (if possible).
+///
+/// VPERMI selects two elements from each of the two vectors based on the
+/// mask and places them in the corresponding positions of the result vector
+/// in order. Only v4i32 and v4f32 types are allowed.
+///
+/// It is possible to lower into VPERMI when the mask consists of two of the
+/// following forms concatenated:
+///   
+///   
+/// where i,j are in [0,4) and u,v are in [4, 8).
+/// For example:
+///   <2, 3, 4, 5>
+///   <5, 7, 0, 2>
+///
+/// When undef's appear in the mask they are treated as if they were whatever
+/// value is necessary in order to fit the above forms.
+static SDValue lowerVECTOR_SHUFFLE_VPERMI(const SDLoc &DL, ArrayRef<int> Mask,
+                                          MVT VT, SDValue V1, SDValue V2,
+                                          SelectionDAG &DAG,
+                                          const LoongArchSubtarget &Subtarget) {
+  if ((VT != MVT::v4i32 && VT != MVT::v4f32) ||
+  Mask.size() != VT.getVectorNumElements())
+return SDValue();
+
+  SmallVector SrcVec;
+  unsigned MaskImm = 0;
+  if (!buildVPERMIInfo(Mask, V1, V2, SrcVec, MaskImm))
+return SDValue();
+
+  return DAG.getNode(LoongArchISD::VPERMI, DL, VT, SrcVec[0], SrcVec[1],
+ DAG.getConstant(MaskImm, DL, Subtarget.getGRLenVT()));
+}
+
 /// Lower VECTOR_SHUFFLE into VSHUF.
 ///
 /// This mostly consists of converting the shuffle mask into a BUILD_VECTOR and
@@ -2028,12 +2107,15 @@ static SDValue lower128BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
   (Result =
lowerVECTOR_SHUFFLE_VSHUF4I(DL, Mask, VT, V1, V2, DAG, Subtarget)))
 return Result;
-  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
- Zeroable)))
-return Result;
   if ((Result = lowerVECTOR_SHUFFLEAsShift(DL, Mask, VT, V1, V2, DAG, 
Subtarget,
Zeroable)))
 return Result;
+  if ((Result =
+   lowerVECTOR_SHUFFLE_VPERMI(DL, Mask, VT, V1, V2, DAG, Subtarget)))
+return Result;
+  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
+

[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w` (PR #164945)

2025-10-24 Thread via llvm-branch-commits


https://github.com/zhaoqi5 created 
https://github.com/llvm/llvm-project/pull/164945

None

>From f149131d41903bda9b79b61fc9991ebf009a905c Mon Sep 17 00:00:00 2001
From: Qi Zhao 
Date: Fri, 24 Oct 2025 17:00:29 +0800
Subject: [PATCH] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w`

---
 .../LoongArch/LoongArchISelLowering.cpp   | 133 --
 .../Target/LoongArch/LoongArchISelLowering.h  |   1 +
 .../LoongArch/LoongArchLASXInstrInfo.td   |   6 +-
 .../Target/LoongArch/LoongArchLSXInstrInfo.td |   7 +
 .../lasx/ir-instruction/shuffle-as-xvpermi.ll |  10 +-
 .../lasx/ir-instruction/shuffle-as-xvshuf.ll  |   9 +-
 .../lsx/ir-instruction/shuffle-as-vpermi.ll   |  10 +-
 .../lsx/ir-instruction/shuffle-as-vshuf.ll|  10 +-
 .../LoongArch/lsx/widen-shuffle-mask.ll   |   6 +-
 9 files changed, 146 insertions(+), 46 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index ca4a655f06587..1215427e142ce 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -1948,6 +1948,85 @@ static SDValue lowerVECTOR_SHUFFLE_VPICKOD(const SDLoc &DL, ArrayRef<int> Mask,
   return DAG.getNode(LoongArchISD::VPICKOD, DL, VT, V2, V1);
 }
 
+// Check the Mask and then build SrcVec and MaskImm infos which will
+// be used to build LoongArchISD nodes for VPERMI_W or XVPERMI_W.
+// On success, return true. Otherwise, return false.
+static bool buildVPERMIInfo(ArrayRef<int> Mask, SDValue V1, SDValue V2,
+                            SmallVectorImpl<SDValue> &SrcVec,
+                            unsigned &MaskImm) {
+  unsigned MaskSize = Mask.size();
+
+  auto isValid = [&](int M, int Off) {
+return (M == -1) || (M >= Off && M < Off + 4);
+  };
+
+  auto buildImm = [&](int MLo, int MHi, unsigned Off, unsigned I) {
+auto immPart = [&](int M, unsigned Off) {
+  return (M == -1 ? 0 : (M - Off)) & 0x3;
+};
+MaskImm |= immPart(MLo, Off) << (I * 2);
+MaskImm |= immPart(MHi, Off) << ((I + 1) * 2);
+  };
+
+  for (unsigned i = 0; i < 4; i += 2) {
+int MLo = Mask[i];
+int MHi = Mask[i + 1];
+
+if (MaskSize == 8) { // Only v8i32/v8f32 need this check.
+  int M2Lo = Mask[i + 4];
+  int M2Hi = Mask[i + 5];
+  if (M2Lo != MLo + 4 || M2Hi != MHi + 4)
+return false;
+}
+
+if (isValid(MLo, 0) && isValid(MHi, 0)) {
+  SrcVec.push_back(V1);
+  buildImm(MLo, MHi, 0, i);
+} else if (isValid(MLo, MaskSize) && isValid(MHi, MaskSize)) {
+  SrcVec.push_back(V2);
+  buildImm(MLo, MHi, MaskSize, i);
+} else {
+  return false;
+}
+  }
+
+  return true;
+}
+
+/// Lower VECTOR_SHUFFLE into VPERMI (if possible).
+///
+/// VPERMI selects two elements from each of the two vectors based on the
+/// mask and places them in the corresponding positions of the result vector
+/// in order. Only v4i32 and v4f32 types are allowed.
+///
+/// It is possible to lower into VPERMI when the mask consists of two of the
+/// following forms concatenated:
+///   
+///   
+/// where i,j are in [0,4) and u,v are in [4, 8).
+/// For example:
+///   <2, 3, 4, 5>
+///   <5, 7, 0, 2>
+///
+/// When undef's appear in the mask they are treated as if they were whatever
+/// value is necessary in order to fit the above forms.
+static SDValue lowerVECTOR_SHUFFLE_VPERMI(const SDLoc &DL, ArrayRef<int> Mask,
+                                          MVT VT, SDValue V1, SDValue V2,
+                                          SelectionDAG &DAG,
+                                          const LoongArchSubtarget &Subtarget) {
+  if ((VT != MVT::v4i32 && VT != MVT::v4f32) ||
+  Mask.size() != VT.getVectorNumElements())
+return SDValue();
+
+  SmallVector SrcVec;
+  unsigned MaskImm = 0;
+  if (!buildVPERMIInfo(Mask, V1, V2, SrcVec, MaskImm))
+return SDValue();
+
+  return DAG.getNode(LoongArchISD::VPERMI, DL, VT, SrcVec[0], SrcVec[1],
+ DAG.getConstant(MaskImm, DL, Subtarget.getGRLenVT()));
+}
+
 /// Lower VECTOR_SHUFFLE into VSHUF.
 ///
 /// This mostly consists of converting the shuffle mask into a BUILD_VECTOR and
@@ -2028,12 +2107,15 @@ static SDValue lower128BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
   (Result =
lowerVECTOR_SHUFFLE_VSHUF4I(DL, Mask, VT, V1, V2, DAG, Subtarget)))
 return Result;
-  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
- Zeroable)))
-return Result;
   if ((Result = lowerVECTOR_SHUFFLEAsShift(DL, Mask, VT, V1, V2, DAG, 
Subtarget,
Zeroable)))
 return Result;
+  if ((Result =
+   lowerVECTOR_SHUFFLE_VPERMI(DL, Mask, VT, V1, V2, DAG, Subtarget)))
+return Result;
+  if ((Result = lowerVECTOR_SHUFFLEAsZeroOrAnyExtend(DL, Mask, VT, V1, V2, DAG,
+ Zeroable)))
+ret
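
To make the mask check in `buildVPERMIInfo` above easier to follow, here is a minimal standalone sketch of the same logic for the 4-element case only. It is an illustration under stated assumptions, not code from the patch: it uses `std::vector<int>` in place of `ArrayRef<int>`, records source indices (0 for `V1`, 1 for `V2`) in place of `SDValue`, and the names `VPermiInfo` and `buildVPermiInfoSketch` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical result type: which source feeds each half of the result
// (0 = V1, 1 = V2) plus the 8-bit selector immediate.
struct VPermiInfo {
  int Src[2];
  unsigned Imm;
};

// Sketch of the 4-element mask check: each pair of mask entries must come
// entirely from one source; undef entries (-1) fit either source.
std::optional<VPermiInfo> buildVPermiInfoSketch(const std::vector<int> &Mask) {
  assert(Mask.size() == 4 && "sketch only handles 4-element masks");
  const int NumElts = 4;
  VPermiInfo Info{{0, 0}, 0};

  // True if M is undef or lies in [Off, Off + 4).
  auto inRange = [](int M, int Off) {
    return M == -1 || (M >= Off && M < Off + 4);
  };
  // Two-bit selector for one mask entry; undef is encoded as 0.
  auto immPart = [](int M, int Off) {
    return M == -1 ? 0u : (static_cast<unsigned>(M - Off) & 0x3u);
  };

  for (int I = 0; I < 4; I += 2) {
    int Lo = Mask[I];
    int Hi = Mask[I + 1];
    int Off;
    if (inRange(Lo, 0) && inRange(Hi, 0))
      Off = 0; // both entries select from the first source
    else if (inRange(Lo, NumElts) && inRange(Hi, NumElts))
      Off = NumElts; // both entries select from the second source
    else
      return std::nullopt; // the pair mixes sources, so no match

    Info.Src[I / 2] = (Off == 0) ? 0 : 1;
    Info.Imm |= immPart(Lo, Off) << (I * 2);
    Info.Imm |= immPart(Hi, Off) << ((I + 1) * 2);
  }
  return Info;
}
```

Under these assumptions, the example masks from the doc comment, `<2, 3, 4, 5>` and `<5, 7, 0, 2>`, yield immediates 0x4E and 0x8D respectively, while a mask such as `<0, 5, 2, 3>` is rejected because its first pair mixes both sources.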

[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `[x]vpermi.w` (PR #164945)

2025-10-24 Thread via llvm-branch-commits


https://github.com/zhaoqi5 converted_to_draft 
https://github.com/llvm/llvm-project/pull/164945

[llvm-branch-commits] [mlir] 3b947aa - Revert "[mlir][scf] Add parallelLoopUnrollByFactors() (#163806)"

2025-10-24 Thread via llvm-branch-commits


Author: fabrizio-indirli
Date: 2025-10-24T10:27:13+01:00
New Revision: 3b947aa8c6a604a8d97dc7f1933afa6306aadf93

URL: 
https://github.com/llvm/llvm-project/commit/3b947aa8c6a604a8d97dc7f1933afa6306aadf93
DIFF: 
https://github.com/llvm/llvm-project/commit/3b947aa8c6a604a8d97dc7f1933afa6306aadf93.diff

LOG: Revert "[mlir][scf] Add parallelLoopUnrollByFactors() (#163806)"

This reverts commit 86a2073b5bf5f0b44573b8f7e600040a8cdc8bc2.

Added: 


Modified: 
mlir/include/mlir/Dialect/SCF/Utils/Utils.h
mlir/lib/Dialect/SCF/IR/SCF.cpp
mlir/lib/Dialect/SCF/Utils/Utils.cpp
mlir/test/lib/Dialect/SCF/CMakeLists.txt
mlir/tools/mlir-opt/mlir-opt.cpp

Removed: 
mlir/test/Dialect/SCF/parallel-loop-unroll.mlir
mlir/test/lib/Dialect/SCF/TestParallelLoopUnrolling.cpp



diff --git a/mlir/include/mlir/Dialect/SCF/Utils/Utils.h b/mlir/include/mlir/Dialect/SCF/Utils/Utils.h
index cdc52f4f3668c..ecd829ed14add 100644
--- a/mlir/include/mlir/Dialect/SCF/Utils/Utils.h
+++ b/mlir/include/mlir/Dialect/SCF/Utils/Utils.h
@@ -221,45 +221,6 @@ FailureOr normalizeForallOp(RewriterBase 
&rewriter,
 /// 4. Each region iter arg and result has exactly one use
 bool isPerfectlyNestedForLoops(MutableArrayRef loops);
 
-/// Generate unrolled copies of an scf loop's 'loopBodyBlock', with 'iterArgs'
-/// and 'yieldedValues' as the block arguments and yielded values of the loop.
-/// The content of the loop body is replicated 'unrollFactor' times, calling
-/// 'ivRemapFn' to remap 'iv' for each unrolled body. If specified, annotates
-/// the Ops in each unrolled iteration using annotateFn. If provided,
-/// 'clonedToSrcOpsMap' is populated with the mappings from the cloned ops to
-/// the original op.
-void generateUnrolledLoop(
-Block *loopBodyBlock, Value iv, uint64_t unrollFactor,
-function_ref ivRemapFn,
-function_ref annotateFn,
-ValueRange iterArgs, ValueRange yieldedValues,
-IRMapping *clonedToSrcOpsMap = nullptr);
-
-/// Unroll this scf::Parallel loop by the specified unroll factors. Returns the
-/// unrolled loop if the unroll succeded; otherwise returns failure if the loop
-/// cannot be unrolled either due to restrictions or to invalid unroll factors.
-/// Requires positive loop bounds and step. If specified, annotates the Ops in
-/// each unrolled iteration by applying `annotateFn`.
-/// If provided, 'clonedToSrcOpsMap' is populated with the mappings from the
-/// cloned ops to the original op.
-FailureOr parallelLoopUnrollByFactors(
-scf::ParallelOp op, ArrayRef unrollFactors,
-RewriterBase &rewriter,
-function_ref annotateFn = nullptr,
-IRMapping *clonedToSrcOpsMap = nullptr);
-
-/// Get constant trip counts for each of the induction variables of the given
-/// loop operation. If any of the loop's trip counts is not constant, return an
-/// empty vector.
-llvm::SmallVector
-getConstLoopTripCounts(mlir::LoopLikeOpInterface loopOp);
-
-namespace scf {
-/// Helper function to compute the difference between two values. This is used
-/// by the loop implementations to compute the trip count.
-std::optional<llvm::APSInt> computeUbMinusLb(Value lb, Value ub, bool isSigned);
-} // namespace scf
-
 } // namespace mlir
 
 #endif // MLIR_DIALECT_SCF_UTILS_UTILS_H_

diff --git a/mlir/lib/Dialect/SCF/IR/SCF.cpp b/mlir/lib/Dialect/SCF/IR/SCF.cpp
index 395b52fe46d25..744a5951330a3 100644
--- a/mlir/lib/Dialect/SCF/IR/SCF.cpp
+++ b/mlir/lib/Dialect/SCF/IR/SCF.cpp
@@ -15,7 +15,6 @@
 #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"
 #include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/SCF/IR/DeviceMappingInterface.h"
-#include "mlir/Dialect/SCF/Utils/Utils.h"
 #include "mlir/Dialect/Tensor/IR/Tensor.h"
 #include "mlir/IR/BuiltinAttributes.h"
 #include "mlir/IR/IRMapping.h"
@@ -112,6 +111,24 @@ static TerminatorTy verifyAndGetTerminator(Operation *op, Region &region,
   return nullptr;
 }
 
+/// Helper function to compute the difference between two values. This is used
+/// by the loop implementations to compute the trip count.
+static std::optional<llvm::APSInt> computeUbMinusLb(Value lb, Value ub,
+                                                    bool isSigned) {
+  llvm::APSInt diff;
+  auto addOp = ub.getDefiningOp<arith::AddIOp>();
+  if (!addOp)
+    return std::nullopt;
+  if ((isSigned && !addOp.hasNoSignedWrap()) ||
+      (!isSigned && !addOp.hasNoUnsignedWrap()))
+    return std::nullopt;
+
+  if (addOp.getLhs() != lb ||
+      !matchPattern(addOp.getRhs(), m_ConstantInt(&diff)))
+    return std::nullopt;
+  return diff;
+}
+
 
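 //===----------------------------------------------------------------------===//
 // ExecuteRegionOp
 //===----------------------------------------------------------------------===//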

diff --git a/mlir/lib/Dialect/SCF/Utils/Utils.cpp b/mlir/lib/Dialect/SCF/Utils/Utils.cpp
index 2d989d50bb8ac..10eae8906ce31 100644
--- a/mlir/lib/Dialect/SCF/Utils/Utils.cpp
+++ b/mlir/lib/Dialec

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Finn Plummer via llvm-branch-commits


https://github.com/inbelic edited 
https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Finn Plummer via llvm-branch-commits





inbelic wrote:

This should be addressed with the refactor

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Finn Plummer via llvm-branch-commits


https://github.com/inbelic updated 
https://github.com/llvm/llvm-project/pull/164292

>From 868c42bbd41aae5c43c89dd653d949418ec914f1 Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Fri, 17 Oct 2025 10:17:32 -0700
Subject: [PATCH 01/11] [DirectX] add a DXILValidateMetadata pass

---
 llvm/docs/DirectX/DXILArchitecture.rst|  4 +-
 llvm/lib/Target/DirectX/CMakeLists.txt|  1 +
 .../Target/DirectX/DXILTranslateMetadata.cpp  | 63 ++---
 .../Target/DirectX/DXILTranslateMetadata.h| 17 
 .../Target/DirectX/DXILValidateMetadata.cpp   | 90 +++
 .../lib/Target/DirectX/DXILValidateMetadata.h | 24 +
 llvm/lib/Target/DirectX/DirectX.h |  6 ++
 .../Target/DirectX/DirectXPassRegistry.def|  1 +
 .../Target/DirectX/DirectXTargetMachine.cpp   |  3 +
 llvm/test/CodeGen/DirectX/llc-pipeline.ll |  1 +
 10 files changed, 172 insertions(+), 38 deletions(-)
 create mode 100644 llvm/lib/Target/DirectX/DXILValidateMetadata.cpp
 create mode 100644 llvm/lib/Target/DirectX/DXILValidateMetadata.h

diff --git a/llvm/docs/DirectX/DXILArchitecture.rst 
b/llvm/docs/DirectX/DXILArchitecture.rst
index bce7fdaa386ed..de72697fc4505 100644
--- a/llvm/docs/DirectX/DXILArchitecture.rst
+++ b/llvm/docs/DirectX/DXILArchitecture.rst
@@ -113,7 +113,7 @@ are grouped into two flows:
 
 The passes to generate DXIL IR follow the flow:
 
-  DXILOpLowering -> DXILPrepare -> DXILTranslateMetadata
+  DXILOpLowering -> DXILPrepare -> DXILTranslateMetadata -> DXILValidateMetadata
 
 Each of these passes has a defined responsibility:
 
@@ -122,6 +122,8 @@ Each of these passes has a defined responsibility:
namely removing attributes, and inserting bitcasts to allow typed pointers
to be inserted.
 #. DXILTranslateMetadata transforms and emits all recognized DXIL Metadata.
+#. DXILValidateMetadata validates all emitted DXIL metadata structures
+   conform to DXIL validation.
 
 The passes to encode DXIL to binary in the DX Container follow the flow:
 
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt 
b/llvm/lib/Target/DirectX/CMakeLists.txt
index 6c079517e22d6..f9900370660dd 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(DirectXCodeGen
   DXILResourceImplicitBinding.cpp
   DXILShaderFlags.cpp
   DXILTranslateMetadata.cpp
+  DXILValidateMetadata.cpp
   DXILRootSignature.cpp
   DXILLegalizePass.cpp
   
diff --git a/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp 
b/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
index 1e4797bbd05aa..5c3635a1a6c68 100644
--- a/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
+++ b/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
@@ -454,45 +454,34 @@ PreservedAnalyses DXILTranslateMetadata::run(Module &M,
   return PreservedAnalyses::all();
 }
 
-namespace {
-class DXILTranslateMetadataLegacy : public ModulePass {
-public:
-  static char ID; // Pass identification, replacement for typeid
-  explicit DXILTranslateMetadataLegacy() : ModulePass(ID) {}
-
-  StringRef getPassName() const override { return "DXIL Translate Metadata"; }
-
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-  }
+void DXILTranslateMetadataLegacy::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+}
 
-  bool runOnModule(Module &M) override {
-DXILResourceMap &DRM =
-getAnalysis().getResourceMap();
-DXILResourceTypeMap &DRTM =
-getAnalysis().getResourceTypeMap();
-const ModuleShaderFlags &ShaderFlags =
-getAnalysis().getShaderFlags();
-dxil::ModuleMetadataInfo MMDI =
-getAnalysis().getModuleMetadata();
-
-translateGlobalMetadata(M, DRM, DRTM, ShaderFlags, MMDI);
-translateInstructionMetadata(M);
-return true;
-  }
-};
+bool DXILTranslateMetadataLegacy::runOnModule(Module &M) {
+  DXILResourceMap &DRM =
+  getAnalysis().getResourceMap();
+  DXILResourceTypeMap &DRTM =
+  getAnalysis().getResourceTypeMap();
+  const ModuleShaderFlags &ShaderFlags =
+  getAnalysis().getShaderFlags();
+  dxil::ModuleMetadataInfo MMDI =
+  getAnalysis().getModuleMetadata();
 
-} // namespace
+  translateGlobalMetadata(M, DRM, DRTM, ShaderFlags, MMDI);
+  translateInstructionMetadata(M);
+  return true;
+}
 
 char DXILTranslateMetadataLegacy::ID = 0;
 
diff --git a/llvm/lib/Target/DirectX/DXILTranslateMetadata.h 
b/llvm/lib/Target/DirectX/DXILTranslateMetadata.h
index 4c1ffac1781e6..cfb8aaa8f98b5 100644
--- a/llvm/lib/Target/DirectX/DXILTranslateMetadata.h
+++ b/llvm/lib/Target/DirectX/D

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Finn Plummer via llvm-branch-commits


https://github.com/inbelic updated 
https://github.com/llvm/llvm-project/pull/164292

>From 868c42bbd41aae5c43c89dd653d949418ec914f1 Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Fri, 17 Oct 2025 10:17:32 -0700
Subject: [PATCH 01/12] [DirectX] add a DXILValidateMetadata pass

---
 llvm/docs/DirectX/DXILArchitecture.rst|  4 +-
 llvm/lib/Target/DirectX/CMakeLists.txt|  1 +
 .../Target/DirectX/DXILTranslateMetadata.cpp  | 63 ++---
 .../Target/DirectX/DXILTranslateMetadata.h| 17 
 .../Target/DirectX/DXILValidateMetadata.cpp   | 90 +++
 .../lib/Target/DirectX/DXILValidateMetadata.h | 24 +
 llvm/lib/Target/DirectX/DirectX.h |  6 ++
 .../Target/DirectX/DirectXPassRegistry.def|  1 +
 .../Target/DirectX/DirectXTargetMachine.cpp   |  3 +
 llvm/test/CodeGen/DirectX/llc-pipeline.ll |  1 +
 10 files changed, 172 insertions(+), 38 deletions(-)
 create mode 100644 llvm/lib/Target/DirectX/DXILValidateMetadata.cpp
 create mode 100644 llvm/lib/Target/DirectX/DXILValidateMetadata.h

diff --git a/llvm/docs/DirectX/DXILArchitecture.rst 
b/llvm/docs/DirectX/DXILArchitecture.rst
index bce7fdaa386ed..de72697fc4505 100644
--- a/llvm/docs/DirectX/DXILArchitecture.rst
+++ b/llvm/docs/DirectX/DXILArchitecture.rst
@@ -113,7 +113,7 @@ are grouped into two flows:
 
 The passes to generate DXIL IR follow the flow:
 
-  DXILOpLowering -> DXILPrepare -> DXILTranslateMetadata
+  DXILOpLowering -> DXILPrepare -> DXILTranslateMetadata -> DXILValidateMetadata
 
 Each of these passes has a defined responsibility:
 
@@ -122,6 +122,8 @@ Each of these passes has a defined responsibility:
namely removing attributes, and inserting bitcasts to allow typed pointers
to be inserted.
 #. DXILTranslateMetadata transforms and emits all recognized DXIL Metadata.
+#. DXILValidateMetadata validates all emitted DXIL metadata structures
+   conform to DXIL validation.
 
 The passes to encode DXIL to binary in the DX Container follow the flow:
 
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt 
b/llvm/lib/Target/DirectX/CMakeLists.txt
index 6c079517e22d6..f9900370660dd 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(DirectXCodeGen
   DXILResourceImplicitBinding.cpp
   DXILShaderFlags.cpp
   DXILTranslateMetadata.cpp
+  DXILValidateMetadata.cpp
   DXILRootSignature.cpp
   DXILLegalizePass.cpp
   
diff --git a/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp 
b/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
index 1e4797bbd05aa..5c3635a1a6c68 100644
--- a/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
+++ b/llvm/lib/Target/DirectX/DXILTranslateMetadata.cpp
@@ -454,45 +454,34 @@ PreservedAnalyses DXILTranslateMetadata::run(Module &M,
   return PreservedAnalyses::all();
 }
 
-namespace {
-class DXILTranslateMetadataLegacy : public ModulePass {
-public:
-  static char ID; // Pass identification, replacement for typeid
-  explicit DXILTranslateMetadataLegacy() : ModulePass(ID) {}
-
-  StringRef getPassName() const override { return "DXIL Translate Metadata"; }
-
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-AU.addRequired();
-
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-AU.addPreserved();
-  }
+void DXILTranslateMetadataLegacy::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+  AU.addRequired();
+
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+  AU.addPreserved();
+}
 
-  bool runOnModule(Module &M) override {
-DXILResourceMap &DRM =
-getAnalysis().getResourceMap();
-DXILResourceTypeMap &DRTM =
-getAnalysis().getResourceTypeMap();
-const ModuleShaderFlags &ShaderFlags =
-getAnalysis().getShaderFlags();
-dxil::ModuleMetadataInfo MMDI =
-getAnalysis().getModuleMetadata();
-
-translateGlobalMetadata(M, DRM, DRTM, ShaderFlags, MMDI);
-translateInstructionMetadata(M);
-return true;
-  }
-};
+bool DXILTranslateMetadataLegacy::runOnModule(Module &M) {
+  DXILResourceMap &DRM =
+  getAnalysis().getResourceMap();
+  DXILResourceTypeMap &DRTM =
+  getAnalysis().getResourceTypeMap();
+  const ModuleShaderFlags &ShaderFlags =
+  getAnalysis().getShaderFlags();
+  dxil::ModuleMetadataInfo MMDI =
+  getAnalysis().getModuleMetadata();
 
-} // namespace
+  translateGlobalMetadata(M, DRM, DRTM, ShaderFlags, MMDI);
+  translateInstructionMetadata(M);
+  return true;
+}
 
 char DXILTranslateMetadataLegacy::ID = 0;
 
diff --git a/llvm/lib/Target/DirectX/DXILTranslateMetadata.h 
b/llvm/lib/Target/DirectX/DXILTranslateMetadata.h
index 4c1ffac1781e6..cfb8aaa8f98b5 100644
--- a/llvm/lib/Target/DirectX/DXILTranslateMetadata.h
+++ b/llvm/lib/Target/DirectX/D

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits



@@ -24,9 +24,9 @@ _Z4mainDv3_j.exit: ; preds = %for.body.i, %entry
 ; These next check lines check that only the range metadata remains
 ; No more metadata should be necessary, the rest (the current 0 and 1)
 ; should be removed.
-; CHECK-NOT: !{!"llvm.loop.mustprogress"}
-; CHECK: [[RANGEMD]] = !{i32 1, i32 5}
-; CHECK-NOT: !{!"llvm.loop.mustprogress"}
+; CHECK-DAG: [[RANGEMD]] = !{i32 1, i32 5}
+; CHECK-DAG: [[LOOPMD]] = distinct !{[[LOOPMD]], [[PROGRESS:![0-9]+]]}
+; CHECK-DAG: {!"llvm.loop.mustprogress"}

hekota wrote:

```suggestion
; CHECK-DAG: [[LOOPMD]] = distinct !{[[LOOPMD]], [[PROGRESS:![0-9]+]]}
; CHECK-DAG: [[PROGRESS]] = {!"llvm.loop.mustprogress"}
```
Since you named `PROGRESS` you might as well use it. Or do not name it at all.  

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [llvm] [DirectX] Add DXIL validation of `llvm.loop` metadata (PR #164292)

2025-10-24 Thread Helena Kotas via llvm-branch-commits





hekota wrote:

The test no longer makes sense: no metadata is being stripped. Can you add
some metadata that DXIL does not allow, to test that it gets stripped?

https://github.com/llvm/llvm-project/pull/164292

[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread via llvm-branch-commits


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/include/llvm/Support/SpecialCaseList.h llvm/lib/Support/SpecialCaseList.cpp --diff_from_common_commit
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 3d9b749cb..30f3061ea 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -122,7 +122,7 @@ void SpecialCaseList::GlobMatcher::match(
   }
 }
 
-   SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
+SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
 : RemoveDotSlash(RemoveDotSlash) {
   if (UseGlobs)
 M.emplace();

```




https://github.com/llvm/llvm-project/pull/164543

[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164543



[llvm-branch-commits] [SpecialCaseList] Add RadixTree for substring matching (PR #164545)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164545



[llvm-branch-commits] [SpecialCaseList] Add RadixTree for substring matching (PR #164545)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164545



[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164543



[llvm-branch-commits] [SpecialCaseList] Flip RadixTree key order (PR #164544)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164544



[llvm-branch-commits] [SpecialCaseList] Flip RadixTree key order (PR #164544)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164544



[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164543



[llvm-branch-commits] [SpecialCaseList] Filtering Globs with matching prefix and suffix (PR #164543)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164543



[llvm-branch-commits] [SpecialCaseList] Flip RadixTree key order (PR #164544)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164544



[llvm-branch-commits] [SpecialCaseList] Flip RadixTree key order (PR #164544)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164544



[llvm-branch-commits] [ADT][NFC] Add missing #include (#165068) (PR #165074)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka created 
https://github.com/llvm/llvm-project/pull/165074

Added in #164524. Fails when using libc++ in a mode that prunes
transitive headers.




[llvm-branch-commits] [ADT][NFC] Add missing #include (#165068) (PR #165074)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka closed 
https://github.com/llvm/llvm-project/pull/165074

[llvm-branch-commits] [ADT][NFC] Add missing #include (#165068) (PR #165074)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-llvm-adt

Author: Vitaly Buka (vitalybuka)


Changes

Added in #164524. Fails when using libc++ in a mode that prunes
transitive headers.


---
Full diff: https://github.com/llvm/llvm-project/pull/165074.diff


2 Files Affected:

- (modified) llvm/include/llvm/ADT/RadixTree.h (+1) 
- (modified) llvm/lib/Support/SpecialCaseList.cpp (+5-5) 


```diff
diff --git a/llvm/include/llvm/ADT/RadixTree.h b/llvm/include/llvm/ADT/RadixTree.h
index d3c44e4e6345c..a65acddf186b7 100644
--- a/llvm/include/llvm/ADT/RadixTree.h
+++ b/llvm/include/llvm/ADT/RadixTree.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 namespace llvm {
 
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 944f59f245be3..1e303ebbfd3f2 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -121,13 +121,13 @@ void SpecialCaseList::GlobMatcher::match(
  SuffixPrefixToGlob.find_prefixes(reverse(Query))) {
   for (const auto &[_, V] : PToGlob.find_prefixes(Query)) {
 for (const auto *G : V) {
-  // Each value of the map is a vector of globs sorted as from best to
-  // worst.
+  // Each value of the map is a vector of globs ordered from the best to
+  // the worst.
   if (G->Pattern.match(Query)) {
 Cb(G->Name, G->LineNo);
-// As soon as we find a match in the vector we can break for the vector,
-// vector, but we still need to continue for other values in the
-// map, as they may contain a better match.
+// As soon as we find a match in the vector we can break for the
+// vector, still we can't return, and need to continue for others
+// values in the map, as they may contain a better match.
 break;
   }
 }

```
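
The loop shape described in the reworded comment (stop at the first hit inside one bucket, but keep scanning the other buckets) can be illustrated with a small standalone sketch. Everything here is hypothetical and simplified: `SimpleGlob`, `Buckets` and `bestMatchLine` are invented names, globs are reduced to a `prefix*suffix` form, a `std::map` stands in for the radix tree, and "best" is assumed to mean the highest line number.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical simplified glob of the form "<Prefix>*<Suffix>".
struct SimpleGlob {
  std::string Prefix;
  std::string Suffix;
  unsigned LineNo;

  bool match(std::string_view Query) const {
    return Query.size() >= Prefix.size() + Suffix.size() &&
           Query.substr(0, Prefix.size()) == std::string_view(Prefix) &&
           Query.substr(Query.size() - Suffix.size()) ==
               std::string_view(Suffix);
  }
};

// Buckets keyed by glob prefix. Assumption: every bucket is already ordered
// from best to worst, where "best" means the highest LineNo.
using Buckets = std::map<std::string, std::vector<SimpleGlob>>;

std::optional<unsigned> bestMatchLine(const Buckets &B,
                                      std::string_view Query) {
  std::optional<unsigned> Best;
  for (const auto &[Prefix, Globs] : B) {
    // Only buckets whose key is a prefix of the query can match.
    if (Query.substr(0, Prefix.size()) != std::string_view(Prefix))
      continue;
    for (const SimpleGlob &G : Globs) {
      if (!G.match(Query))
        continue;
      if (!Best || G.LineNo > *Best)
        Best = G.LineNo;
      // The rest of this bucket is worse, so stop scanning it. We cannot
      // return yet: another bucket may still hold a better match.
      break;
    }
  }
  return Best;
}
```

With a bucket `"src/"` holding globs at lines 7 and 3 and a bucket `"src/lib/"` holding one at line 9, a query that matches in both buckets reports line 9, which is why the outer loop has to continue after the inner `break`.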




https://github.com/llvm/llvm-project/pull/165074

[llvm-branch-commits] [SpecialCaseList] Add RadixTree for substring matching (PR #164545)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164545



[llvm-branch-commits] [SpecialCaseList] Add RadixTree for substring matching (PR #164545)

2025-10-24 Thread Vitaly Buka via llvm-branch-commits


https://github.com/vitalybuka updated 
https://github.com/llvm/llvm-project/pull/164545



[llvm-branch-commits] [llvm] ARM: Avoid adding default libcalls overridden by AEABI functions (PR #164983)

2025-10-24 Thread Saleem Abdulrasool via llvm-branch-commits


https://github.com/compnerd approved this pull request.

The AEABI names should be preferred; I believe the aliases exist to ease 
migration. Switching to the GNU names for non-AEABI or non-AAPCS calling 
conventions on non-Windows targets is the correct behaviour.

https://github.com/llvm/llvm-project/pull/164983

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Saleem Abdulrasool via llvm-branch-commits



@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef
+return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpls[Call])
+.data();
+  }
+
+  /// Return the lowering's selection of implementation call for \p Call
+  LLVM_ABI RTLIB::LibcallImpl getLibcallImpl(RTLIB::Libcall Call) const {
+return LibcallImpls[Call];
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  LLVM_ABI void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
+LibcallImpls[Call] = Impl;
+  }
+
+  // FIXME: Remove this wrapper in favor of directly using
+  // getLibcallImplCallingConv
+  LLVM_ABI CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+return RTLCI.LibcallImplCallingConvs[LibcallImpls[Call]];

compnerd wrote:

Mind changing this to:

```
  return RTLCI.LibcallImplCallingConvs[getLibcallImpl(Call)];
```
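
A small self-contained sketch of the suggestion, assuming heavily simplified stand-ins for the real types (`Libcall`, `LibcallImpl`, `CallConv`, `RuntimeInfo`, and `LoweringInfo` are all hypothetical). Routing the wrapper through `getLibcallImpl` keeps the table lookup in one place, so a later change to how the selection is stored only touches one accessor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for RTLIB::Libcall, RTLIB::LibcallImpl and
// CallingConv::ID.
enum Libcall { MEMCPY, MEMMOVE, NumLibcalls };
enum LibcallImpl { Unsupported, impl_memcpy, impl_aeabi_memcpy, NumImpls };
enum CallConv { CC_C, CC_AAPCS };

// Module-level information: one calling convention per implementation.
struct RuntimeInfo {
  std::array<CallConv, NumImpls> ImplCallingConvs{}; // all CC_C by default
};

class LoweringInfo {
  const RuntimeInfo &RTLCI;
  // Value-initialized, so every entry starts out as Unsupported (0).
  std::array<LibcallImpl, NumLibcalls> Impls{};

public:
  explicit LoweringInfo(const RuntimeInfo &Info) : RTLCI(Info) {}

  LibcallImpl getLibcallImpl(Libcall Call) const { return Impls[Call]; }
  void setLibcallImpl(Libcall Call, LibcallImpl Impl) { Impls[Call] = Impl; }

  // The wrapper goes through the accessor instead of indexing Impls directly.
  CallConv getLibcallCallingConv(Libcall Call) const {
    return RTLCI.ImplCallingConvs[getLibcallImpl(Call)];
  }
};
```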

https://github.com/llvm/llvm-project/pull/164987

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Saleem Abdulrasool via llvm-branch-commits



@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef

compnerd wrote:

Why not just make that change? Is there a lifetime issue?
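
For readers weighing the lifetime question, here is a hedged sketch of the general situation, not of the actual LLVM tables. When the names live in static storage, a non-owning view of them never dangles, so returning the view is no riskier than returning a raw pointer. What a view gives up is the null-termination guarantee that `const char *` callers may rely on. `std::string_view` (C++17) stands in for `StringRef`, and `ImplNames`, `implName`, and `legacyName` are hypothetical. Whether the FIXME can be resolved in this patch depends on the callers, which this sketch does not model.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// Hypothetical name table with static storage duration.
static constexpr std::string_view ImplNames[] = {"", "memcpy", "memmove"};

// Returns a non-owning view. It stays valid for the whole program because the
// characters it refers to are string literals.
constexpr std::string_view implName(unsigned Impl) { return ImplNames[Impl]; }

// Legacy shape: callers get a C string and lose the length. This is only safe
// because every entry above is a complete, null-terminated literal.
inline const char *legacyName(unsigned Impl) { return implName(Impl).data(); }
```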

https://github.com/llvm/llvm-project/pull/164987

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Saleem Abdulrasool via llvm-branch-commits



@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef
+return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpls[Call])
+.data();
+  }
+
+  /// Return the lowering's selection of implementation call for \p Call
+  LLVM_ABI RTLIB::LibcallImpl getLibcallImpl(RTLIB::Libcall Call) const {
+return LibcallImpls[Call];
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  LLVM_ABI void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
+LibcallImpls[Call] = Impl;
+  }
+
+  // FIXME: Remove this wrapper in favor of directly using
+  // getLibcallImplCallingConv
+  LLVM_ABI CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+return RTLCI.LibcallImplCallingConvs[LibcallImpls[Call]];
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  LLVM_ABI CallingConv::ID
+  getLibcallImplCallingConv(RTLIB::LibcallImpl Call) const {
+return RTLCI.LibcallImplCallingConvs[Call];
+  }
+
+  /// Return a function name compatible with RTLIB::MEMCPY, or nullptr if fully
+  /// unsupported.
+  LLVM_ABI StringRef getMemcpyName() const {
+RTLIB::LibcallImpl Memcpy = getLibcallImpl(RTLIB::MEMCPY);
+if (Memcpy != RTLIB::Unsupported)
+  return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(Memcpy);
+
+// Fallback to memmove if memcpy isn't available.
+return getLibcallName(RTLIB::MEMMOVE);

compnerd wrote:

I would invert this:

```
  if (Memcpy == RTLIB::Unsupported)
    return getLibcallName(RTLIB::MEMMOVE);
  return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(Memcpy);
```
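
Both shapes return the same result; the inverted form deals with the fallback first and leaves the common path as the final statement. A minimal sketch with hypothetical stand-in names (`Impl`, `implName`, `memcpyName`) rather than the real `RTLIB` types:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the real code uses RTLIB::LibcallImpl values.
enum Impl { Unsupported, ImplMemcpy, ImplMemmove };

inline std::string implName(Impl I) {
  switch (I) {
  case ImplMemcpy:
    return "memcpy";
  case ImplMemmove:
    return "memmove";
  case Unsupported:
    break;
  }
  return ""; // an unsupported call has no name in this sketch
}

// Inverted form: handle the fallback first, then the common case.
inline std::string memcpyName(Impl Memcpy, Impl Memmove) {
  if (Memcpy == Unsupported)
    return implName(Memmove); // fall back to memmove
  return implName(Memcpy);
}
```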

https://github.com/llvm/llvm-project/pull/164987

[llvm-branch-commits] [flang] [flang][OpenACC] lower acc loops with early exits (PR #164992)

2025-10-24 Thread via llvm-branch-commits


https://github.com/jeanPerier created 
https://github.com/llvm/llvm-project/pull/164992

Lower acc loops with early exits using the newly added "unstructured" attribute.

The core change of this patch is to refactor the loop control variable 
processing so that, for loops with early exits, the induction variables are 
privatized but no bounds operands are added to the acc.loop.

The logic of such a loop is then implemented by the FIR loop lowering, which 
generates explicit control flow.
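
One way to picture the refactoring is a single walk over the loop nest that hands each level to a caller-supplied callback, so structured loops can record bounds while loops with early exits only record their induction variables. The sketch below uses hypothetical plain structs and functions (`Loop`, `visitLoopNest`, `collectUpperBounds`, `collectInductionVars`) in place of the parse tree and `pft::Evaluation`. It illustrates the pattern only and is not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a DO loop: its control plus the loops nested in it.
struct Loop {
  std::string InductionVar;
  int Lower, Upper;
  std::vector<Loop> Nested; // loops directly inside this loop's body
};

// Visits up to LoopsToProcess levels, following the first nested loop at each
// level, and stops early when the nest is shallower than requested.
void visitLoopNest(const Loop &Outer, std::uint64_t LoopsToProcess,
                   const std::function<void(const Loop &)> &Callback) {
  const Loop *Current = &Outer;
  for (std::uint64_t I = 0; I < LoopsToProcess && Current; ++I) {
    Callback(*Current);
    Current = Current->Nested.empty() ? nullptr : &Current->Nested.front();
  }
}

// Caller 1: collect bounds, as a structured loop would need.
std::vector<int> collectUpperBounds(const Loop &L, std::uint64_t N) {
  std::vector<int> Upper;
  visitLoopNest(L, N, [&](const Loop &Lp) { Upper.push_back(Lp.Upper); });
  return Upper;
}

// Caller 2: collect only the induction variables that need privatizing.
std::vector<std::string> collectInductionVars(const Loop &L, std::uint64_t N) {
  std::vector<std::string> Vars;
  visitLoopNest(L, N,
                [&](const Loop &Lp) { Vars.push_back(Lp.InductionVar); });
  return Vars;
}
```

Keeping the traversal in one helper means both callers agree on which loops belong to the nest, and only the per-level action differs.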


>From 9bb33e36860b2b8eb4cb62d40cffeb5f0dca328b Mon Sep 17 00:00:00 2001
From: Jean Perier 
Date: Fri, 24 Oct 2025 07:15:25 -0700
Subject: [PATCH] [flang][OpenACC] lower acc loops with early exits

---
 flang/lib/Lower/OpenACC.cpp   | 243 ++
 flang/test/Lower/OpenACC/acc-unstructured.f90 |   4 +-
 2 files changed, 144 insertions(+), 103 deletions(-)

diff --git a/flang/lib/Lower/OpenACC.cpp b/flang/lib/Lower/OpenACC.cpp
index d7861ac6463c8..7901971088f70 100644
--- a/flang/lib/Lower/OpenACC.cpp
+++ b/flang/lib/Lower/OpenACC.cpp
@@ -2251,6 +2251,49 @@ static void determineDefaultLoopParMode(
   }
 }
 
+// Helper to visit Bounds of DO LOOP nest.
+static void visitLoopControl(
+Fortran::lower::AbstractConverter &converter,
+const Fortran::parser::DoConstruct &outerDoConstruct,
+uint64_t loopsToProcess, Fortran::lower::pft::Evaluation &eval,
+std::function
+callback) {
+  Fortran::lower::pft::Evaluation *crtEval = &eval.getFirstNestedEvaluation();
+  for (uint64_t i = 0; i < loopsToProcess; ++i) {
+const Fortran::parser::LoopControl *loopControl;
+if (i == 0) {
+  loopControl = &*outerDoConstruct.GetLoopControl();
+  mlir::Location loc = converter.genLocation(
+  Fortran::parser::FindSourceLocation(outerDoConstruct));
+  callback(std::get(loopControl->u),
+   loc);
+} else {
+  // Safely locate the next inner DoConstruct within this eval.
+  const Fortran::parser::DoConstruct *innerDo = nullptr;
+  if (crtEval && crtEval->hasNestedEvaluations()) {
+for (Fortran::lower::pft::Evaluation &child :
+ crtEval->getNestedEvaluations()) {
+  if (auto *stmt = child.getIf()) {
+innerDo = stmt;
+// Prepare to descend for the next iteration
+crtEval = &child;
+break;
+  }
+}
+  }
+  if (!innerDo)
+break; // No deeper loop; stop collecting collapsed bounds.
+
+  loopControl = &*innerDo->GetLoopControl();
+  mlir::Location loc =
+  converter.genLocation(Fortran::parser::FindSourceLocation(*innerDo));
+  callback(std::get(loopControl->u),
+   loc);
+}
+  }
+}
+
 // Extract loop bounds, steps, induction variables, and privatization info
 // for both DO CONCURRENT and regular do loops
 static void processDoLoopBounds(
@@ -2272,7 +2315,6 @@ static void processDoLoopBounds(
 llvm::SmallVector &locs, uint64_t loopsToProcess) {
   assert(loopsToProcess > 0 && "expect at least one loop");
   locs.push_back(currentLocation); // Location of the directive
-  Fortran::lower::pft::Evaluation *crtEval = &eval.getFirstNestedEvaluation();
   bool isDoConcurrent = outerDoConstruct.IsDoConcurrent();
 
   if (isDoConcurrent) {
@@ -2313,57 +2355,29 @@ static void processDoLoopBounds(
   inclusiveBounds.push_back(true);
 }
   } else {
-for (uint64_t i = 0; i < loopsToProcess; ++i) {
-  const Fortran::parser::LoopControl *loopControl;
-  if (i == 0) {
-loopControl = &*outerDoConstruct.GetLoopControl();
-locs.push_back(converter.genLocation(
-Fortran::parser::FindSourceLocation(outerDoConstruct)));
-  } else {
-// Safely locate the next inner DoConstruct within this eval.
-const Fortran::parser::DoConstruct *innerDo = nullptr;
-if (crtEval && crtEval->hasNestedEvaluations()) {
-  for (Fortran::lower::pft::Evaluation &child :
-   crtEval->getNestedEvaluations()) {
-if (auto *stmt = child.getIf()) {
-  innerDo = stmt;
-  // Prepare to descend for the next iteration
-  crtEval = &child;
-  break;
-}
-  }
-}
-if (!innerDo)
-  break; // No deeper loop; stop collecting collapsed bounds.
-
-loopControl = &*innerDo->GetLoopControl();
-locs.push_back(converter.genLocation(
-Fortran::parser::FindSourceLocation(*innerDo)));
-  }
-
-  const Fortran::parser::LoopControl::Bounds *bounds =
-  std::get_if(&loopControl->u);
-  assert(bounds && "Expected bounds on the loop construct");
-  lowerbounds.push_back(fir::getBase(converter.genExprValue(
-  *Fortran::semantics::GetExpr(bounds->lower), stmtCtx)));
-  upperbounds.push_back(fir::getBase(converter.genExprValue(
-  *Fortran::semantics::GetExpr(bounds->upper), stmtCtx)));
-  if (bounds->step)
-steps.push_back(fi

[llvm-branch-commits] [flang] [flang][OpenACC] lower acc loops with early exits (PR #164992)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: None (jeanPerier)


Changes

Lower acc loops with early exits using the newly added "unstructured" attribute.

The core change of this patch is to refactor the loop control variable 
processing so that, for loops with early exits, the induction variables are 
privatized but no bounds operands are added to the acc.loop.

The logic of such a loop is then implemented by the FIR loop lowering, which 
generates explicit control flow.


---
Full diff: https://github.com/llvm/llvm-project/pull/164992.diff


2 Files Affected:

- (modified) flang/lib/Lower/OpenACC.cpp (+142-101) 
- (modified) flang/test/Lower/OpenACC/acc-unstructured.f90 (+2-2) 


[llvm-branch-commits] [flang] [flang][OpenACC] lower acc loops with early exits (PR #164992)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-openacc

Author: None (jeanPerier)


Changes

Lower acc loops with early exits using the newly added "unstructured" attribute.

The core change of this patch is to refactor the loop control variable 
processing so that, for loops with early exits, the induction variables are 
privatized but no bounds operands are added to the acc.loop.

The logic of such a loop is then implemented by the FIR loop lowering, which 
generates explicit control flow.


---
Full diff: https://github.com/llvm/llvm-project/pull/164992.diff


2 Files Affected:

- (modified) flang/lib/Lower/OpenACC.cpp (+142-101) 
- (modified) flang/test/Lower/OpenACC/acc-unstructured.f90 (+2-2) 


``diff
diff --git a/flang/lib/Lower/OpenACC.cpp b/flang/lib/Lower/OpenACC.cpp
index d7861ac6463c8..7901971088f70 100644
--- a/flang/lib/Lower/OpenACC.cpp
+++ b/flang/lib/Lower/OpenACC.cpp
@@ -2251,6 +2251,49 @@ static void determineDefaultLoopParMode(
   }
 }
 
+// Helper to visit Bounds of DO LOOP nest.
+static void visitLoopControl(
+Fortran::lower::AbstractConverter &converter,
+const Fortran::parser::DoConstruct &outerDoConstruct,
+uint64_t loopsToProcess, Fortran::lower::pft::Evaluation &eval,
+std::function
+callback) {
+  Fortran::lower::pft::Evaluation *crtEval = &eval.getFirstNestedEvaluation();
+  for (uint64_t i = 0; i < loopsToProcess; ++i) {
+const Fortran::parser::LoopControl *loopControl;
+if (i == 0) {
+  loopControl = &*outerDoConstruct.GetLoopControl();
+  mlir::Location loc = converter.genLocation(
+  Fortran::parser::FindSourceLocation(outerDoConstruct));
+  callback(std::get(loopControl->u),
+   loc);
+} else {
+  // Safely locate the next inner DoConstruct within this eval.
+  const Fortran::parser::DoConstruct *innerDo = nullptr;
+  if (crtEval && crtEval->hasNestedEvaluations()) {
+for (Fortran::lower::pft::Evaluation &child :
+ crtEval->getNestedEvaluations()) {
+  if (auto *stmt = child.getIf()) {
+innerDo = stmt;
+// Prepare to descend for the next iteration
+crtEval = &child;
+break;
+  }
+}
+  }
+  if (!innerDo)
+break; // No deeper loop; stop collecting collapsed bounds.
+
+  loopControl = &*innerDo->GetLoopControl();
+  mlir::Location loc =
+  converter.genLocation(Fortran::parser::FindSourceLocation(*innerDo));
+  callback(std::get(loopControl->u),
+   loc);
+}
+  }
+}
+
 // Extract loop bounds, steps, induction variables, and privatization info
 // for both DO CONCURRENT and regular do loops
 static void processDoLoopBounds(
@@ -2272,7 +2315,6 @@ static void processDoLoopBounds(
 llvm::SmallVector &locs, uint64_t loopsToProcess) {
   assert(loopsToProcess > 0 && "expect at least one loop");
   locs.push_back(currentLocation); // Location of the directive
-  Fortran::lower::pft::Evaluation *crtEval = &eval.getFirstNestedEvaluation();
   bool isDoConcurrent = outerDoConstruct.IsDoConcurrent();
 
   if (isDoConcurrent) {
@@ -2313,57 +2355,29 @@ static void processDoLoopBounds(
   inclusiveBounds.push_back(true);
 }
   } else {
-for (uint64_t i = 0; i < loopsToProcess; ++i) {
-  const Fortran::parser::LoopControl *loopControl;
-  if (i == 0) {
-loopControl = &*outerDoConstruct.GetLoopControl();
-locs.push_back(converter.genLocation(
-Fortran::parser::FindSourceLocation(outerDoConstruct)));
-  } else {
-// Safely locate the next inner DoConstruct within this eval.
-const Fortran::parser::DoConstruct *innerDo = nullptr;
-if (crtEval && crtEval->hasNestedEvaluations()) {
-  for (Fortran::lower::pft::Evaluation &child :
-   crtEval->getNestedEvaluations()) {
-if (auto *stmt = child.getIf()) {
-  innerDo = stmt;
-  // Prepare to descend for the next iteration
-  crtEval = &child;
-  break;
-}
-  }
-}
-if (!innerDo)
-  break; // No deeper loop; stop collecting collapsed bounds.
-
-loopControl = &*innerDo->GetLoopControl();
-locs.push_back(converter.genLocation(
-Fortran::parser::FindSourceLocation(*innerDo)));
-  }
-
-  const Fortran::parser::LoopControl::Bounds *bounds =
-  std::get_if(&loopControl->u);
-  assert(bounds && "Expected bounds on the loop construct");
-  lowerbounds.push_back(fir::getBase(converter.genExprValue(
-  *Fortran::semantics::GetExpr(bounds->lower), stmtCtx)));
-  upperbounds.push_back(fir::getBase(converter.genExprValue(
-  *Fortran::semantics::GetExpr(bounds->upper), stmtCtx)));
-  if (bounds->step)
-steps.push_back(fir::getBase(converter.genExprValue(
-*Fortran::semantics::GetExpr(bounds->step), stmtCtx)));
-  else // If `step` is not present, assume it is `1

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/164987

Introduce a new class for the TargetLowering usage. This tracks the
subtarget-specific lowering decisions for which libcall to use.
RuntimeLibcallsInfo is a module-level property, which may have multiple
implementations of a particular libcall available. This attempts to be
a minimal-boilerplate patch to introduce the new concept.

In the future we should have a tablegen way of selecting which
implementations should be used for a subtarget. Currently we
do have some conflicting implementations added; it just happens
to work out that the default case to prefer is alphabetically
first (plus some of these are still using manual overrides
in TargetLowering constructors).
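
To make the split concrete, here is a hedged sketch in which a module-level table may list several implementations for one libcall and a separate per-subtarget object records the one actually chosen. The default below picks the alphabetically first name, mirroring the behaviour the description says currently happens to work out. All type and function names (`ModuleLibcalls`, `SubtargetLowering`, `select`, `get`) are hypothetical, and strings stand in for the generated enums.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Module-level view: every implementation available for each libcall.
struct ModuleLibcalls {
  std::map<std::string, std::vector<std::string>> Available;
};

// Subtarget-level view: the single implementation chosen for each libcall.
class SubtargetLowering {
  std::map<std::string, std::string> Selected;

public:
  explicit SubtargetLowering(const ModuleLibcalls &M) {
    for (const auto &Entry : M.Available) {
      if (Entry.second.empty())
        continue; // nothing available: leave the libcall unsupported
      // Default choice in this sketch: the alphabetically first name.
      Selected[Entry.first] =
          *std::min_element(Entry.second.begin(), Entry.second.end());
    }
  }

  // Manual override, in the spirit of the ones still made in constructors.
  void select(const std::string &Call, const std::string &Impl) {
    Selected[Call] = Impl;
  }

  // Returns an empty string when the libcall is unsupported.
  std::string get(const std::string &Call) const {
    auto It = Selected.find(Call);
    return It == Selected.end() ? std::string() : It->second;
  }
};
```

A table-driven selection, as the description proposes for the future, would replace the `std::min_element` default with an explicit per-subtarget choice.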

>From c8e03b977422049a2f616490f35e95b4bac40722 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 7 Oct 2025 20:00:23 +0900
Subject: [PATCH] RuntimeLibcalls: Split lowering decisions into
 LibcallLoweringInfo

Introduce a new class for the TargetLowering usage. This tracks the
subtarget-specific lowering decisions for which libcall to use.
RuntimeLibcallsInfo is a module-level property, which may have multiple
implementations of a particular libcall available. This attempts to be
a minimal-boilerplate patch to introduce the new concept.

In the future we should have a tablegen way of selecting which
implementations should be used for a subtarget. Currently we
do have some conflicting implementations added; it just happens
to work out that the default case to prefer is alphabetically
first (plus some of these are still using manual overrides
in TargetLowering constructors).
---
 llvm/include/llvm/CodeGen/TargetLowering.h| 68 +---
 llvm/include/llvm/IR/RuntimeLibcalls.h| 59 +++---
 llvm/lib/CodeGen/TargetLoweringBase.cpp   | 23 +-
 llvm/lib/IR/RuntimeLibcalls.cpp   |  1 +
 llvm/lib/LTO/LTO.cpp  |  7 +-
 .../WebAssemblyRuntimeLibcallSignatures.cpp   | 15 ++--
 .../Utils/DeclareRuntimeLibcalls.cpp  |  4 +-
 .../RuntimeLibcallEmitter-calling-conv.td | 60 +++---
 .../RuntimeLibcallEmitter-conflict-warning.td | 30 +++
 llvm/test/TableGen/RuntimeLibcallEmitter.td   | 78 +--
 .../Util/DeclareRuntimeLibcalls/basic.ll  |  5 +-
 .../TableGen/Basic/RuntimeLibcallsEmitter.cpp | 19 ++---
 12 files changed, 197 insertions(+), 172 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index d6ed3a8f739b3..3060c29eab570 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef
+return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpls[Call])
+.data();
+  }
+
+  /// Return the lowering's selection of implementation call for \p Call
+  LLVM_ABI RTLIB::LibcallImpl getLibcallImpl(RTLIB::Libcall Call) const {
+return LibcallImpls[Call];
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  LLVM_ABI void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
+LibcallImpls[Call] = Impl;
+  }
+
+  // FIXME: Remove this wrapper in favor of directly using
+  // getLibcallImplCallingConv
+  LLVM_ABI CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+return RTLCI.LibcallImplCallingConvs[LibcallImpls[Call]];
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  LLVM_ABI CallingConv::ID
+  getLibcallImplCallingConv(RTLIB::LibcallImpl Call) const {
+return RTLCI.LibcallImplCallingConvs[Call];
+  }
+
+  /// Return a function name compatible with RTLIB::MEMCPY, or nullptr if fully
+  /// unsupported.
+  LLVM_ABI StringRef getMemcpyName() const {
+RTLIB::LibcallImpl Memcpy = getLibcallImpl(RTLIB::MEMCPY);
+if (Memcpy != RTLIB::Unsupported)
+  return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(Memcpy);
+
+// Fallback to memmove if memcpy isn't available.
+return getLibcallName(RTLIB::MEMMOVE);
+  }
+};
+
 /// This base class for TargetLowering contains the SelectionDAG-independent
 /// parts that can be used from the rest of CodeGen.
 class LLVM_ABI TargetLoweringBase {
@@ -3590,9 +3642,9 @@ class LLVM_ABI TargetLoweringBase {
   }
 
   /// Get the libcall routine name for the specified libcall.
+  //

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/164987
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#164987** 👈 (View in Graphite)
* **#164983**
* **#164591**
* **#164195**
* **#164133**
* **#164044**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/164987

[llvm-branch-commits] [llvm] [AMDGPU][MC][NFC] Fix True16 instructions in the literals test. (PR #164426)

2025-10-24 Thread Ivan Kosarev via llvm-branch-commits


https://github.com/kosarev updated 
https://github.com/llvm/llvm-project/pull/164426

>From 8e6abdcec3866ac757bae119222c32edbfd2d3a8 Mon Sep 17 00:00:00 2001
From: Ivan Kosarev 
Date: Tue, 21 Oct 2025 14:21:30 +0100
Subject: [PATCH] [AMDGPU][MC][NFC] Fix True16 instructions in the literals
 test.

---
 llvm/test/MC/AMDGPU/literals.s | 52 --
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/llvm/test/MC/AMDGPU/literals.s b/llvm/test/MC/AMDGPU/literals.s
index ae2c0f56144c5..be4e0defa5760 100644
--- a/llvm/test/MC/AMDGPU/literals.s
+++ b/llvm/test/MC/AMDGPU/literals.s
@@ -231,21 +231,19 @@ v_cos_f16_e32 v5.l, lit(1.0)
 // NOGFX89: :[[@LINE-4]]:1: error: operands are not valid for this GPU or mode
 // NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, 1.0
-// GFX1250-ASM: v_tanh_bf16_e32 v5, 1.0 ; encoding: [0xf2,0x94,0x0a,0x7e]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, 1.0   ; encoding: [0xf2,0x94,0x0a,0x7e]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, 1.0
+// GFX1250: v_tanh_bf16_e32 v5.l, 1.0   ; encoding: [0xf2,0x94,0x0a,0x7e]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, lit(1.0)
-// GFX1250-ASM: v_tanh_bf16_e32 v5, lit(0x3f80) ; encoding: [0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, lit(0x3f80)   ; encoding: [0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, lit(1.0)
+// GFX1250: v_tanh_bf16_e32 v5.l, lit(0x3f80)   ; encoding: [0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
 v_trunc_f32_e32 v0, 1.0
 // GFX11: v_trunc_f32_e32 v0, 1.0 ; encoding: [0xf2,0x42,0x00,0x7e]
@@ -685,21 +683,19 @@ v_cos_f16_e32 v5.l, lit(1)
 // NOGFX89: :[[@LINE-4]]:1: error: operands are not valid for this GPU or mode
 // NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, 1
-// GFX1250-ASM: v_tanh_bf16_e32 v5, 1   ; encoding: [0x81,0x94,0x0a,0x7e]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, 1 ; encoding: [0x81,0x94,0x0a,0x7e]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, 1
+// GFX1250: v_tanh_bf16_e32 v5.l, 1 ; encoding: [0x81,0x94,0x0a,0x7e]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, lit(1)
-// GFX1250-ASM: v_tanh_bf16_e32 v5, lit(0x1); encoding: [0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, lit(0x1)  ; encoding: [0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, lit(1)
+// GFX1250: v_tanh_bf16_e32 v5.l, lit(0x1)  ; encoding: [0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
 v_trunc_f32_e32 v0, 1
 // GFX11: v_trunc_f32_e32 v0, 1   ; encoding: 
[0x81,

[llvm-branch-commits] [llvm] [Utils][NFC] Clean up update_mc_test_checks.py. (PR #164454)

2025-10-24 Thread Ivan Kosarev via llvm-branch-commits


https://github.com/kosarev updated 
https://github.com/llvm/llvm-project/pull/164454

>From 548424cbcba8d24332aa597f8ceda54e10a227c8 Mon Sep 17 00:00:00 2001
From: Ivan Kosarev 
Date: Tue, 21 Oct 2025 17:06:56 +0100
Subject: [PATCH] [Utils][NFC] Clean up update_mc_test_checks.py.

Refine the code a bit to make it easier to comprehend the logic.
---
 llvm/utils/update_mc_test_checks.py | 56 +++--
 1 file changed, 20 insertions(+), 36 deletions(-)

diff --git a/llvm/utils/update_mc_test_checks.py 
b/llvm/utils/update_mc_test_checks.py
index 3de1333b19e0e..791ff0dcc047d 100755
--- a/llvm/utils/update_mc_test_checks.py
+++ b/llvm/utils/update_mc_test_checks.py
@@ -212,9 +212,6 @@ def update_test(ti: common.TestInfo):
 testlines = list(dict.fromkeys(testlines))
 common.debug("Valid test line found: ", len(testlines))
 
-run_list_size = len(run_list)
-testnum = len(testlines)
-
 raw_output = []
 raw_prefixes = []
 for (
@@ -256,14 +253,12 @@ def update_test(ti: common.TestInfo):
 prefix_set = set([prefix for p in run_list for prefix in p[0]])
 common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
 
-for test_id in range(testnum):
-input_line = testlines[test_id]
-
+for test_id, input_line in enumerate(testlines):
 # a {prefix : output, [runid] } dict
 # insert output to a prefix-key dict, and do a max sorting
 # to select the most-used prefix which share the same output string
 p_dict = {}
-for run_id in range(run_list_size):
+for run_id in range(len(run_list)):
 out = raw_output[run_id][test_id]
 
 if hasErr(out):
@@ -271,45 +266,34 @@ def update_test(ti: common.TestInfo):
 else:
 o = getOutputString(out)
 
-prefixes = raw_prefixes[run_id]
-
-for p in prefixes:
+for p in raw_prefixes[run_id]:
 if p not in p_dict:
 p_dict[p] = o, [run_id]
-else:
-if p_dict[p] == (None, []):
-continue
+continue
 
-prev_o, run_ids = p_dict[p]
-if o == prev_o:
-run_ids.append(run_id)
-p_dict[p] = o, run_ids
-else:
-# conflict, discard
-p_dict[p] = None, []
+if p_dict[p] == (None, []):
+continue
 
-p_dict_sorted = dict(sorted(p_dict.items(), key=lambda item: 
-len(item[1][1])))
+prev_o, run_ids = p_dict[p]
+if o == prev_o:
+run_ids.append(run_id)
+p_dict[p] = o, run_ids
+else:
+# conflict, discard
+p_dict[p] = None, []
 
 # prefix is selected and generated with most shared output lines
 # each run_id can only be used once
-used_runid = set()
-
+used_run_ids = set()
 selected_prefixes = set()
-for prefix, tup in p_dict_sorted.items():
-o, run_ids = tup
-
-if len(run_ids) == 0:
-continue
-
-skip = False
-for i in run_ids:
-if i in used_runid:
-skip = True
-else:
-used_runid.add(i)
-if not skip:
+get_num_runs = lambda item: len(item[1][1])
+p_dict_sorted = sorted(p_dict.items(), key=get_num_runs, reverse=True)
+for prefix, (o, run_ids) in p_dict_sorted:
+if run_ids and used_run_ids.isdisjoint(run_ids):
 selected_prefixes.add(prefix)
 
+used_run_ids.update(run_ids)
+
 # Generate check lines in alphabetical order.
 check_lines = []
 for prefix in sorted(selected_prefixes):

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
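
For readers following the cleanup above, the prefix-selection step at the end of the loop can be sketched on its own. This is a minimal standalone sketch, not the script's actual code: the function name `select_prefixes` is hypothetical, and because the archive strips indentation it assumes that `used_run_ids.update(run_ids)` runs for every candidate prefix rather than only for the selected ones.

```python
def select_prefixes(p_dict):
    """Pick prefixes whose runs do not overlap with any earlier candidate.

    p_dict maps a FileCheck prefix to (output, run_ids); a conflicting
    prefix is stored as (None, []), as in the patch.
    """
    used_run_ids = set()
    selected_prefixes = set()
    # Visit the prefixes shared by the most runs first.
    get_num_runs = lambda item: len(item[1][1])
    for prefix, (_output, run_ids) in sorted(
        p_dict.items(), key=get_num_runs, reverse=True
    ):
        # Skip discarded prefixes and prefixes touching an already-seen run.
        if run_ids and used_run_ids.isdisjoint(run_ids):
            selected_prefixes.add(prefix)
        # Assumption: every candidate's runs are marked as seen.
        used_run_ids.update(run_ids)
    return selected_prefixes
```

With a prefix shared by two runs and two single-run prefixes producing the same output, only the shared prefix should survive; if the shared prefix was discarded because of a conflict, the single-run prefixes are kept instead.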

[llvm-branch-commits] [llvm] [AMDGPU][MC][NFC] Fix True16 instructions in the literals test. (PR #164426)

2025-10-24 Thread Ivan Kosarev via llvm-branch-commits


https://github.com/kosarev updated 
https://github.com/llvm/llvm-project/pull/164426

>From 8e6abdcec3866ac757bae119222c32edbfd2d3a8 Mon Sep 17 00:00:00 2001
From: Ivan Kosarev 
Date: Tue, 21 Oct 2025 14:21:30 +0100
Subject: [PATCH] [AMDGPU][MC][NFC] Fix True16 instructions in the literals
 test.

---
 llvm/test/MC/AMDGPU/literals.s | 52 --
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/llvm/test/MC/AMDGPU/literals.s b/llvm/test/MC/AMDGPU/literals.s
index ae2c0f56144c5..be4e0defa5760 100644
--- a/llvm/test/MC/AMDGPU/literals.s
+++ b/llvm/test/MC/AMDGPU/literals.s
@@ -231,21 +231,19 @@ v_cos_f16_e32 v5.l, lit(1.0)
 // NOGFX89: :[[@LINE-4]]:1: error: operands are not valid for this GPU or mode
 // NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, 1.0
-// GFX1250-ASM: v_tanh_bf16_e32 v5, 1.0 ; encoding: 
[0xf2,0x94,0x0a,0x7e]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, 1.0   ; encoding: 
[0xf2,0x94,0x0a,0x7e]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, 1.0
+// GFX1250: v_tanh_bf16_e32 v5.l, 1.0   ; encoding: 
[0xf2,0x94,0x0a,0x7e]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, lit(1.0)
-// GFX1250-ASM: v_tanh_bf16_e32 v5, lit(0x3f80) ; encoding: 
[0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, lit(0x3f80)   ; encoding: 
[0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, lit(1.0)
+// GFX1250: v_tanh_bf16_e32 v5.l, lit(0x3f80)   ; encoding: 
[0xff,0x94,0x0a,0x7e,0x80,0x3f,0x00,0x00]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
 v_trunc_f32_e32 v0, 1.0
 // GFX11: v_trunc_f32_e32 v0, 1.0 ; encoding: 
[0xf2,0x42,0x00,0x7e]
@@ -685,21 +683,19 @@ v_cos_f16_e32 v5.l, lit(1)
 // NOGFX89: :[[@LINE-4]]:1: error: operands are not valid for this GPU or mode
 // NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, 1
-// GFX1250-ASM: v_tanh_bf16_e32 v5, 1   ; encoding: 
[0x81,0x94,0x0a,0x7e]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, 1 ; encoding: 
[0x81,0x94,0x0a,0x7e]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, 1
+// GFX1250: v_tanh_bf16_e32 v5.l, 1 ; encoding: 
[0x81,0x94,0x0a,0x7e]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
-v_tanh_bf16 v5, lit(1)
-// GFX1250-ASM: v_tanh_bf16_e32 v5, lit(0x1); encoding: 
[0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
-// GFX1250-DIS: v_tanh_bf16_e32 v5.l, lit(0x1)  ; encoding: 
[0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
-// NOGFX11: :[[@LINE-3]]:1: error: instruction not supported on this GPU
-// NOGFX12: :[[@LINE-4]]:1: error: instruction not supported on this GPU
-// NOGFX89: :[[@LINE-5]]:1: error: instruction not supported on this GPU
-// NOSICI: :[[@LINE-6]]:1: error: instruction not supported on this GPU
+v_tanh_bf16 v5.l, lit(1)
+// GFX1250: v_tanh_bf16_e32 v5.l, lit(0x1)  ; encoding: 
[0xff,0x94,0x0a,0x7e,0x01,0x00,0x00,0x00]
+// NOGFX11: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+// NOGFX12: :[[@LINE-3]]:1: error: instruction not supported on this GPU
+// NOGFX89: :[[@LINE-4]]:1: error: instruction not supported on this GPU
+// NOSICI: :[[@LINE-5]]:1: error: instruction not supported on this GPU
 
 v_trunc_f32_e32 v0, 1
 // GFX11: v_trunc_f32_e32 v0, 1   ; encoding: 
[0x81,

[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)

2025-10-24 Thread Guy David via llvm-branch-commits


https://github.com/guy-david created 
https://github.com/llvm/llvm-project/pull/165011

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimizations don't trigger because they can produce 
incorrect results around signed zeros, and rely on the existence of the nsz 
flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the 
combined value are either guaranteed to overwrite the sign-bit or simply ignore 
it (comparisons, etc.).

The optimizations affected:
- fadd x, -0.0 -> x
- fsub x, 0.0 -> x
- fsub -0.0, x -> fneg x
- fdiv x, sqrt(x) -> sqrt(x)
- frem lowering with power-of-2 divisors
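
As an illustration of the reasoning above, here is a small sketch in plain Python floats rather than SelectionDAG nodes. It takes `x + (+0.0)` with `x = -0.0`, which, as far as the patch shows, is the FADD case that previously required nsz. The helper names are hypothetical and only serve the illustration.

```python
import math


def is_negative(value):
    """Return True when the IEEE-754 sign bit is set (distinguishes -0.0)."""
    return math.copysign(1.0, value) < 0.0


def fadd_pos_zero(x):
    """The unfolded operation: x + (+0.0)."""
    return x + 0.0


x = -0.0
unfolded = fadd_pos_zero(x)  # -0.0 + 0.0 rounds to +0.0
folded = x                   # what the fold would return instead

# The fold is observable if a user inspects the sign bit directly...
sign_differs = is_negative(unfolded) != is_negative(folded)

# ...but not through users that ignore or overwrite the sign bit.
same_compare = (unfolded == 1.0) == (folded == 1.0)
same_fabs = is_negative(abs(unfolded)) == is_negative(abs(folded))
same_copysign = is_negative(math.copysign(unfolded, -2.0)) == is_negative(
    math.copysign(folded, -2.0)
)
```

The two values differ only in the sign of zero, so a comparison, an absolute value, or a copysign that replaces the sign gives the same answer either way.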

>From 44ed78de41205c4a39fd5141f99f021b473ec9cb Mon Sep 17 00:00:00 2001
From: Guy David 
Date: Fri, 24 Oct 2025 19:30:19 +0300
Subject: [PATCH] [DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, -0.0 -> x
- fsub x, 0.0 -> x
- fsub -0.0, x -> fneg x
- fdiv x, sqrt(x) -> sqrt(x)
- frem lowering with power-of-2 divisors
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 16 +++--
 llvm/test/CodeGen/AArch64/nsz-bypass.ll   | 72 +++
 llvm/test/CodeGen/AMDGPU/swdev380865.ll   |  5 +-
 3 files changed, 85 insertions(+), 8 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/nsz-bypass.ll

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 73aed33fe0838..f2b4e74cc65c7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17781,7 +17781,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-if (N1C->isNegative() || Flags.hasNoSignedZeros())
+if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0)))
   return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -17993,7 +17994,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
   return N0;
 }
   }
@@ -18006,7 +18008,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
   // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
   // flushed to zero, unless all users treat denorms as zero (DAZ).
   // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18654,7 +18657,9 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() ||
+   DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) &&
+  Flags.hasAllowReassociation())
 if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
   return N1;
 
@@ -18706,7 +18711,8 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
   TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
   DAG.isKnownToBeAPowerOfTwoFP(N1)) {
 bool NeedsCopySign =
-!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
+!DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
 SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
 SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
 SDValue MLA;
diff --git a/llvm/test/CodeGen/AArch64/nsz-bypass.ll 
b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
new file mode 100644
index 0..3b17e410ac380
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+; Test that nsz constraint can be bypassed when all uses are sign-insensitive.
+
+define i1 @test_fadd_neg_zero_fcmp(float %x) {
+; CHECK-LABEL: test_fadd_neg_zero_fcmp:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:fmov s1, #1.
+; CHECK-NEXT:fcmp s0, s1
+; CHECK-NEXT:cset w0, eq
+; CHECK-NEXT:ret
+  %add = fadd float %x, -0.0
+  %cmp = fcmp oeq

[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)


Changes

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimizations don't trigger because they can produce 
incorrect results around signed zeros, and rely on the existence of the nsz 
flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the 
combined value are either guaranteed to overwrite the sign-bit or simply ignore 
it (comparisons, etc.).

The optimizations affected:
- fadd x, -0.0 -> x
- fsub x, 0.0 -> x
- fsub -0.0, x -> fneg x
- fdiv x, sqrt(x) -> sqrt(x)
- frem lowering with power-of-2 divisors

---
Full diff: https://github.com/llvm/llvm-project/pull/165011.diff


3 Files Affected:

- (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+11-5) 
- (added) llvm/test/CodeGen/AArch64/nsz-bypass.ll (+72) 
- (modified) llvm/test/CodeGen/AMDGPU/swdev380865.ll (+2-3) 


``diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 73aed33fe0838..f2b4e74cc65c7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17781,7 +17781,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-if (N1C->isNegative() || Flags.hasNoSignedZeros())
+if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0)))
   return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -17993,7 +17994,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
   return N0;
 }
   }
@@ -18006,7 +18008,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
   // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
   // flushed to zero, unless all users treat denorms as zero (DAZ).
   // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18654,7 +18657,9 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() ||
+   DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) &&
+  Flags.hasAllowReassociation())
 if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
   return N1;
 
@@ -18706,7 +18711,8 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
   TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
   DAG.isKnownToBeAPowerOfTwoFP(N1)) {
 bool NeedsCopySign =
-!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
+!DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
 SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
 SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
 SDValue MLA;
diff --git a/llvm/test/CodeGen/AArch64/nsz-bypass.ll 
b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
new file mode 100644
index 0..3b17e410ac380
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+; Test that nsz constraint can be bypassed when all uses are sign-insensitive.
+
+define i1 @test_fadd_neg_zero_fcmp(float %x) {
+; CHECK-LABEL: test_fadd_neg_zero_fcmp:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:fmov s1, #1.
+; CHECK-NEXT:fcmp s0, s1
+; CHECK-NEXT:cset w0, eq
+; CHECK-NEXT:ret
+  %add = fadd float %x, -0.0
+  %cmp = fcmp oeq float %add, 1.0
+  ret i1 %cmp
+}
+
+define float @test_fsub_zero_fabs(float %x) {
+; CHECK-LABEL: test_fsub_zero_fabs:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:fabs s0, s0
+; CHECK-NEXT:ret
+  %sub = fsub float %x, 0.0
+  %abs = call float @llvm.fabs.f32(float %sub)
+  ret float %abs
+}
+
+define float @test_fsub_neg_zero_copysign(float %x, float %y) {
+; CHECK-LABEL: test_fsub_neg_zero_copysign:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:mvni v2.4s, #128, lsl #24
+; CHECK-NEXT:// kill: def $s0 killed $s0 def $q0
+; CHECK-NEXT:// kill: def $s1 killed $s1 def $q1
+; CHECK-NEXT:bif v0.16b, v1.16b, v2.16b
+; CHECK-NEXT:// kill: def $s0 killed $s0 killed $q0
+; CHECK-NEXT:ret
+  %sub = fsub float -0.0, %x
+  %co

[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)

2025-10-24 Thread via llvm-branch-commits


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff origin/main HEAD --extensions cpp -- 
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp --diff_from_common_commit
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f2b4e74cc..a2c48f44b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18710,9 +18710,9 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
   TLI.isOperationLegalOrCustom(ISD::FDIV, VT) &&
   TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
   DAG.isKnownToBeAPowerOfTwoFP(N1)) {
-bool NeedsCopySign =
-!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
-!DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
+bool NeedsCopySign = !Flags.hasNoSignedZeros() &&
+ !DAG.cannotBeOrderedNegativeFP(N0) &&
+ !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
 SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
 SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
 SDValue MLA;

```




https://github.com/llvm/llvm-project/pull/165011
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:



@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Changes

Introduce a new class for the TargetLowering usage. This tracks the
subtarget specific lowering decisions for which libcall to use.
RuntimeLibcallsInfo is a module level property, which may have multiple
implementations of a particular libcall available. This attempts to be
a minimum boilerplate patch to introduce the new concept.
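
The split described above can be pictured with a small model. This is only a sketch in Python, not the C++ class from the patch: `Libcall`, `IMPL_NAMES`, `LibcallLowering` and its method names are hypothetical stand-ins, and `None` plays the role of `RTLIB::Unsupported`.

```python
from enum import Enum, auto


class Libcall(Enum):
    MEMCPY = auto()
    MEMMOVE = auto()


# Hypothetical implementation identifiers mapped to their symbol names.
IMPL_NAMES = {"memcpy_impl": "memcpy", "memmove_impl": "memmove"}


class LibcallLowering:
    """Per-subtarget choice of one implementation for each libcall."""

    def __init__(self):
        # Every libcall starts out unsupported until a choice is recorded.
        self._impls = {call: None for call in Libcall}

    def set_libcall_impl(self, call, impl):
        self._impls[call] = impl

    def get_libcall_impl(self, call):
        return self._impls[call]

    def get_libcall_name(self, call):
        impl = self._impls[call]
        return IMPL_NAMES[impl] if impl is not None else None

    def get_memcpy_name(self):
        # Prefer memcpy; fall back to memmove when memcpy is unavailable.
        name = self.get_libcall_name(Libcall.MEMCPY)
        if name is not None:
            return name
        return self.get_libcall_name(Libcall.MEMMOVE)
```

The point of the model is that the module-level table may know several implementations, while each lowering object records exactly one choice per libcall.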

In the future we should have a tablegen way of selecting which
implementations should be used for a subtarget. Currently we
do have some conflicting implementations added; it just happens
to work out that the default case to prefer is alphabetically
first (plus some of these are still using manual overrides
in TargetLowering constructors).

---

Patch is 32.90 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/164987.diff


12 Files Affected:

- (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+59-9) 
- (modified) llvm/include/llvm/IR/RuntimeLibcalls.h (+11-48) 
- (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+20-3) 
- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (+1) 
- (modified) llvm/lib/LTO/LTO.cpp (+3-4) 
- (modified) 
llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp (+8-7) 
- (modified) llvm/lib/Transforms/Utils/DeclareRuntimeLibcalls.cpp (+2-2) 
- (modified) llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td (+30-30) 
- (modified) llvm/test/TableGen/RuntimeLibcallEmitter-conflict-warning.td 
(+15-15) 
- (modified) llvm/test/TableGen/RuntimeLibcallEmitter.td (+37-41) 
- (modified) llvm/test/Transforms/Util/DeclareRuntimeLibcalls/basic.ll (+3-2) 
- (modified) llvm/utils/TableGen/Basic/RuntimeLibcallsEmitter.cpp (+8-11) 


``diff
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index d6ed3a8f739b3..3060c29eab570 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef
+return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpls[Call])
+.data();
+  }
+
+  /// Return the lowering's selection of implementation call for \p Call
+  LLVM_ABI RTLIB::LibcallImpl getLibcallImpl(RTLIB::Libcall Call) const {
+return LibcallImpls[Call];
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  LLVM_ABI void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
+LibcallImpls[Call] = Impl;
+  }
+
+  // FIXME: Remove this wrapper in favor of directly using
+  // getLibcallImplCallingConv
+  LLVM_ABI CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+return RTLCI.LibcallImplCallingConvs[LibcallImpls[Call]];
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  LLVM_ABI CallingConv::ID
+  getLibcallImplCallingConv(RTLIB::LibcallImpl Call) const {
+return RTLCI.LibcallImplCallingConvs[Call];
+  }
+
+  /// Return a function name compatible with RTLIB::MEMCPY, or nullptr if fully
+  /// unsupported.
+  LLVM_ABI StringRef getMemcpyName() const {
+RTLIB::LibcallImpl Memcpy = getLibcallImpl(RTLIB::MEMCPY);
+if (Memcpy != RTLIB::Unsupported)
+  return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(Memcpy);
+
+// Fallback to memmove if memcpy isn't available.
+return getLibcallName(RTLIB::MEMMOVE);
+  }
+};
+
 /// This base class for TargetLowering contains the SelectionDAG-independent
 /// parts that can be used from the rest of CodeGen.
 class LLVM_ABI TargetLoweringBase {
@@ -3590,9 +3642,9 @@ class LLVM_ABI TargetLoweringBase {
   }
 
   /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
   const char *getLibcallName(RTLIB::Libcall Call) const {
-// FIXME: Return StringRef
-return Libcalls.getLibcallName(Call).data();
+return Libcalls.getLibcallName(Call);
   }
 
   /// Get the libcall routine name for the specified libcall implementation
@@ -3608,7 +3660,7 @@ class LLVM_ABI TargetLoweringBase {
   /// Check if this is valid libcall for the current module, otherwise
   /// RTLIB::Unsupported.
   RTLIB::LibcallImpl getSupportedLibcallImpl(StringRef FuncName) const {
-return Libcalls.getSupportedLibcallImpl(FuncName);
+return RuntimeLibca

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/164987
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] X86: Make sure compiler-rt div calls are not added for msvc (PR #164591)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/164591

>From a84fbd5ff20026d1054f87f8a25a41f4a12eb8ae Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 22 Oct 2025 16:30:46 +0900
Subject: [PATCH] X86: Make sure compiler-rt div calls are not added for msvc

The current predicate system is primitive; we ought to have
a way to list a chain of alternatives.
---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 9d394af83ee7f..04e0ea3ee75a9 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -2452,6 +2452,11 @@ def _aullrem : RuntimeLibcallImpl;
 def _allmul : RuntimeLibcallImpl;
 }
 
+// FIXME: Should have utility function to filter by known provider.
+defvar WindowsDivRemMulLibcallOverrides = [
+  __divdi3, __udivdi3, __moddi3, __umoddi3, __muldi3
+];
+
 
//===--===//
 // X86 Runtime Libcalls
 
//===--===//
@@ -2473,7 +2478,7 @@ defvar X86_F128_Libcalls = LibcallImpls<(add 
LibmF128Libcalls, LibmF128FiniteLib
 defvar SinCosF32F64Libcalls = LibcallImpls<(add sincosf, sincos), 
hasSinCos_f32_f64>;
 
 defvar X86CommonLibcalls =
-  (add WinDefaultLibcallImpls,
+  (add (sub WinDefaultLibcallImpls, WindowsDivRemMulLibcallOverrides),
DarwinSinCosStret, DarwinExp10,
X86_F128_Libcalls,
LibmHasSinCosF80, // FIXME: Depends on long double
@@ -2496,10 +2501,15 @@ defvar Windows32DivRemMulCalls =
   LibcallsWithCC<(add WindowsDivRemMulLibcalls), X86_STDCALL,
   RuntimeLibcallPredicate<"TT.isWindowsMSVCEnvironment() || 
TT.isWindowsItaniumEnvironment()">>;
 
+defvar NotWindows32DivRemMulCalls =
+  LibcallImpls<(add WindowsDivRemMulLibcallOverrides),
+RuntimeLibcallPredicate<"!TT.isWindowsMSVCEnvironment() && 
!TT.isWindowsItaniumEnvironment()">>;
+
 def X86_32SystemLibrary
 : SystemRuntimeLibrary;
+ NotWindows32DivRemMulCalls,
+ Windows32DivRemMulCalls)>;
 
 def X86_64SystemLibrary
 : SystemRuntimeLibrary
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
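
The set arithmetic in this patch can be sketched as follows. This is a rough Python model under stated assumptions, not TableGen semantics: the function name `x86_32_libcalls` is hypothetical, the boolean argument stands in for the `TT.isWindowsMSVCEnvironment() || TT.isWindowsItaniumEnvironment()` predicate, and `WINDOWS_DIV_REM_MUL` lists only the two Windows helpers visible in the quoted context.

```python
# The five names listed in WindowsDivRemMulLibcallOverrides.
OVERRIDES = frozenset(
    {"__divdi3", "__udivdi3", "__moddi3", "__umoddi3", "__muldi3"}
)
# Two Windows helpers visible in the quoted context; the full list is elided.
WINDOWS_DIV_REM_MUL = frozenset({"_aullrem", "_allmul"})


def x86_32_libcalls(default_impls, is_msvc_or_itanium):
    """Model which div/rem/mul implementations end up in the 32-bit set."""
    # (sub WinDefaultLibcallImpls, WindowsDivRemMulLibcallOverrides)
    common = set(default_impls) - OVERRIDES
    if is_msvc_or_itanium:
        # Windows32DivRemMulCalls: only the Windows helpers are added.
        return common | WINDOWS_DIV_REM_MUL
    # NotWindows32DivRemMulCalls: the overrides come back under the
    # negated predicate.
    return common | OVERRIDES
```

Under this model the compiler-rt style names never appear alongside the Windows helpers, which seems to be the outcome the patch title describes.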

[llvm-branch-commits] [llvm] X86: Make sure compiler-rt div calls are not added for msvc (PR #164591)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/164591

>From a84fbd5ff20026d1054f87f8a25a41f4a12eb8ae Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 22 Oct 2025 16:30:46 +0900
Subject: [PATCH] X86: Make sure compiler-rt div calls are not added for msvc

The current predicate system is primitive, we ought to have
a way to list a chain of alternatives.
---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 9d394af83ee7f..04e0ea3ee75a9 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -2452,6 +2452,11 @@ def _aullrem : RuntimeLibcallImpl;
 def _allmul : RuntimeLibcallImpl;
 }
 
+// FIXME: Should have utility function to filter by known provider.
+defvar WindowsDivRemMulLibcallOverrides = [
+  __divdi3, __udivdi3, __moddi3, __umoddi3, __muldi3
+];
+
 
//===--===//
 // X86 Runtime Libcalls
 
//===--===//
@@ -2473,7 +2478,7 @@ defvar X86_F128_Libcalls = LibcallImpls<(add 
LibmF128Libcalls, LibmF128FiniteLib
 defvar SinCosF32F64Libcalls = LibcallImpls<(add sincosf, sincos), 
hasSinCos_f32_f64>;
 
 defvar X86CommonLibcalls =
-  (add WinDefaultLibcallImpls,
+  (add (sub WinDefaultLibcallImpls, WindowsDivRemMulLibcallOverrides),
DarwinSinCosStret, DarwinExp10,
X86_F128_Libcalls,
LibmHasSinCosF80, // FIXME: Depends on long double
@@ -2496,10 +2501,15 @@ defvar Windows32DivRemMulCalls =
   LibcallsWithCC<(add WindowsDivRemMulLibcalls), X86_STDCALL,
   RuntimeLibcallPredicate<"TT.isWindowsMSVCEnvironment() || 
TT.isWindowsItaniumEnvironment()">>;
 
+defvar NotWindows32DivRemMulCalls =
+  LibcallImpls<(add WindowsDivRemMulLibcallOverrides),
+RuntimeLibcallPredicate<"!TT.isWindowsMSVCEnvironment() && 
!TT.isWindowsItaniumEnvironment()">>;
+
 def X86_32SystemLibrary
 : SystemRuntimeLibrary;
+ NotWindows32DivRemMulCalls,
+ Windows32DivRemMulCalls)>;
 
 def X86_64SystemLibrary
 : SystemRuntimeLibrary
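
For readers following the predicate change in this patch: the two `RuntimeLibcallPredicate` strings are meant to be exact complements, so each 32-bit X86 triple picks up either the Windows stdcall helpers or the compiler-rt names, never both. A minimal sketch of that check follows. `MockTriple` and the two helper function names are hypothetical stand-ins invented for illustration; only the two boolean expressions are taken from the diff.

```cpp
// Hypothetical stand-in for the two llvm::Triple queries used in the patch.
struct MockTriple {
  bool MSVC;
  bool Itanium;
  bool isWindowsMSVCEnvironment() const { return MSVC; }
  bool isWindowsItaniumEnvironment() const { return Itanium; }
};

// Expression guarding Windows32DivRemMulCalls in the diff.
bool usesWindowsDivRemMul(const MockTriple &TT) {
  return TT.isWindowsMSVCEnvironment() || TT.isWindowsItaniumEnvironment();
}

// Expression guarding NotWindows32DivRemMulCalls in the diff.
bool usesCompilerRTDivRemMul(const MockTriple &TT) {
  return !TT.isWindowsMSVCEnvironment() && !TT.isWindowsItaniumEnvironment();
}

// True when exactly one predicate holds for every environment combination.
bool predicatesPartitionAllTriples() {
  for (int Bits = 0; Bits < 4; ++Bits) {
    MockTriple TT{(Bits & 1) != 0, (Bits & 2) != 0};
    if (usesWindowsDivRemMul(TT) == usesCompilerRTDivRemMul(TT))
      return false;
  }
  return true;
}
```

Writing the negation by hand like this is what the commit message calls primitive: every alternative has to restate the complement of the others.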

[llvm-branch-commits] [llvm] ARM: Avoid adding default libcalls overridden by AEABI functions (PR #164983)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/164983

Avoids adding alternative libcall impls for the same libcall.

I'm not sure if the default names exist or not, or are just not
preferred. compiler-rt appears to define aliases for all of these,
so I'm not sure why we bother distinguishing these in the first place.

>From f615c5a24cb5c51a81079191944507fbd84960ef Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 23 Oct 2025 15:00:05 +0900
Subject: [PATCH] ARM: Avoid adding default libcalls overridden by AEABI
 functions

Avoids adding alternative libcall impls for the same libcall.

I'm not sure if the default names exist or not, or are just not
preferred. compiler-rt appears to define aliases for all of these,
so I'm not sure why we bother distinguishing these in the first place.
---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 43 -
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 04e0ea3ee75a9..7be1b654ca727 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -1508,6 +1508,41 @@ def __aeabi_ddiv : RuntimeLibcallImpl; // 
CallingConv::ARM_AAPCS
 def __aeabi_dmul : RuntimeLibcallImpl; // CallingConv::ARM_AAPCS
 def __aeabi_dsub : RuntimeLibcallImpl; // CallingConv::ARM_AAPCS
 
+defvar AEABIOverrides = [
+  __eqsf2, __eqdf2,
+  __nesf2, __nedf2,
+  __ltsf2, __ltdf2,
+  __lesf2, __ledf2,
+  __gesf2, __gedf2,
+  __gtsf2, __gtdf2,
+  __unordsf2, __unorddf2,
+
+  __addsf3, __adddf3,
+  __divsf3, __divdf3,
+  __mulsf3, __muldf3,
+  __subsf3, __subdf3,
+
+  __fixdfsi, __fixunsdfsi,
+  __fixdfdi, __fixunsdfdi,
+  __fixsfsi, __fixunssfsi,
+  __fixsfdi, __fixunssfdi,
+
+  __floatsidf, __floatunsidf,
+  __floatdidf, __floatundidf,
+  __floatsisf, __floatunsisf,
+  __floatdisf, __floatundisf,
+
+  __muldi3, __ashldi3,
+  __lshrdi3, __ashrdi3,
+
+  __divsi3, __udivsi3
+
+  // Half conversion cases are a mess and handled separately.
+  //  __truncdfsf2, __truncdfhf2,
+  //  __extendsfdf2,
+  //  __truncsfhf2, __extendhfsf2
+];
+
 // Double-precision floating-point comparison helper functions
 // RTABI chapter 4.1.2, Table 3
 def __aeabi_dcmpeq__oeq : RuntimeLibcallImpl; // 
CallingConv::ARM_AAPCS, CmpInst::ICMP_NE
@@ -1793,7 +1828,8 @@ def ARMSystemLibrary
 : SystemRuntimeLibrary,
LibmHasFrexpF32, LibmHasLdexpF32,
LibmHasFrexpF128, LibmHasLdexpF128,
@@ -1812,6 +1848,11 @@ def ARMSystemLibrary
GNUEABIHalfConvertCalls,
ARMDoubleToHalfCalls,
 
+   LibcallImpls<(add AEABIOverrides),
+ RuntimeLibcallPredicate<[{
+   (!hasAEABILibcalls(TT) || !isAAPCS_ABI(TT, ABIName)) &&
+   !TT.isOSWindows()
+}]>>,
// Use divmod compiler-rt calls for iOS 5.0 and later.
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&


[llvm-branch-commits] [llvm] ARM: Avoid adding default libcalls overridden by AEABI functions (PR #164983)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/164983?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#164983** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/164983?utm_source=stack-comment-view-in-graphite)
* **#164591**
* **#164195**
* **#164133**
* **#164044**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn 
more about stacking: https://stacking.dev/?utm_source=stack-comment



https://github.com/llvm/llvm-project/pull/164983

[llvm-branch-commits] [llvm] ARM: Avoid adding default libcalls overridden by AEABI functions (PR #164983)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/164983

[llvm-branch-commits] [llvm] ARM: Avoid adding default libcalls overridden by AEABI functions (PR #164983)

2025-10-24 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Changes

Avoids adding alternative libcall impls for the same libcall.

I'm not sure if the default names exist or not, or are just not
preferred. compiler-rt appears to define aliases for all of these,
so I'm not sure why we bother distinguishing these in the first place.

---
Full diff: https://github.com/llvm/llvm-project/pull/164983.diff


1 Files Affected:

- (modified) llvm/include/llvm/IR/RuntimeLibcalls.td (+42-1) 


``diff
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 04e0ea3ee75a9..7be1b654ca727 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -1508,6 +1508,41 @@ def __aeabi_ddiv : RuntimeLibcallImpl; // 
CallingConv::ARM_AAPCS
 def __aeabi_dmul : RuntimeLibcallImpl; // CallingConv::ARM_AAPCS
 def __aeabi_dsub : RuntimeLibcallImpl; // CallingConv::ARM_AAPCS
 
+defvar AEABIOverrides = [
+  __eqsf2, __eqdf2,
+  __nesf2, __nedf2,
+  __ltsf2, __ltdf2,
+  __lesf2, __ledf2,
+  __gesf2, __gedf2,
+  __gtsf2, __gtdf2,
+  __unordsf2, __unorddf2,
+
+  __addsf3, __adddf3,
+  __divsf3, __divdf3,
+  __mulsf3, __muldf3,
+  __subsf3, __subdf3,
+
+  __fixdfsi, __fixunsdfsi,
+  __fixdfdi, __fixunsdfdi,
+  __fixsfsi, __fixunssfsi,
+  __fixsfdi, __fixunssfdi,
+
+  __floatsidf, __floatunsidf,
+  __floatdidf, __floatundidf,
+  __floatsisf, __floatunsisf,
+  __floatdisf, __floatundisf,
+
+  __muldi3, __ashldi3,
+  __lshrdi3, __ashrdi3,
+
+  __divsi3, __udivsi3
+
+  // Half conversion cases are a mess and handled separately.
+  //  __truncdfsf2, __truncdfhf2,
+  //  __extendsfdf2,
+  //  __truncsfhf2, __extendhfsf2
+];
+
 // Double-precision floating-point comparison helper functions
 // RTABI chapter 4.1.2, Table 3
 def __aeabi_dcmpeq__oeq : RuntimeLibcallImpl; // 
CallingConv::ARM_AAPCS, CmpInst::ICMP_NE
@@ -1793,7 +1828,8 @@ def ARMSystemLibrary
 : SystemRuntimeLibrary,
LibmHasFrexpF32, LibmHasLdexpF32,
LibmHasFrexpF128, LibmHasLdexpF128,
@@ -1812,6 +1848,11 @@ def ARMSystemLibrary
GNUEABIHalfConvertCalls,
ARMDoubleToHalfCalls,
 
+   LibcallImpls<(add AEABIOverrides),
+ RuntimeLibcallPredicate<[{
+   (!hasAEABILibcalls(TT) || !isAAPCS_ABI(TT, ABIName)) &&
+   !TT.isOSWindows()
+}]>>,
// Use divmod compiler-rt calls for iOS 5.0 and later.
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&

``




https://github.com/llvm/llvm-project/pull/164983
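
As a reading aid for the predicate added in this diff, here is a minimal sketch that evaluates the same boolean expression over mocked inputs. `MockARMTarget` and `addsDefaultNames` are hypothetical names introduced for illustration; the three fields stand in for `hasAEABILibcalls(TT)`, `isAAPCS_ABI(TT, ABIName)` and `TT.isOSWindows()`, whose real behavior is not modeled.

```cpp
// Hypothetical flattened view of the target queries used by the predicate.
struct MockARMTarget {
  bool HasAEABILibcalls; // stands in for hasAEABILibcalls(TT)
  bool IsAAPCS;          // stands in for isAAPCS_ABI(TT, ABIName)
  bool IsWindows;        // stands in for TT.isOSWindows()
};

// Same expression as the RuntimeLibcallPredicate guarding AEABIOverrides:
// the default names are added only off Windows, and only when the AEABI
// names are not in play (no AEABI libcalls, or not the AAPCS ABI).
bool addsDefaultNames(const MockARMTarget &T) {
  return (!T.HasAEABILibcalls || !T.IsAAPCS) && !T.IsWindows;
}
```

Under these assumptions, a target with AEABI libcalls and the AAPCS ABI gets no default names from this list, which matches the stated goal of not adding alternative impls for the same libcall.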

[llvm-branch-commits] [llvm] RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (PR #164987)

2025-10-24 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/164987

>From 4d1530d61354ceccbfd12c3483adcb3fce07a466 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 7 Oct 2025 20:00:23 +0900
Subject: [PATCH] RuntimeLibcalls: Split lowering decisions into
 LibcallLoweringInfo

Introduce a new class for the TargetLowering usage. This tracks the
subtarget specific lowering decisions for which libcall to use.
RuntimeLibcallsInfo is a module level property, which may have multiple
implementations of a particular libcall available. This attempts to be
a minimum boilerplate patch to introduce the new concept.

In the future we should have a tablegen way of selecting which
implementations should be used for a subtarget. Currently we
do have some conflicting implementations added; it just happens
to work out that the default case to prefer is alphabetically
first (plus some of these are still using manual overrides
in TargetLowering constructors).
---
 llvm/include/llvm/CodeGen/TargetLowering.h| 70 ++---
 llvm/include/llvm/IR/RuntimeLibcalls.h| 59 +++---
 llvm/lib/CodeGen/TargetLoweringBase.cpp   | 23 +-
 llvm/lib/IR/RuntimeLibcalls.cpp   |  1 +
 llvm/lib/LTO/LTO.cpp  |  7 +-
 .../WebAssemblyRuntimeLibcallSignatures.cpp   | 15 ++--
 .../Utils/DeclareRuntimeLibcalls.cpp  |  4 +-
 .../RuntimeLibcallEmitter-calling-conv.td | 60 +++---
 .../RuntimeLibcallEmitter-conflict-warning.td | 30 +++
 llvm/test/TableGen/RuntimeLibcallEmitter.td   | 78 +--
 .../Util/DeclareRuntimeLibcalls/basic.ll  |  5 +-
 .../TableGen/Basic/RuntimeLibcallsEmitter.cpp | 19 ++---
 12 files changed, 198 insertions(+), 173 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index d6ed3a8f739b3..420292e64743c 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -193,6 +193,58 @@ struct MemOp {
   }
 };
 
+class LibcallLoweringInfo {
+private:
+  LLVM_ABI const RTLIB::RuntimeLibcallsInfo &RTLCI;
+  /// Stores the implementation choice for each libcall.
+  LLVM_ABI RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
+  RTLIB::Unsupported};
+
+public:
+  LLVM_ABI LibcallLoweringInfo(const RTLIB::RuntimeLibcallsInfo &RTLCI);
+
+  /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
+  LLVM_ABI const char *getLibcallName(RTLIB::Libcall Call) const {
+// FIXME: Return StringRef
+return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(LibcallImpls[Call])
+.data();
+  }
+
+  /// Return the lowering's selection of implementation call for \p Call
+  LLVM_ABI RTLIB::LibcallImpl getLibcallImpl(RTLIB::Libcall Call) const {
+return LibcallImpls[Call];
+  }
+
+  /// Rename the default libcall routine name for the specified libcall.
+  LLVM_ABI void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
+LibcallImpls[Call] = Impl;
+  }
+
+  // FIXME: Remove this wrapper in favor of directly using
+  // getLibcallImplCallingConv
+  LLVM_ABI CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const {
+return RTLCI.LibcallImplCallingConvs[LibcallImpls[Call]];
+  }
+
+  /// Get the CallingConv that should be used for the specified libcall.
+  LLVM_ABI CallingConv::ID
+  getLibcallImplCallingConv(RTLIB::LibcallImpl Call) const {
+return RTLCI.LibcallImplCallingConvs[Call];
+  }
+
+  /// Return a function name compatible with RTLIB::MEMCPY, or nullptr if fully
+  /// unsupported.
+  LLVM_ABI StringRef getMemcpyName() const {
+RTLIB::LibcallImpl Memcpy = getLibcallImpl(RTLIB::MEMCPY);
+if (Memcpy != RTLIB::Unsupported)
+  return RTLIB::RuntimeLibcallsInfo::getLibcallImplName(Memcpy);
+
+// Fallback to memmove if memcpy isn't available.
+return getLibcallName(RTLIB::MEMMOVE);
+  }
+};
+
 /// This base class for TargetLowering contains the SelectionDAG-independent
 /// parts that can be used from the rest of CodeGen.
 class LLVM_ABI TargetLoweringBase {
@@ -3577,7 +3629,7 @@ class LLVM_ABI TargetLoweringBase {
   }
 
   const RTLIB::RuntimeLibcallsInfo &getRuntimeLibcallsInfo() const {
-return Libcalls;
+return RuntimeLibcallInfo;
   }
 
   void setLibcallImpl(RTLIB::Libcall Call, RTLIB::LibcallImpl Impl) {
@@ -3590,9 +3642,9 @@ class LLVM_ABI TargetLoweringBase {
   }
 
   /// Get the libcall routine name for the specified libcall.
+  // FIXME: This should be removed. Only LibcallImpl should have a name.
   const char *getLibcallName(RTLIB::Libcall Call) const {
-// FIXME: Return StringRef
-return Libcalls.getLibcallName(Call).data();
+return Libcalls.getLibcallName(Call);
   }
 
   /// Get the libcall routine name for the specified libcall implementation
@@ -3608,7 +3660,7 @@ class LLVM_ABI TargetLoweringBase {
   /// Check if
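
To make the split described in the commit message easier to picture, the following is a minimal standalone sketch of the per-subtarget half: one chosen implementation per libcall, with the memcpy-to-memmove fallback from `getMemcpyName()`. It is not the LLVM class. The enums, `ImplNames` and `MockLibcallLowering` are hypothetical and heavily reduced; the real class also holds a reference to `RTLIB::RuntimeLibcallsInfo` for calling conventions.

```cpp
#include <array>
#include <string_view>

// Hypothetical, reduced stand-ins for RTLIB::Libcall and RTLIB::LibcallImpl.
enum Libcall { MEMCPY, MEMMOVE, NumLibcalls };
enum LibcallImpl { Unsupported, impl_memcpy, impl_memmove, NumImpls };

// Names belong to implementations, not to abstract libcalls.
constexpr std::array<std::string_view, NumImpls> ImplNames = {
    "", "memcpy", "memmove"};

class MockLibcallLowering {
  // Value-initialized, so every entry starts as Unsupported (0).
  std::array<LibcallImpl, NumLibcalls> Chosen{};

public:
  void setLibcallImpl(Libcall Call, LibcallImpl Impl) { Chosen[Call] = Impl; }
  LibcallImpl getLibcallImpl(Libcall Call) const { return Chosen[Call]; }

  // Prefer memcpy; fall back to whatever was chosen for memmove.
  std::string_view getMemcpyName() const {
    LibcallImpl Memcpy = getLibcallImpl(MEMCPY);
    if (Memcpy != Unsupported)
      return ImplNames[Memcpy];
    return ImplNames[getLibcallImpl(MEMMOVE)];
  }
};
```

In this sketch an empty name means nothing was selected for either libcall; how the real class reports that case is not shown in the truncated diff.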

[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint with fp->int->fp optimizations (PR #164503)

2025-10-24 Thread Yingwei Zheng via llvm-branch-commits



@@ -18871,6 +18871,37 @@ SDValue DAGCombiner::visitFPOW(SDNode *N) {
 
   return SDValue();
 }
+/// Check if a use of a floating-point operation doesn't care about the sign of
+/// zero. This allows us to optimize (sitofp (fptosi x)) -> ftrunc(x) even
+/// without NoSignedZerosFPMath, as long as all uses are sign-insensitive.
+static bool isSignInsensitiveUse(SDNode *Use, unsigned OperandNo,

dtcxzyw wrote:

Yeah. In ValueTracking we have `canIgnoreSignBitOfZero` and 
`canIgnoreSignBitOfNaN`.

https://github.com/llvm/llvm-project/pull/164503
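
For context on why the sign of zero matters in the quoted doc comment, here is a small sketch in plain C++ (not DAGCombiner code) comparing the integer round trip with truncation. The function names are hypothetical; the casts model `(sitofp (fptosi x))` and `std::trunc` models `ftrunc(x)`, assuming IEEE-754 doubles and no fast-math flags.

```cpp
#include <cmath>
#include <cstdint>

// Models (sitofp (fptosi x)): round trip through a signed 32-bit integer.
double roundTripThroughInt(double X) {
  return static_cast<double>(static_cast<std::int32_t>(X));
}

// Models ftrunc(x): round toward zero while staying in floating point.
double truncTowardZero(double X) { return std::trunc(X); }

// A use that ignores the sign of zero, such as an equality comparison,
// cannot tell the two results apart.
bool sameUnderComparison(double X) {
  return roundTripThroughInt(X) == truncTowardZero(X);
}
```

For an input such as -0.5 the round trip yields +0.0 while truncation yields -0.0, so the two forms differ only in the sign bit of a zero result. That is the case the sign-insensitive-use check is meant to cover.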

[llvm-branch-commits] [llvm] [openmp] [OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (PR #164392)

2025-10-24 Thread Robert Imschweiler via llvm-branch-commits


https://github.com/ro-i updated https://github.com/llvm/llvm-project/pull/164392

>From 8a60fc14cfedb224a09623d2e6f4299957dc67b7 Mon Sep 17 00:00:00 2001
From: Robert Imschweiler 
Date: Mon, 20 Oct 2025 11:39:19 -0500
Subject: [PATCH 1/8] [OpenMP] Implement omp_get_uid_from_device() /
 omp_get_device_from_uid()

Use the implementation in libomptarget. If libomptarget is not
available, always return the UID / device number of the host / the
initial device.
---
 offload/include/OpenMP/omp.h   |  7 ++
 offload/include/omptarget.h|  2 +
 offload/libomptarget/OpenMP/API.cpp| 55 +
 offload/libomptarget/exports   |  2 +
 offload/test/api/omp_device_uid.c  | 88 +
 openmp/device/include/DeviceTypes.h|  2 +
 openmp/device/include/Interface.h  |  4 +
 openmp/device/src/State.cpp|  6 ++
 openmp/runtime/src/dllexports  |  2 +
 openmp/runtime/src/include/omp.h.var   |  5 ++
 openmp/runtime/src/include/omp_lib.F90.var | 14 
 openmp/runtime/src/include/omp_lib.h.var   | 19 +
 openmp/runtime/src/kmp_ftn_entry.h | 16 +++-
 openmp/runtime/src/kmp_ftn_os.h|  8 ++
 openmp/runtime/test/api/omp_device_uid.c   | 89 ++
 15 files changed, 317 insertions(+), 2 deletions(-)
 create mode 100644 offload/test/api/omp_device_uid.c
 create mode 100644 openmp/runtime/test/api/omp_device_uid.c

diff --git a/offload/include/OpenMP/omp.h b/offload/include/OpenMP/omp.h
index 49d9f1fa75c20..a42724f87cf3a 100644
--- a/offload/include/OpenMP/omp.h
+++ b/offload/include/OpenMP/omp.h
@@ -30,6 +30,13 @@
 
 extern "C" {
 
+/// Definitions
+///{
+
+#define omp_invalid_device -2
+
+///}
+
 /// Type declarations
 ///{
 
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 89aa468689eaf..78e0d855c11e0 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -274,6 +274,8 @@ extern "C" {
 void ompx_dump_mapping_tables(void);
 int omp_get_num_devices(void);
 int omp_get_device_num(void);
+int omp_get_device_from_uid(const char *DeviceUid);
+const char *omp_get_uid_from_device(int DeviceNum);
 int omp_get_initial_device(void);
 void *omp_target_alloc(size_t Size, int DeviceNum);
 void omp_target_free(void *DevicePtr, int DeviceNum);
diff --git a/offload/libomptarget/OpenMP/API.cpp 
b/offload/libomptarget/OpenMP/API.cpp
index 48b086d671285..81ce1623f50a5 100644
--- a/offload/libomptarget/OpenMP/API.cpp
+++ b/offload/libomptarget/OpenMP/API.cpp
@@ -40,6 +40,8 @@ EXTERN void ompx_dump_mapping_tables() {
 using namespace llvm::omp::target::ompt;
 #endif
 
+using GenericDeviceTy = llvm::omp::target::plugin::GenericDeviceTy;
+
 void *targetAllocExplicit(size_t Size, int DeviceNum, int Kind,
   const char *Name);
 void targetFreeExplicit(void *DevicePtr, int DeviceNum, int Kind,
@@ -91,6 +93,59 @@ EXTERN int omp_get_device_num(void) {
   return HostDevice;
 }
 
+EXTERN int omp_get_device_from_uid(const char *DeviceUid) {
+  TIMESCOPE();
+  OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
+
+  if (!DeviceUid) {
+DP("Call to omp_get_device_from_uid returning omp_invalid_device\n");
+return omp_invalid_device;
+  }
+  if (strcmp(DeviceUid, GenericDeviceTy::getHostDeviceUid()) == 0) {
+DP("Call to omp_get_device_from_uid returning host device number %d\n",
+   omp_get_initial_device());
+return omp_get_initial_device();
+  }
+
+  int DeviceNum = omp_invalid_device;
+
+  auto ExclusiveDevicesAccessor = PM->getExclusiveDevicesAccessor();
+  for (const DeviceTy &Device : PM->devices(ExclusiveDevicesAccessor)) {
+const char *Uid = Device.RTL->getDevice(Device.RTLDeviceID).getDeviceUid();
+if (Uid && strcmp(DeviceUid, Uid) == 0) {
+  DeviceNum = Device.DeviceID;
+  break;
+}
+  }
+
+  DP("Call to omp_get_device_from_uid returning %d\n", DeviceNum);
+  return DeviceNum;
+}
+
+EXTERN const char *omp_get_uid_from_device(int DeviceNum) {
+  TIMESCOPE();
+  OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
+
+  if (DeviceNum == omp_invalid_device) {
+DP("Call to omp_get_uid_from_device returning nullptr\n");
+return nullptr;
+  }
+  if (DeviceNum == omp_get_initial_device()) {
+DP("Call to omp_get_uid_from_device returning host device UID\n");
+return GenericDeviceTy::getHostDeviceUid();
+  }
+
+  llvm::Expected Device = PM->getDevice(DeviceNum);
+  if (!Device) {
+FATAL_MESSAGE(DeviceNum, "%s", toString(Device.takeError()).c_str());
+return nullptr;
+  }
+
+  const char *Uid = Device->RTL->getDevice(Device->RTLDeviceID).getDeviceUid();
+  DP("Call to omp_get_uid_from_device returning %s\n", Uid);
+  return Uid;
+}
+
 EXTERN int omp_get_initial_device(void) {
   TIMESCOPE();
   OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
diff --git a/offload/libomptarget/exports b/offload/libomptarget/exports
i
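
The lookup logic in the two new API functions can be sketched without libomptarget as follows. This is an illustration only, under stated assumptions: the device table, the UID strings, `HostUid` and the value of `InitialDevice` are invented, and `deviceFromUid` / `uidFromDevice` are hypothetical names mirroring the control flow of `omp_get_device_from_uid` and `omp_get_uid_from_device` in the patch. Only `-2` for the invalid device comes from the diff.

```cpp
#include <cstring>
#include <vector>

constexpr int InvalidDevice = -2; // mirrors omp_invalid_device in the patch
constexpr int InitialDevice = 2;  // hypothetical host device number

struct MockDevice {
  int DeviceID;
  const char *Uid;
};

static const char *const HostUid = "HOST"; // hypothetical host UID string
static const std::vector<MockDevice> Devices = {{0, "GPU-aaaa"},
                                                {1, "GPU-bbbb"}};

// Null UID -> invalid; host UID -> initial device; otherwise linear search.
int deviceFromUid(const char *DeviceUid) {
  if (!DeviceUid)
    return InvalidDevice;
  if (std::strcmp(DeviceUid, HostUid) == 0)
    return InitialDevice;
  for (const MockDevice &D : Devices)
    if (D.Uid && std::strcmp(DeviceUid, D.Uid) == 0)
      return D.DeviceID;
  return InvalidDevice;
}

// Invalid device -> nullptr; initial device -> host UID; otherwise lookup.
const char *uidFromDevice(int DeviceNum) {
  if (DeviceNum == InvalidDevice)
    return nullptr;
  if (DeviceNum == InitialDevice)
    return HostUid;
  for (const MockDevice &D : Devices)
    if (D.DeviceID == DeviceNum)
      return D.Uid;
  return nullptr; // the patch emits FATAL_MESSAGE for an unknown device
}
```

With this table the two functions are inverses on valid inputs, which is the property a test like omp_device_uid.c would presumably exercise.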

[llvm-branch-commits] [flang] [flang][OpenACC] lower acc loops with early exits (PR #164992)

2025-10-24 Thread Valentin Clement バレンタインクレメン via llvm-branch-commits


https://github.com/clementval approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/164992

[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-10-24 Thread Kevin Sala Penades via llvm-branch-commits



@@ -107,7 +107,7 @@ enum TargetAllocTy : int32_t {
 
 inline KernelArgsTy CTorDTorKernelArgs = {1,   0,   nullptr,   nullptr,
 nullptr, nullptr, nullptr,   nullptr,
-0,  {0,0,0},   {1, 0, 0}, {1, 0, 0}, 0};
+0,  {0,0,0,0},   {1, 0, 0}, {1, 0, 0}, 0};

kevinsala wrote:

I'll add comments in a separate PR.

https://github.com/llvm/llvm-project/pull/152831
