[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak created 
https://github.com/llvm/llvm-project/pull/137201

After removing host operations from the device MLIR module, it is no longer 
necessary to provide special codegen logic to prevent these operations from 
causing compiler crashes or miscompilations.

This patch removes these now unnecessary code paths to simplify codegen logic. 
Some MLIR tests are now replaced with Flang tests, since the responsibility of 
dealing with host operations has been moved earlier in the compilation flow.

MLIR tests holding target device modules are updated to no longer include now 
unsupported host operations.

>From 174401a11e313c6b1ebe80552c59ea59a0c6a4de Mon Sep 17 00:00:00 2001
From: Sergio Afonso 
Date: Tue, 22 Apr 2025 12:04:45 +0100
Subject: [PATCH] [MLIR][OpenMP] Simplify OpenMP device codegen

After removing host operations from the device MLIR module, it is no longer
necessary to provide special codegen logic to prevent these operations from
causing compiler crashes or miscompilations.

This patch removes these now unnecessary code paths to simplify codegen logic.
Some MLIR tests are now replaced with Flang tests, since the responsibility of
dealing with host operations has been moved earlier in the compilation flow.

MLIR tests holding target device modules are updated to no longer include now
unsupported host operations.
---
 .../OpenMP/target-nesting-in-host-ops.f90 |  87 
 .../Integration/OpenMP/task-target-device.f90 |  37 ++
 .../OpenMP/threadprivate-target-device.f90|  40 ++
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 423 +++---
 ...arget-constant-indexing-device-region.mlir |  25 +-
 .../Target/LLVMIR/omptarget-debug-var-1.mlir  |  17 +-
 .../omptarget-memcpy-align-metadata.mlir  |  61 +--
 .../LLVMIR/omptarget-target-inside-task.mlir  |  40 --
 ...ptarget-threadprivate-device-lowering.mlir |  30 --
 .../Target/LLVMIR/openmp-llvm-invalid.mlir|  45 ++
 .../openmp-target-nesting-in-host-ops.mlir| 156 ---
 .../LLVMIR/openmp-task-target-device.mlir |  26 --
 12 files changed, 408 insertions(+), 579 deletions(-)
 create mode 100644 flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90
 create mode 100644 flang/test/Integration/OpenMP/task-target-device.f90
 create mode 100644 
flang/test/Integration/OpenMP/threadprivate-target-device.f90
 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir
 delete mode 100644 
mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir
 delete mode 100644 
mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir
 delete mode 100644 mlir/test/Target/LLVMIR/openmp-task-target-device.mlir

diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 
b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90
new file mode 100644
index 0..8c85a3c1784ed
--- /dev/null
+++ b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90
@@ -0,0 +1,87 @@
+!===--===!
+! This directory can be used to add Integration tests involving multiple
+! stages of the compiler (for eg. from Fortran to LLVM IR). It should not
+! contain executable tests. We should only add tests here sparingly and only
+! if there is no other way to test. Repeat this message in each test that is
+! added to this directory and sub-directories.
+!===--===!
+
+!REQUIRES: amdgpu-registered-target
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp 
-fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s
+
+! CHECK-NOT: define void @nested_target_in_parallel
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr 
%{{.*}})
+subroutine nested_target_in_parallel(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+
+  !$omp parallel
+!$omp target map(tofrom: v)
+!$omp end target
+  !$omp end parallel
+end subroutine
+
+! CHECK-NOT: define void @nested_target_in_wsloop
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr 
%{{.*}})
+subroutine nested_target_in_wsloop(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+  integer :: i
+
+  !$omp do
+  do i=1, 10
+!$omp target map(tofrom: v)
+!$omp end target
+  end do
+end subroutine
+
+! CHECK-NOT: define void @nested_target_in_parallel_with_private
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr 
%{{.*}}, ptr %{{.*}}, ptr %{{.*}})
+subroutine nested_target_in_parallel_with_private(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+  integer :: x
+  x = 10
+
+  !$omp parallel firstprivate(x)
+!$omp target map(tofrom: v(1:x))
+!$omp end target
+  !$omp end parallel
+end subroutine
+
+! CHECK-NOT: defi

[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Sergio Afonso (skatrak)


Changes

After removing host operations from the device MLIR module, it is no longer 
necessary to provide special codegen logic to prevent these operations from 
causing compiler crashes or miscompilations.

This patch removes these now unnecessary code paths to simplify codegen logic. 
Some MLIR tests are now replaced with Flang tests, since the responsibility of 
dealing with host operations has been moved earlier in the compilation flow.

MLIR tests holding target device modules are updated to no longer include now 
unsupported host operations.

---

Patch is 52.28 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/137201.diff


12 Files Affected:

- (added) flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 (+87) 
- (added) flang/test/Integration/OpenMP/task-target-device.f90 (+37) 
- (added) flang/test/Integration/OpenMP/threadprivate-target-device.f90 (+40) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+159-264) 
- (modified) 
mlir/test/Target/LLVMIR/omptarget-constant-indexing-device-region.mlir (+10-15) 
- (modified) mlir/test/Target/LLVMIR/omptarget-debug-var-1.mlir (+6-11) 
- (modified) mlir/test/Target/LLVMIR/omptarget-memcpy-align-metadata.mlir 
(+24-37) 
- (removed) mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir (-40) 
- (removed) 
mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir (-30) 
- (modified) mlir/test/Target/LLVMIR/openmp-llvm-invalid.mlir (+45) 
- (removed) mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir 
(-156) 
- (removed) mlir/test/Target/LLVMIR/openmp-task-target-device.mlir (-26) 


``diff
diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 
b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90
new file mode 100644
index 0..8c85a3c1784ed
--- /dev/null
+++ b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90
@@ -0,0 +1,87 @@
+!===--===!
+! This directory can be used to add Integration tests involving multiple
+! stages of the compiler (for eg. from Fortran to LLVM IR). It should not
+! contain executable tests. We should only add tests here sparingly and only
+! if there is no other way to test. Repeat this message in each test that is
+! added to this directory and sub-directories.
+!===--===!
+
+!REQUIRES: amdgpu-registered-target
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp 
-fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s
+
+! CHECK-NOT: define void @nested_target_in_parallel
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr 
%{{.*}})
+subroutine nested_target_in_parallel(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+
+  !$omp parallel
+!$omp target map(tofrom: v)
+!$omp end target
+  !$omp end parallel
+end subroutine
+
+! CHECK-NOT: define void @nested_target_in_wsloop
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr 
%{{.*}})
+subroutine nested_target_in_wsloop(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+  integer :: i
+
+  !$omp do
+  do i=1, 10
+!$omp target map(tofrom: v)
+!$omp end target
+  end do
+end subroutine
+
+! CHECK-NOT: define void @nested_target_in_parallel_with_private
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr 
%{{.*}}, ptr %{{.*}}, ptr %{{.*}})
+subroutine nested_target_in_parallel_with_private(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+  integer :: x
+  x = 10
+
+  !$omp parallel firstprivate(x)
+!$omp target map(tofrom: v(1:x))
+!$omp end target
+  !$omp end parallel
+end subroutine
+
+! CHECK-NOT: define void @nested_target_in_task_with_private
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_nested_target_in_task_with_private_{{.*}}(ptr %{{.*}}, 
ptr %{{.*}}, ptr %{{.*}})
+subroutine nested_target_in_task_with_private(v)
+  implicit none
+  integer, intent(inout) :: v(10)
+  integer :: x
+  x = 10
+
+  !$omp task firstprivate(x)
+!$omp target map(tofrom: v(1:x))
+!$omp end target
+  !$omp end task
+end subroutine
+
+! CHECK-NOT: define void @target_and_atomic_update
+! CHECK: define weak_odr protected amdgpu_kernel void 
@__omp_offloading_{{.*}}_target_and_atomic_update_{{.*}}(ptr %{{.*}})
+subroutine target_and_atomic_update(x, expr)
+  implicit none
+  integer, intent(inout) :: x, expr
+
+  !$omp target
+  !$omp end target
+
+  !$omp atomic update
+  x = x + expr
+end subroutine
+
+! CHECK-NOT: define

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-mlir

Author: Sergio Afonso (skatrak)


Changes

This patch adds assertions to map-related MLIR to LLVM IR translation functions 
and utils to explicitly document whether they are intended for host or device 
compilation only.

Over time, map-related handling has increased in complexity. This is compounded 
by the fact that some handling is device-specific and some is host-specific. By 
explicitly asserting on these functions on the expected compilation pass, the 
flow should become slighlty easier to follow.

---
Full diff: https://github.com/llvm/llvm-project/pull/137199.diff


1 Files Affected:

- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+22-2) 


``diff
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 52aa1fbfab2c1..6d80c66e3596e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
 LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder,
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // Map the first segment of our structure
   combinedInfo.Types.emplace_back(
   isTargetParams
@@ -3671,6 +3674,8 @@ static void processMapMembersWithParent(
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex,
 llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
 
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
@@ -3784,6 +3789,9 @@ static void 
processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, uint64_t 
mapDataIndex,
 bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
 
@@ -3825,6 +3833,8 @@ static void
 createAlteredByCaptureMap(MapInfoData &mapData,
   LLVM::ModuleTranslation &moduleTranslation,
   llvm::IRBuilderBase &builder) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
   for (size_t i = 0; i < mapData.MapClause.size(); ++i) {
 // if it's declare target, skip it, it's handled separately.
 if (!mapData.IsDeclareTarget[i]) {
@@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
 LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, bool isTargetParams = false) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // We wish to modify some of the methods in which arguments are
   // passed based on their capture type by the target region, this can
   // involve generating new loads and stores, which changes the
@@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
   // kernel arg structure. It primarily becomes relevant in cases like
   // bycopy, or byref range'd arrays. In the default case, we simply
   // pass thee pointer byref as both basePointer and pointer.
-  if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice())
-createAlteredByCaptureMap(mapData, moduleTranslation, builder);
+  createAlteredByCaptureMap(mapData, moduleTranslation, builder);
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
@@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, 
llvm::IRBuilderBase &builder,
 static llvm::Expected
 getOrCreateUserDefinedMapperFunc(Operation *op, llvm::IRBuilderBase &builder,
  LLVM::ModuleTranslation &moduleTranslation) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
   auto declMapperOp = cast(op);
   std::string mapperFuncName =
   moduleTranslation.getOpenMPBuilder()->createPlatformSpecificName(
@@ -3951,6 +3965,8 @@ static llvm::Expected
 emitUserDefinedMapper(Operation *op, llvm::IRBui

[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Sergio Afonso (skatrak)


Changes

This patch updates the function filtering OpenMP pass intended to remove host 
functions from the MLIR module created by Flang lowering when targeting an 
OpenMP target device.

Host functions holding target regions must be kept, so that the target regions 
within them can be translated for the device. The issue is that non-target 
operations inside these functions cannot be discarded because some of them hold 
information that is also relevant during target device codegen. Specifically, 
mapping information resides outside of `omp.target` regions.

This patch updates the previous behavior where all host operations were 
preserved to then ignore all of those that are not actually needed by target 
device codegen. This, in practice, means only keeping target regions and 
mapping information needed by the device. Arguments for some of these remaining 
operations are replaced by placeholder allocations and `fir.undefined`, since 
they are only actually defined inside of the target regions themselves.

As a result, this set of changes makes it possible to later simplify target 
device codegen, as it is no longer necessary to handle host operations 
differently to avoid issues.

---

Patch is 50.50 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/137200.diff


7 Files Affected:

- (modified) flang/include/flang/Optimizer/OpenMP/Passes.td (+2-1) 
- (modified) flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp (+288) 
- (modified) flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90 (+10-9) 
- (modified) flang/test/Lower/OpenMP/host-eval.f90 (+37-18) 
- (modified) flang/test/Lower/OpenMP/real10.f90 (+1-4) 
- (added) flang/test/Transforms/OpenMP/function-filtering-host-ops.mlir (+400) 
- (renamed) flang/test/Transforms/OpenMP/function-filtering.mlir () 


``diff
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index fcc7a4ca31fef..dcc97122efdf7 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -46,7 +46,8 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> {
 "for the target device.";
   let dependentDialects = [
 "mlir::func::FuncDialect",
-"fir::FIROpsDialect"
+"fir::FIROpsDialect",
+"mlir::omp::OpenMPDialect"
   ];
 }
 
diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp 
b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
index 9554808824ac3..9e11df77506d6 100644
--- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
+++ b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
@@ -13,12 +13,14 @@
 
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/OpenMP/Passes.h"
 
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 
 namespace flangomp {
@@ -94,6 +96,12 @@ class FunctionFilteringPass
   funcOp.erase();
   return WalkResult::skip();
 }
+
+if (failed(rewriteHostRegion(funcOp.getRegion( {
+  funcOp.emitOpError() << "could not be rewritten for target device";
+  return WalkResult::interrupt();
+}
+
 if (declareTargetOp)
   declareTargetOp.setDeclareTarget(declareType,

omp::DeclareTargetCaptureClause::to);
@@ -101,5 +109,285 @@ class FunctionFilteringPass
   return WalkResult::advance();
 });
   }
+
+private:
+  /// Add the given \c omp.map.info to a sorted set while taking into account
+  /// its dependencies.
+  static void collectMapInfos(omp::MapInfoOp mapOp, Region ®ion,
+  llvm::SetVector &mapInfos) {
+for (Value member : mapOp.getMembers())
+  collectMapInfos(cast(member.getDefiningOp()), region,
+  mapInfos);
+
+if (region.isAncestor(mapOp->getParentRegion()))
+  mapInfos.insert(mapOp);
+  }
+
+  /// Add the given value to a sorted set if it should be replaced by a
+  /// placeholder when used as a pointer-like argument to an operation
+  /// participating in the initialization of an \c omp.map.info.
+  static void markPtrOperandForRewrite(Value value,
+   llvm::SetVector &rewriteValues) {
+// We don't need to rewrite operands if they are defined by block arguments
+// of operations that will still remain after the region is rewritten.
+if (isa(value) &&
+isa(
+cast(value).getOwner()->getParentOp()))
+  return;
+
+rewriteValues.insert(value);
+  }
+
+  /// Rewrite the given host

[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

skatrak wrote:

PR stack:
- #137198
- #137199
- #137200
- #137201

https://github.com/llvm/llvm-project/pull/137200
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

skatrak wrote:

PR stack:
- #137198
- #137199
- #137200
- #137201

https://github.com/llvm/llvm-project/pull/137199
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak created 
https://github.com/llvm/llvm-project/pull/137199

This patch adds assertions to map-related MLIR to LLVM IR translation functions 
and utils to explicitly document whether they are intended for host or device 
compilation only.

Over time, map-related handling has increased in complexity. This is compounded 
by the fact that some handling is device-specific and some is host-specific. By 
explicitly asserting on these functions on the expected compilation pass, the 
flow should become slighlty easier to follow.

>From b70cbfaa049fc215b467d325918570afabda4939 Mon Sep 17 00:00:00 2001
From: Sergio Afonso 
Date: Fri, 11 Apr 2025 13:40:14 +0100
Subject: [PATCH] [MLIR][OpenMP] Assert on map translation functions, NFC

This patch adds assertions to map-related MLIR to LLVM IR translation functions
and utils to explicitly document whether they are intended for host or device
compilation only.

Over time, map-related handling has increased in complexity. This is compounded
by the fact that some handling is device-specific and some is host-specific. By
explicitly asserting on these functions on the expected compilation pass, the
flow should become slighlty easier to follow.
---
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 52aa1fbfab2c1..6d80c66e3596e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
 LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder,
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // Map the first segment of our structure
   combinedInfo.Types.emplace_back(
   isTargetParams
@@ -3671,6 +3674,8 @@ static void processMapMembersWithParent(
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex,
 llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
 
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
@@ -3784,6 +3789,9 @@ static void 
processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, uint64_t 
mapDataIndex,
 bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
 
@@ -3825,6 +3833,8 @@ static void
 createAlteredByCaptureMap(MapInfoData &mapData,
   LLVM::ModuleTranslation &moduleTranslation,
   llvm::IRBuilderBase &builder) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
   for (size_t i = 0; i < mapData.MapClause.size(); ++i) {
 // if it's declare target, skip it, it's handled separately.
 if (!mapData.IsDeclareTarget[i]) {
@@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
 LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, bool isTargetParams = false) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // We wish to modify some of the methods in which arguments are
   // passed based on their capture type by the target region, this can
   // involve generating new loads and stores, which changes the
@@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
   // kernel arg structure. It primarily becomes relevant in cases like
   // bycopy, or byref range'd arrays. In the default case, we simply
   // pass thee pointer byref as both basePointer and pointer.
-  if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice())
-createAlteredByCaptureMap(mapData, moduleTranslation, builder);
+  createAlteredByCaptureMap(mapData, moduleTranslation, builder);
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
@@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, 
llvm::IRBuilderBase &bu

[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak created 
https://github.com/llvm/llvm-project/pull/137200

This patch updates the function filtering OpenMP pass intended to remove host 
functions from the MLIR module created by Flang lowering when targeting an 
OpenMP target device.

Host functions holding target regions must be kept, so that the target regions 
within them can be translated for the device. The issue is that non-target 
operations inside these functions cannot be discarded because some of them hold 
information that is also relevant during target device codegen. Specifically, 
mapping information resides outside of `omp.target` regions.

This patch updates the previous behavior where all host operations were 
preserved to then ignore all of those that are not actually needed by target 
device codegen. This, in practice, means only keeping target regions and 
mapping information needed by the device. Arguments for some of these remaining 
operations are replaced by placeholder allocations and `fir.undefined`, since 
they are only actually defined inside of the target regions themselves.

As a result, this set of changes makes it possible to later simplify target 
device codegen, as it is no longer necessary to handle host operations 
differently to avoid issues.

>From c6c8443710c59e765e37c8a21267fe619f9f792a Mon Sep 17 00:00:00 2001
From: Sergio Afonso 
Date: Tue, 15 Apr 2025 16:59:18 +0100
Subject: [PATCH] [Flang][OpenMP] Minimize host ops remaining in device
 compilation

This patch updates the function filtering OpenMP pass intended to remove host
functions from the MLIR module created by Flang lowering when targeting an
OpenMP target device.

Host functions holding target regions must be kept, so that the target regions
within them can be translated for the device. The issue is that non-target
operations inside these functions cannot be discarded because some of them hold
information that is also relevant during target device codegen. Specifically,
mapping information resides outside of `omp.target` regions.

This patch updates the previous behavior where all host operations were
preserved to then ignore all of those that are not actually needed by target
device codegen. This, in practice, means only keeping target regions and mapping
information needed by the device. Arguments for some of these remaining
operations are replaced by placeholder allocations and `fir.undefined`, since
they are only actually defined inside of the target regions themselves.

As a result, this set of changes makes it possible to later simplify target
device codegen, as it is no longer necessary to handle host operations
differently to avoid issues.
---
 .../include/flang/Optimizer/OpenMP/Passes.td  |   3 +-
 .../Optimizer/OpenMP/FunctionFiltering.cpp| 288 +
 .../OpenMP/declare-target-link-tarop-cap.f90  |  19 +-
 flang/test/Lower/OpenMP/host-eval.f90 |  55 ++-
 flang/test/Lower/OpenMP/real10.f90|   5 +-
 .../OpenMP/function-filtering-host-ops.mlir   | 400 ++
 .../function-filtering.mlir}  |   0
 7 files changed, 738 insertions(+), 32 deletions(-)
 create mode 100644 
flang/test/Transforms/OpenMP/function-filtering-host-ops.mlir
 rename flang/test/Transforms/{omp-function-filtering.mlir => 
OpenMP/function-filtering.mlir} (100%)

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index fcc7a4ca31fef..dcc97122efdf7 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -46,7 +46,8 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> {
 "for the target device.";
   let dependentDialects = [
 "mlir::func::FuncDialect",
-"fir::FIROpsDialect"
+"fir::FIROpsDialect",
+"mlir::omp::OpenMPDialect"
   ];
 }
 
diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp 
b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
index 9554808824ac3..9e11df77506d6 100644
--- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
+++ b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
@@ -13,12 +13,14 @@
 
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/OpenMP/Passes.h"
 
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 
 namespace flangomp {
@@ -94,6 +96,12 @@ class FunctionFilteringPass
   funcOp.erase();
   return WalkResult::skip();
 }
+
+if (failed(rewriteHostRegion(funcOp.getRegion( {
+  funcOp.emitOpError() << "could not be rewritten for target device";
+  return WalkResult::interrupt();
+}
+
 if (declareTargetOp)
   declareTargetOp.s

[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir-llvm

Author: Tom Eccles (tblah)


Changes

This is basically identical to cancel except without the if clause.

taskgroup will be implemented in a followup PR.

---
Full diff: https://github.com/llvm/llvm-project/pull/137205.diff


5 Files Affected:

- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+10) 
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+51) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+34-3) 
- (added) mlir/test/Target/LLVMIR/openmp-cancellation-point.mlir (+188) 
- (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (+10-6) 


``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h 
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 10d69e561a987..14ad8629537f7 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -686,6 +686,16 @@ class OpenMPIRBuilder {
 Value *IfCondition,
 omp::Directive CanceledDirective);
 
+  /// Generator for '#omp cancellation point'
+  ///
+  /// \param Loc The location where the directive was encountered.
+  /// \param CanceledDirective The kind of directive that is cancled.
+  ///
+  /// \returns The insertion point after the barrier.
+  InsertPointOrErrorTy
+  createCancellationPoint(const LocationDescription &Loc,
+  omp::Directive CanceledDirective);
+
   /// Generator for '#omp parallel'
   ///
   /// \param Loc The insert and source location description.
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3f19088e6c73d..06aa61adcd739 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1118,6 +1118,57 @@ OpenMPIRBuilder::createCancel(const LocationDescription 
&Loc,
   return Builder.saveIP();
 }
 
+OpenMPIRBuilder::InsertPointOrErrorTy
+OpenMPIRBuilder::createCancellationPoint(const LocationDescription &Loc,
+ omp::Directive CanceledDirective) {
+  if (!updateToLocation(Loc))
+return Loc.IP;
+
+  // LLVM utilities like blocks with terminators.
+  auto *UI = Builder.CreateUnreachable();
+  Builder.SetInsertPoint(UI);
+
+  Value *CancelKind = nullptr;
+  switch (CanceledDirective) {
+#define OMP_CANCEL_KIND(Enum, Str, DirectiveEnum, Value)   
\
+  case DirectiveEnum:  
\
+CancelKind = Builder.getInt32(Value);  
\
+break;
+#include "llvm/Frontend/OpenMP/OMPKinds.def"
+  default:
+llvm_unreachable("Unknown cancel kind!");
+  }
+
+  uint32_t SrcLocStrSize;
+  Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize);
+  Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
+  Value *Args[] = {Ident, getOrCreateThreadID(Ident), CancelKind};
+  Value *Result = Builder.CreateCall(
+  getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_cancellationpoint), Args);
+  auto ExitCB = [this, CanceledDirective, Loc](InsertPointTy IP) -> Error {
+if (CanceledDirective == OMPD_parallel) {
+  IRBuilder<>::InsertPointGuard IPG(Builder);
+  Builder.restoreIP(IP);
+  return createBarrier(LocationDescription(Builder.saveIP(), Loc.DL),
+   omp::Directive::OMPD_unknown,
+   /* ForceSimpleCall */ false,
+   /* CheckCancelFlag */ false)
+  .takeError();
+}
+return Error::success();
+  };
+
+  // The actual cancel logic is shared with others, e.g., cancel_barriers.
+  if (Error Err = emitCancelationCheckImpl(Result, CanceledDirective, ExitCB))
+return Err;
+
+  // Update the insertion point and remove the terminator we introduced.
+  Builder.SetInsertPoint(UI->getParent());
+  UI->eraseFromParent();
+
+  return Builder.saveIP();
+}
+
 OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetKernel(
 const LocationDescription &Loc, InsertPointTy AllocaIP, Value *&Return,
 Value *Ident, Value *DeviceID, Value *NumTeams, Value *NumThreads,
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7d8a7ccb6e4ac..afae41f001736 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -255,6 +255,9 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   LogicalResult result = success();
   llvm::TypeSwitch(op)
   .Case([&](omp::CancelOp op) { checkCancelDirective(op, result); })
+  .Case([&](omp::CancellationPointOp op) {
+checkCancelDirective(op, result);
+  })
   .Case([&](omp::DistributeOp op) {
 checkAllocate(op, result);
 checkDistSchedule(op, result

[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Tom Eccles (tblah)


Changes

This is basically identical to cancel except without the if clause.

taskgroup will be implemented in a followup PR.

---
Full diff: https://github.com/llvm/llvm-project/pull/137205.diff


5 Files Affected:

- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+10) 
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+51) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+34-3) 
- (added) mlir/test/Target/LLVMIR/openmp-cancellation-point.mlir (+188) 
- (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (+10-6) 


``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h 
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 10d69e561a987..14ad8629537f7 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -686,6 +686,16 @@ class OpenMPIRBuilder {
 Value *IfCondition,
 omp::Directive CanceledDirective);
 
+  /// Generator for '#omp cancellation point'
+  ///
+  /// \param Loc The location where the directive was encountered.
+  /// \param CanceledDirective The kind of directive that is cancled.
+  ///
+  /// \returns The insertion point after the barrier.
+  InsertPointOrErrorTy
+  createCancellationPoint(const LocationDescription &Loc,
+  omp::Directive CanceledDirective);
+
   /// Generator for '#omp parallel'
   ///
   /// \param Loc The insert and source location description.
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3f19088e6c73d..06aa61adcd739 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1118,6 +1118,57 @@ OpenMPIRBuilder::createCancel(const LocationDescription 
&Loc,
   return Builder.saveIP();
 }
 
+OpenMPIRBuilder::InsertPointOrErrorTy
+OpenMPIRBuilder::createCancellationPoint(const LocationDescription &Loc,
+ omp::Directive CanceledDirective) {
+  if (!updateToLocation(Loc))
+return Loc.IP;
+
+  // LLVM utilities like blocks with terminators.
+  auto *UI = Builder.CreateUnreachable();
+  Builder.SetInsertPoint(UI);
+
+  Value *CancelKind = nullptr;
+  switch (CanceledDirective) {
+#define OMP_CANCEL_KIND(Enum, Str, DirectiveEnum, Value)   
\
+  case DirectiveEnum:  
\
+CancelKind = Builder.getInt32(Value);  
\
+break;
+#include "llvm/Frontend/OpenMP/OMPKinds.def"
+  default:
+llvm_unreachable("Unknown cancel kind!");
+  }
+
+  uint32_t SrcLocStrSize;
+  Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize);
+  Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
+  Value *Args[] = {Ident, getOrCreateThreadID(Ident), CancelKind};
+  Value *Result = Builder.CreateCall(
+  getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_cancellationpoint), Args);
+  auto ExitCB = [this, CanceledDirective, Loc](InsertPointTy IP) -> Error {
+if (CanceledDirective == OMPD_parallel) {
+  IRBuilder<>::InsertPointGuard IPG(Builder);
+  Builder.restoreIP(IP);
+  return createBarrier(LocationDescription(Builder.saveIP(), Loc.DL),
+   omp::Directive::OMPD_unknown,
+   /* ForceSimpleCall */ false,
+   /* CheckCancelFlag */ false)
+  .takeError();
+}
+return Error::success();
+  };
+
+  // The actual cancel logic is shared with others, e.g., cancel_barriers.
+  if (Error Err = emitCancelationCheckImpl(Result, CanceledDirective, ExitCB))
+return Err;
+
+  // Update the insertion point and remove the terminator we introduced.
+  Builder.SetInsertPoint(UI->getParent());
+  UI->eraseFromParent();
+
+  return Builder.saveIP();
+}
+
 OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetKernel(
 const LocationDescription &Loc, InsertPointTy AllocaIP, Value *&Return,
 Value *Ident, Value *DeviceID, Value *NumTeams, Value *NumThreads,
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7d8a7ccb6e4ac..afae41f001736 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -255,6 +255,9 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   LogicalResult result = success();
   llvm::TypeSwitch(op)
   .Case([&](omp::CancelOp op) { checkCancelDirective(op, result); })
+  .Case([&](omp::CancellationPointOp op) {
+checkCancelDirective(op, result);
+  })
   .Case([&](omp::DistributeOp op) {
 checkAllocate(op, result);
 checkDistSchedule(op, result);
@@

[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

tblah wrote:

PR Stack:
- Cancel parallel https://github.com/llvm/llvm-project/pull/137192
- Cancel sections https://github.com/llvm/llvm-project/pull/137193
- Cancel wsloop https://github.com/llvm/llvm-project/pull/137194
- Cancellation point https://github.com/llvm/llvm-project/pull/137205
- Cancel(lation point) taskgroup (TODO)

https://github.com/llvm/llvm-project/pull/137205
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-openmp

Author: Tom Eccles (tblah)


Changes

This is quite ugly but it is the best I could think of. The old FiniCBWrapper 
was way too brittle depending upon the exact block structure inside of the 
section, and could be confused by any control flow in the section (e.g. an if 
clause on cancel). The wording in the comment and variable names didn't seem to 
match where it was actually branching too as well.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches 
to a block containing __kmpc_for_static_fini.

This was hard to achieve here because sometimes the FiniCBWrapper has to run 
before the worksharing loop finalization has been crated.

To get around this ordering issue I created a dummy branch to a dummy block, 
which is then fixed later once all of the information is available.

---
Full diff: https://github.com/llvm/llvm-project/pull/137193.diff


4 Files Affected:

- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+17-10) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+4-2) 
- (modified) mlir/test/Target/LLVMIR/openmp-cancel.mlir (+76) 
- (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-16) 


``diff
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index be05f01c94603..3f19088e6c73d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -2172,6 +2172,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
   if (!updateToLocation(Loc))
 return Loc.IP;
 
+  // FiniCBWrapper needs to create a branch to the loop finalization block, but
+  // this has not been created yet at some times when this callback runs.
+  SmallVector CancellationBranches;
   auto FiniCBWrapper = [&](InsertPointTy IP) {
 if (IP.getBlock()->end() != IP.getPoint())
   return FiniCB(IP);
@@ -2179,16 +2182,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 // will fail because that function requires the Finalization Basic Block to
 // have a terminator, which is already removed by EmitOMPRegionBody.
 // IP is currently at cancelation block.
-// We need to backtrack to the condition block to fetch
-// the exit block and create a branch from cancelation
-// to exit block.
-IRBuilder<>::InsertPointGuard IPG(Builder);
-Builder.restoreIP(IP);
-auto *CaseBB = IP.getBlock()->getSinglePredecessor();
-auto *CondBB = CaseBB->getSinglePredecessor()->getSinglePredecessor();
-auto *ExitBB = CondBB->getTerminator()->getSuccessor(1);
-Instruction *I = Builder.CreateBr(ExitBB);
-IP = InsertPointTy(I->getParent(), I->getIterator());
+BranchInst *DummyBranch = Builder.CreateBr(IP.getBlock());
+IP = InsertPointTy(DummyBranch->getParent(), DummyBranch->getIterator());
+CancellationBranches.push_back(DummyBranch);
 return FiniCB(IP);
   };
 
@@ -2251,6 +2247,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 return WsloopIP.takeError();
   InsertPointTy AfterIP = *WsloopIP;
 
+  BasicBlock *LoopFini = AfterIP.getBlock()->getSinglePredecessor();
+  assert(LoopFini && "Bad structure of static workshare loop finalization");
+
   // Apply the finalization callback in LoopAfterBB
   auto FiniInfo = FinalizationStack.pop_back_val();
   assert(FiniInfo.DK == OMPD_sections &&
@@ -2264,6 +2263,14 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 AfterIP = {FiniBB, FiniBB->begin()};
   }
 
+  // Now we can fix the dummy branch to point to the right place
+  if (!CancellationBranches.empty()) {
+for (BranchInst *DummyBranch : CancellationBranches) {
+  assert(DummyBranch->getNumSuccessors() == 1);
+  DummyBranch->setSuccessor(0, LoopFini);
+}
+  }
+
   return AfterIP;
 }
 
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 6185a433a8199..d1885641f389d 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -161,7 +161,8 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   auto checkCancelDirective = [&todo](auto op, LogicalResult &result) {
 omp::ClauseCancellationConstructType cancelledDirective =
 op.getCancelDirective();
-if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel)
+if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel &&
+cancelledDirective != omp::ClauseCancellationConstructType::Sections)
   result = todo("cancel directive");
   };
   auto checkDepend = [&todo](auto op, LogicalResult &result) {
@@ -1690,10 +1691,11 @@ convertOmpSections(Operation &opInst, 
llvm::IRBuilderBase &builder,
   auto finiCB = [&](InsertPointTy codeGe

[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah created 
https://github.com/llvm/llvm-project/pull/137194

Taskloop support will follow in a later patch.

>From bb374c9f98cb13e55c9ce7d129f567428e58c24e Mon Sep 17 00:00:00 2001
From: Tom Eccles 
Date: Tue, 15 Apr 2025 15:05:50 +
Subject: [PATCH] [mlir][OpenMP] convert wsloop cancellation to LLVMIR

Taskloop support will follow in a later patch.
---
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 40 -
 mlir/test/Target/LLVMIR/openmp-cancel.mlir| 87 +++
 mlir/test/Target/LLVMIR/openmp-todo.mlir  | 16 
 3 files changed, 125 insertions(+), 18 deletions(-)

diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index d1885641f389d..7d8a7ccb6e4ac 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -161,8 +161,7 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   auto checkCancelDirective = [&todo](auto op, LogicalResult &result) {
 omp::ClauseCancellationConstructType cancelledDirective =
 op.getCancelDirective();
-if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel &&
-cancelledDirective != omp::ClauseCancellationConstructType::Sections)
+if (cancelledDirective == omp::ClauseCancellationConstructType::Taskgroup)
   result = todo("cancel directive");
   };
   auto checkDepend = [&todo](auto op, LogicalResult &result) {
@@ -2360,6 +2359,30 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase 
&builder,
   ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop
   : llvm::omp::WorksharingLoopType::ForStaticLoop;
 
+  SmallVector cancelTerminators;
+  // This callback is invoked only if there is cancellation inside of the 
wsloop
+  // body.
+  auto finiCB = [&](llvm::OpenMPIRBuilder::InsertPointTy ip) -> llvm::Error {
+llvm::IRBuilderBase &llvmBuilder = ompBuilder->Builder;
+llvm::IRBuilderBase::InsertPointGuard guard(llvmBuilder);
+
+// ip is currently in the block branched to if cancellation occured.
+// We need to create a branch to terminate that block.
+llvmBuilder.restoreIP(ip);
+
+// We must still clean up the wsloop after cancelling it, so we need to
+// branch to the block that finalizes the wsloop.
+// That block has not been created yet so use this block as a dummy for now
+// and fix this after creating the wsloop.
+cancelTerminators.push_back(llvmBuilder.CreateBr(ip.getBlock()));
+return llvm::Error::success();
+  };
+  // We have to add the cleanup to the OpenMPIRBuilder before the body gets
+  // created in case the body contains omp.cancel (which will then expect to be
+  // able to find this cleanup callback).
+  ompBuilder->pushFinalizationCB({finiCB, llvm::omp::Directive::OMPD_for,
+  constructIsCancellable(wsloopOp)});
+
   llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder);
   llvm::Expected regionBlock = convertOmpOpRegions(
   wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation);
@@ -2381,6 +2404,19 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase 
&builder,
   if (failed(handleError(wsloopIP, opInst)))
 return failure();
 
+  ompBuilder->popFinalizationCB();
+  if (!cancelTerminators.empty()) {
+// If we cancelled the loop, we should branch to the finalization block of
+// the wsloop (which is always immediately before the loop continuation
+// block). Now the finalization has been created, we can fix the branch.
+llvm::BasicBlock *wsloopFini = 
wsloopIP->getBlock()->getSinglePredecessor();
+for (llvm::BranchInst *cancelBranch : cancelTerminators) {
+  assert(cancelBranch->getNumSuccessors() == 1 &&
+ "cancel branch should have one target");
+  cancelBranch->setSuccessor(0, wsloopFini);
+}
+  }
+
   // Process the reductions if required.
   if (failed(createReductionsAndCleanup(
   wsloopOp, builder, moduleTranslation, allocaIP, reductionDecls,
diff --git a/mlir/test/Target/LLVMIR/openmp-cancel.mlir 
b/mlir/test/Target/LLVMIR/openmp-cancel.mlir
index fca16b936fc85..3c195a98d1000 100644
--- a/mlir/test/Target/LLVMIR/openmp-cancel.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-cancel.mlir
@@ -156,3 +156,90 @@ llvm.func @cancel_sections_if(%cond : i1) {
 // CHECK: ret void
 // CHECK:   .cncl:; preds = 
%[[VAL_27]]
 // CHECK: br label %[[VAL_19]]
+
+llvm.func @cancel_wsloop_if(%lb : i32, %ub : i32, %step : i32, %cond : i1) {
+  omp.wsloop {
+omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
+  omp.cancel cancellation_construct_type(loop) if(%cond)
+  omp.yield
+}
+  }
+  llvm.return
+}
+// CHECK-LABEL: define void @cancel_wsloop_if
+// CHECK: %[[VAL_0:.*]] = alloca i32, alig

[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Tom Eccles (tblah)


Changes

Taskloop support will follow in a later patch.

---
Full diff: https://github.com/llvm/llvm-project/pull/137194.diff


3 Files Affected:

- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+38-2) 
- (modified) mlir/test/Target/LLVMIR/openmp-cancel.mlir (+87) 
- (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-16) 


``diff
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index d1885641f389d..7d8a7ccb6e4ac 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -161,8 +161,7 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   auto checkCancelDirective = [&todo](auto op, LogicalResult &result) {
 omp::ClauseCancellationConstructType cancelledDirective =
 op.getCancelDirective();
-if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel &&
-cancelledDirective != omp::ClauseCancellationConstructType::Sections)
+if (cancelledDirective == omp::ClauseCancellationConstructType::Taskgroup)
   result = todo("cancel directive");
   };
   auto checkDepend = [&todo](auto op, LogicalResult &result) {
@@ -2360,6 +2359,30 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase 
&builder,
   ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop
   : llvm::omp::WorksharingLoopType::ForStaticLoop;
 
+  SmallVector cancelTerminators;
+  // This callback is invoked only if there is cancellation inside of the 
wsloop
+  // body.
+  auto finiCB = [&](llvm::OpenMPIRBuilder::InsertPointTy ip) -> llvm::Error {
+llvm::IRBuilderBase &llvmBuilder = ompBuilder->Builder;
+llvm::IRBuilderBase::InsertPointGuard guard(llvmBuilder);
+
+// ip is currently in the block branched to if cancellation occured.
+// We need to create a branch to terminate that block.
+llvmBuilder.restoreIP(ip);
+
+// We must still clean up the wsloop after cancelling it, so we need to
+// branch to the block that finalizes the wsloop.
+// That block has not been created yet so use this block as a dummy for now
+// and fix this after creating the wsloop.
+cancelTerminators.push_back(llvmBuilder.CreateBr(ip.getBlock()));
+return llvm::Error::success();
+  };
+  // We have to add the cleanup to the OpenMPIRBuilder before the body gets
+  // created in case the body contains omp.cancel (which will then expect to be
+  // able to find this cleanup callback).
+  ompBuilder->pushFinalizationCB({finiCB, llvm::omp::Directive::OMPD_for,
+  constructIsCancellable(wsloopOp)});
+
   llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder);
   llvm::Expected regionBlock = convertOmpOpRegions(
   wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation);
@@ -2381,6 +2404,19 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase 
&builder,
   if (failed(handleError(wsloopIP, opInst)))
 return failure();
 
+  ompBuilder->popFinalizationCB();
+  if (!cancelTerminators.empty()) {
+// If we cancelled the loop, we should branch to the finalization block of
+// the wsloop (which is always immediately before the loop continuation
+// block). Now the finalization has been created, we can fix the branch.
+llvm::BasicBlock *wsloopFini = 
wsloopIP->getBlock()->getSinglePredecessor();
+for (llvm::BranchInst *cancelBranch : cancelTerminators) {
+  assert(cancelBranch->getNumSuccessors() == 1 &&
+ "cancel branch should have one target");
+  cancelBranch->setSuccessor(0, wsloopFini);
+}
+  }
+
   // Process the reductions if required.
   if (failed(createReductionsAndCleanup(
   wsloopOp, builder, moduleTranslation, allocaIP, reductionDecls,
diff --git a/mlir/test/Target/LLVMIR/openmp-cancel.mlir 
b/mlir/test/Target/LLVMIR/openmp-cancel.mlir
index fca16b936fc85..3c195a98d1000 100644
--- a/mlir/test/Target/LLVMIR/openmp-cancel.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-cancel.mlir
@@ -156,3 +156,90 @@ llvm.func @cancel_sections_if(%cond : i1) {
 // CHECK: ret void
 // CHECK:   .cncl:; preds = 
%[[VAL_27]]
 // CHECK: br label %[[VAL_19]]
+
+llvm.func @cancel_wsloop_if(%lb : i32, %ub : i32, %step : i32, %cond : i1) {
+  omp.wsloop {
+omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
+  omp.cancel cancellation_construct_type(loop) if(%cond)
+  omp.yield
+}
+  }
+  llvm.return
+}
+// CHECK-LABEL: define void @cancel_wsloop_if
+// CHECK: %[[VAL_0:.*]] = alloca i32, align 4
+// CHECK: %[[VAL_1:.*]] = alloca i32, align 4
+// CHECK: %[[VAL_2:.*]] = alloca i32, align 4
+// CHECK: %[[VAL_3:.*]] = alloca i32, align 4
+// 

[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)

2025-04-24 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/137179

Backport 57530c23a53b5e003d389437637f61c5b9814e22

Requested by: @nikic

>From 7972daff23be375db8218ac8ea04e8b9e18fb2b3 Mon Sep 17 00:00:00 2001
From: Nikita Popov 
Date: Thu, 24 Apr 2025 15:15:47 +0200
Subject: [PATCH] [GlobalOpt] Do not promote malloc if there are atomic
 loads/stores (#137158)

When converting a malloc stored to a global into a global, we will
introduce an i1 flag to track whether the global has been initialized.

In case of atomic loads/stores, this will result in verifier failures,
because atomic ops on i1 are illegal. Even if we changed this to i8, I
don't think it is a good idea to change atomic types in that way.

Instead, bail out of the transform is we encounter any atomic
loads/stores of the global.

Fixes https://github.com/llvm/llvm-project/issues/137152.

(cherry picked from commit 57530c23a53b5e003d389437637f61c5b9814e22)
---
 llvm/lib/Transforms/IPO/GlobalOpt.cpp |  4 +++
 .../GlobalOpt/malloc-promote-atomic.ll| 28 +++
 2 files changed, 32 insertions(+)
 create mode 100644 llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll

diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp 
b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 9586fc97a39f7..236a531317678 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -719,10 +719,14 @@ static bool allUsesOfLoadedValueWillTrapIfNull(const 
GlobalVariable *GV) {
 const Value *P = Worklist.pop_back_val();
 for (const auto *U : P->users()) {
   if (auto *LI = dyn_cast(U)) {
+if (!LI->isSimple())
+  return false;
 SmallPtrSet PHIs;
 if (!AllUsesOfValueWillTrapIfNull(LI, PHIs))
   return false;
   } else if (auto *SI = dyn_cast(U)) {
+if (!SI->isSimple())
+  return false;
 // Ignore stores to the global.
 if (SI->getPointerOperand() != P)
   return false;
diff --git a/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll 
b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
new file mode 100644
index 0..0ecdf095efdd8
--- /dev/null
+++ b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
@@ -0,0 +1,28 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --version 5
+; RUN: opt -passes=globalopt -S < %s | FileCheck %s
+
+@g = internal global ptr null, align 8
+
+define void @init() {
+; CHECK-LABEL: define void @init() local_unnamed_addr {
+; CHECK-NEXT:[[ALLOC:%.*]] = call ptr @malloc(i64 48)
+; CHECK-NEXT:store atomic ptr [[ALLOC]], ptr @g seq_cst, align 8
+; CHECK-NEXT:ret void
+;
+  %alloc = call ptr @malloc(i64 48)
+  store atomic ptr %alloc, ptr @g seq_cst, align 8
+  ret void
+}
+
+define i1 @check() {
+; CHECK-LABEL: define i1 @check() local_unnamed_addr {
+; CHECK-NEXT:[[VAL:%.*]] = load atomic ptr, ptr @g seq_cst, align 8
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq ptr [[VAL]], null
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %val = load atomic ptr, ptr @g seq_cst, align 8
+  %cmp = icmp eq ptr %val, null
+  ret i1 %cmp
+}
+
+declare ptr @malloc(i64) allockind("alloc,uninitialized") allocsize(0)

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:

@fhahn What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/137179
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)

2025-04-24 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/137179
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] LiveRangeShrink: Early exit when encountering a code motion barrier. (PR #136806)

2025-04-24 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/136806
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add noundef to mbcnt intrinsic returns (PR #136304)

2025-04-24 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/136304
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add noundef to mbcnt intrinsic returns (PR #136304)

2025-04-24 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm commented:

ping 

https://github.com/llvm/llvm-project/pull/136304
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah created 
https://github.com/llvm/llvm-project/pull/137193

This is quite ugly but it is the best I could think of. The old FiniCBWrapper 
was way too brittle depending upon the exact block structure inside of the 
section, and could be confused by any control flow in the section (e.g. an if 
clause on cancel). The wording in the comment and variable names didn't seem to 
match where it was actually branching too as well.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches 
to a block containing __kmpc_for_static_fini.

This was hard to achieve here because sometimes the FiniCBWrapper has to run 
before the worksharing loop finalization has been crated.

To get around this ordering issue I created a dummy branch to a dummy block, 
which is then fixed later once all of the information is available.

>From aa2445b4b8cfd3253464ffb466bf4f84fb5a488f Mon Sep 17 00:00:00 2001
From: Tom Eccles 
Date: Thu, 10 Apr 2025 11:43:18 +
Subject: [PATCH] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR

This is quite ugly but it is the best I could think of. The old
FiniCBWrapper was way too brittle depending upon the exact block
structure inside of the section, and could be confused by any control
flow in the section (e.g. an if clause on cancel). The wording in the
comment and variable names didn't seem to match where it was actually
branching too as well.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections
branches to a block containing __kmpc_for_static_fini.

This was hard to achieve here because sometimes the FiniCBWrapper has to
run before the worksharing loop finalization has been crated.

To get around this ordering issue I created a dummy branch to a dummy
block, which is then fixed later once all of the information is
available.
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 27 ---
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  6 +-
 mlir/test/Target/LLVMIR/openmp-cancel.mlir| 76 +++
 mlir/test/Target/LLVMIR/openmp-todo.mlir  | 16 
 4 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index be05f01c94603..3f19088e6c73d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -2172,6 +2172,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
   if (!updateToLocation(Loc))
 return Loc.IP;
 
+  // FiniCBWrapper needs to create a branch to the loop finalization block, but
+  // this has not been created yet at some times when this callback runs.
+  SmallVector CancellationBranches;
   auto FiniCBWrapper = [&](InsertPointTy IP) {
 if (IP.getBlock()->end() != IP.getPoint())
   return FiniCB(IP);
@@ -2179,16 +2182,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 // will fail because that function requires the Finalization Basic Block to
 // have a terminator, which is already removed by EmitOMPRegionBody.
 // IP is currently at cancelation block.
-// We need to backtrack to the condition block to fetch
-// the exit block and create a branch from cancelation
-// to exit block.
-IRBuilder<>::InsertPointGuard IPG(Builder);
-Builder.restoreIP(IP);
-auto *CaseBB = IP.getBlock()->getSinglePredecessor();
-auto *CondBB = CaseBB->getSinglePredecessor()->getSinglePredecessor();
-auto *ExitBB = CondBB->getTerminator()->getSuccessor(1);
-Instruction *I = Builder.CreateBr(ExitBB);
-IP = InsertPointTy(I->getParent(), I->getIterator());
+BranchInst *DummyBranch = Builder.CreateBr(IP.getBlock());
+IP = InsertPointTy(DummyBranch->getParent(), DummyBranch->getIterator());
+CancellationBranches.push_back(DummyBranch);
 return FiniCB(IP);
   };
 
@@ -2251,6 +2247,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 return WsloopIP.takeError();
   InsertPointTy AfterIP = *WsloopIP;
 
+  BasicBlock *LoopFini = AfterIP.getBlock()->getSinglePredecessor();
+  assert(LoopFini && "Bad structure of static workshare loop finalization");
+
   // Apply the finalization callback in LoopAfterBB
   auto FiniInfo = FinalizationStack.pop_back_val();
   assert(FiniInfo.DK == OMPD_sections &&
@@ -2264,6 +2263,14 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createSections(
 AfterIP = {FiniBB, FiniBB->begin()};
   }
 
+  // Now we can fix the dummy branch to point to the right place
+  if (!CancellationBranches.empty()) {
+for (BranchInst *DummyBranch : CancellationBranches) {
+  assert(DummyBranch->getNumSuccessors() == 1);
+  DummyBranch->setSuccessor(0, LoopFini);
+}
+  }
+
   return AfterIP;
 }
 
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 6185a433a8199..d1885641f389d 100644
--- a/mlir/li

[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-systemz

Author: Kai Nacke (redstar)


Changes

Sections which are not allowed to carry data are marked as virtual. Only 
complication when writing out the text is that it must be written in chunks of 
32k-1 bytes, which is done by having a wrapper stream writing those records.
Data of BSS sections is not written, since the contents is known to be zero. 
Instead, the fill byte value is used.

---
Full diff: https://github.com/llvm/llvm-project/pull/137235.diff


8 Files Affected:

- (modified) llvm/include/llvm/MC/MCContext.h (+4-2) 
- (modified) llvm/include/llvm/MC/MCSectionGOFF.h (+32-15) 
- (modified) llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp (+3-2) 
- (modified) llvm/lib/MC/GOFFObjectWriter.cpp (+78) 
- (modified) llvm/lib/MC/MCContext.cpp (+12-7) 
- (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+4-4) 
- (modified) llvm/test/CodeGen/SystemZ/zos-section-1.ll (+22-5) 
- (modified) llvm/test/CodeGen/SystemZ/zos-section-2.ll (+25-5) 


``diff
diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 8eb904966f4de..2cdba64be116d 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -366,7 +366,8 @@ class MCContext {
 
   template 
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-TAttr SDAttributes, MCSection *Parent);
+TAttr SDAttributes, MCSection *Parent,
+bool IsVirtual);
 
   /// Map of currently defined macros.
   StringMap MacroMap;
@@ -607,7 +608,8 @@ class MCContext {
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::SDAttr SDAttributes);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-GOFF::EDAttr EDAttributes, MCSection *Parent);
+GOFF::EDAttr EDAttributes, MCSection *Parent,
+bool IsVirtual);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::PRAttr PRAttributes, MCSection *Parent);
 
diff --git a/llvm/include/llvm/MC/MCSectionGOFF.h 
b/llvm/include/llvm/MC/MCSectionGOFF.h
index b8b8cf112a34d..b2ca74c3ba78a 100644
--- a/llvm/include/llvm/MC/MCSectionGOFF.h
+++ b/llvm/include/llvm/MC/MCSectionGOFF.h
@@ -39,6 +39,9 @@ class MCSectionGOFF final : public MCSection {
   // The type of this section.
   GOFF::ESDSymbolType SymbolType;
 
+  // This section is a BSS section.
+  unsigned IsBSS : 1;
+
   // Indicates that the PR symbol needs to set the length of the section to a
   // non-zero value. This is only a problem with the ADA PR - the binder will
   // generate an error in this case.
@@ -50,26 +53,26 @@ class MCSectionGOFF final : public MCSection {
   friend class MCContext;
   friend class MCSymbolGOFF;
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::SDAttr SDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::SDAttr SDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), SDAttributes(SDAttributes),
-SymbolType(GOFF::ESD_ST_SectionDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_SectionDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::EDAttr EDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::EDAttr EDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), EDAttributes(EDAttributes),
-SymbolType(GOFF::ESD_ST_ElementDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_ElementDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::PRAttr PRAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::PRAttr PRAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), PRAttributes(PRAttributes),
-SymbolType(GOFF::ESD_ST_PartReference), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_PartReference), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
 public:
   void printSwitchToSection(const MCAsmInfo &MAI, const Triple &T,

[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-24 Thread Kai Nacke via llvm-branch-commits

https://github.com/redstar created 
https://github.com/llvm/llvm-project/pull/137235

Sections which are not allowed to carry data are marked as virtual. Only 
complication when writing out the text is that it must be written in chunks of 
32k-1 bytes, which is done by having a wrapper stream writing those records.
Data of BSS sections is not written, since the contents is known to be zero. 
Instead, the fill byte value is used.

>From 0e5c36a691fcbaa6f63c46f4cf86fa16857e137c Mon Sep 17 00:00:00 2001
From: Kai Nacke 
Date: Wed, 9 Apr 2025 15:08:52 -0400
Subject: [PATCH] [GOFF] Add writing of text records

Sections which are not allowed to carry data are marked as virtual.
Only complication when writing out the text is that it must be
written in chunks of 32k-1 bytes, which is done by having a wrapper
stream writing those records.
Data of BSS sections is not written, since the contents is known to
be zero. Instead, the fill byte value is used.
---
 llvm/include/llvm/MC/MCContext.h  |  6 +-
 llvm/include/llvm/MC/MCSectionGOFF.h  | 47 +++
 .../CodeGen/TargetLoweringObjectFileImpl.cpp  |  5 +-
 llvm/lib/MC/GOFFObjectWriter.cpp  | 78 +++
 llvm/lib/MC/MCContext.cpp | 19 +++--
 llvm/lib/MC/MCObjectFileInfo.cpp  |  8 +-
 llvm/test/CodeGen/SystemZ/zos-section-1.ll| 27 +--
 llvm/test/CodeGen/SystemZ/zos-section-2.ll| 30 +--
 8 files changed, 180 insertions(+), 40 deletions(-)

diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 8eb904966f4de..2cdba64be116d 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -366,7 +366,8 @@ class MCContext {
 
   template 
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-TAttr SDAttributes, MCSection *Parent);
+TAttr SDAttributes, MCSection *Parent,
+bool IsVirtual);
 
   /// Map of currently defined macros.
   StringMap MacroMap;
@@ -607,7 +608,8 @@ class MCContext {
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::SDAttr SDAttributes);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-GOFF::EDAttr EDAttributes, MCSection *Parent);
+GOFF::EDAttr EDAttributes, MCSection *Parent,
+bool IsVirtual);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::PRAttr PRAttributes, MCSection *Parent);
 
diff --git a/llvm/include/llvm/MC/MCSectionGOFF.h 
b/llvm/include/llvm/MC/MCSectionGOFF.h
index b8b8cf112a34d..b2ca74c3ba78a 100644
--- a/llvm/include/llvm/MC/MCSectionGOFF.h
+++ b/llvm/include/llvm/MC/MCSectionGOFF.h
@@ -39,6 +39,9 @@ class MCSectionGOFF final : public MCSection {
   // The type of this section.
   GOFF::ESDSymbolType SymbolType;
 
+  // This section is a BSS section.
+  unsigned IsBSS : 1;
+
   // Indicates that the PR symbol needs to set the length of the section to a
   // non-zero value. This is only a problem with the ADA PR - the binder will
   // generate an error in this case.
@@ -50,26 +53,26 @@ class MCSectionGOFF final : public MCSection {
   friend class MCContext;
   friend class MCSymbolGOFF;
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::SDAttr SDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::SDAttr SDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), SDAttributes(SDAttributes),
-SymbolType(GOFF::ESD_ST_SectionDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_SectionDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::EDAttr EDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::EDAttr EDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), EDAttributes(EDAttributes),
-SymbolType(GOFF::ESD_ST_ElementDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_ElementDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::PRAttr PRAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+ 

[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-24 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff ba1223ae5a08197d30d3fb285a725a5278549e74 
0e5c36a691fcbaa6f63c46f4cf86fa16857e137c --extensions cpp,h -- 
llvm/include/llvm/MC/MCContext.h llvm/include/llvm/MC/MCSectionGOFF.h 
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp 
llvm/lib/MC/GOFFObjectWriter.cpp llvm/lib/MC/MCContext.cpp 
llvm/lib/MC/MCObjectFileInfo.cpp
``





View the diff from clang-format here.


``diff
diff --git a/llvm/lib/MC/GOFFObjectWriter.cpp b/llvm/lib/MC/GOFFObjectWriter.cpp
index 119cab6e6e..f2b5ac0a4d 100644
--- a/llvm/lib/MC/GOFFObjectWriter.cpp
+++ b/llvm/lib/MC/GOFFObjectWriter.cpp
@@ -506,7 +506,7 @@ uint64_t GOFFWriter::writeObject() {
   defineSymbols();
 
   for (const MCSection &Section : Asm)
-writeText(static_cast(&Section));
+writeText(static_cast(&Section));
 
   writeEnd();
 

``




https://github.com/llvm/llvm-project/pull/137235
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-04-24 Thread Paul Kirth via llvm-branch-commits


@@ -1642,6 +1642,94 @@ void AsmPrinter::emitStackUsage(const MachineFunction 
&MF) {
 *StackUsageStream << "static\n";
 }
 
+/// Extracts a generalized numeric type identifier of a Function's type from
+/// type metadata. Returns null if metadata cannot be found.
+static ConstantInt *extractNumericCGTypeId(const Function &F) {
+  SmallVector Types;
+  F.getMetadata(LLVMContext::MD_type, Types);
+  for (const auto &Type : Types) {
+if (Type->hasGeneralizedMDString()) {
+  MDString *MDGeneralizedTypeId = cast(Type->getOperand(1));
+  uint64_t TypeIdVal = llvm::MD5Hash(MDGeneralizedTypeId->getString());
+  IntegerType *Int64Ty = Type::getInt64Ty(F.getContext());
+  return ConstantInt::get(Int64Ty, TypeIdVal);
+}
+  }
+
+  return nullptr;
+}
+
+/// Emits .callgraph section.
+void AsmPrinter::emitCallGraphSection(const MachineFunction &MF,
+  FunctionInfo &FuncInfo) {
+  if (!MF.getTarget().Options.EmitCallGraphSection)
+return;
+
+  // Switch to the call graph section for the function
+  MCSection *FuncCGSection =
+  getObjFileLowering().getCallGraphSection(*getCurrentSection());
+  assert(FuncCGSection && "null callgraph section");
+  OutStreamer->pushSection();
+  OutStreamer->switchSection(FuncCGSection);
+
+  // Emit format version number.
+  OutStreamer->emitInt64(0);

ilovepi wrote:

Lets use a constant defined somewhere for the version number, so its easy to 
update. Probably looking at PGO profile versioning is a reasonable place to 
take inspiration.

https://github.com/llvm/llvm-project/pull/87576
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-04-24 Thread Paul Kirth via llvm-branch-commits


@@ -68,6 +68,9 @@ class MCObjectFileInfo {
   /// Language Specific Data Area information is emitted to.
   MCSection *LSDASection = nullptr;
 
+  /// Section containing metadata on call graph.

ilovepi wrote:

```suggestion
  /// Section containing call graph metadata.
```

https://github.com/llvm/llvm-project/pull/87576
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 2db36d8 - Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)"

2025-04-24 Thread via llvm-branch-commits

Author: Stephen Tozer
Date: 2025-04-25T00:33:24+01:00
New Revision: 2db36d8693610acbe87bed2f10e49ca938429bde

URL: 
https://github.com/llvm/llvm-project/commit/2db36d8693610acbe87bed2f10e49ca938429bde
DIFF: 
https://github.com/llvm/llvm-project/commit/2db36d8693610acbe87bed2f10e49ca938429bde.diff

LOG: Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)"

This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.

Added: 


Modified: 
clang/lib/CodeGen/BackendUtil.cpp
llvm/docs/HowToUpdateDebugInfo.rst
llvm/include/llvm/IR/DebugLoc.h
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
llvm/lib/IR/DebugInfo.cpp
llvm/lib/IR/DebugLoc.cpp
llvm/lib/Transforms/Utils/Debugify.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index cafc703420183..f7eb853beb23c 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -961,22 +961,6 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   Debugify.setOrigDIVerifyBugsReportFilePath(
   CodeGenOpts.DIBugsReportFilePath);
 Debugify.registerCallbacks(PIC, MAM);
-
-#if ENABLE_DEBUGLOC_COVERAGE_TRACKING
-// If we're using debug location coverage tracking, mark all the
-// instructions coming out of the frontend without a DebugLoc as being
-// compiler-generated, to prevent both those instructions and new
-// instructions that inherit their location from being treated as
-// incorrectly empty locations.
-for (Function &F : *TheModule) {
-  if (!F.getSubprogram())
-continue;
-  for (BasicBlock &BB : F)
-for (Instruction &I : BB)
-  if (!I.getDebugLoc())
-I.setDebugLoc(DebugLoc::getCompilerGenerated());
-}
-#endif
   }
   // Attempt to load pass plugins and register their callbacks with PB.
   for (auto &PluginFN : CodeGenOpts.PassPlugins) {

diff  --git a/llvm/docs/HowToUpdateDebugInfo.rst 
b/llvm/docs/HowToUpdateDebugInfo.rst
index a87efe7e6e43f..3088f59c1066a 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -169,47 +169,6 @@ See the discussion in the section about
 :ref:`merging locations` for examples of when the rule for
 dropping locations applies.
 
-.. _NewInstLocations:
-
-Setting locations for new instructions
---
-
-Whenever a new instruction is created and there is no suitable location for 
that
-instruction, that instruction should be annotated accordingly. There are a set
-of special ``DebugLoc`` values that can be set on an instruction to annotate 
the
-reason that it does not have a valid location. These are as follows:
-
-* ``DebugLoc::getCompilerGenerated()``: This indicates that the instruction is 
a
-  compiler-generated instruction, i.e. it is not associated with any user 
source
-  code.
-
-* ``DebugLoc::getDropped()``: This indicates that the instruction has
-  intentionally had its source location removed, according to the rules for
-  :ref:`dropping locations`; this is set automatically by
-  ``Instruction::dropLocation()``.
-
-* ``DebugLoc::getUnknown()``: This indicates that the instruction does not have
-  a known or currently knowable source location, e.g. that it is infeasible to
-  determine the correct source location, or that the source location is
-  ambiguous in a way that LLVM cannot currently represent.
-
-* ``DebugLoc::getTemporary()``: This is used for instructions that we don't
-  expect to be emitted (e.g. ``UnreachableInst``), and so should not need a
-  valid location; if we ever try to emit a temporary location into an 
object/asm
-  file, this indicates that something has gone wrong.
-
-Where applicable, these should be used instead of leaving an instruction 
without
-an assigned location or explicitly setting the location as ``DebugLoc()``.
-Ordinarily these special locations are identical to an absent location, but 
LLVM
-built with coverage-tracking
-(``-DLLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING="COVERAGE"``) will keep track of
-these special locations in order to detect unintentionally-missing locations;
-for this reason, the most important rule is to *not* apply any of these if it
-isn't clear which, if any, is appropriate - an absent location can be detected
-and fixed, while an incorrectly annotated instruction is much harder to detect.
-On the other hand, if any of these clearly apply, then they should be used to
-prevent false positives from being flagged up.
-
 Rules for updating debug values
 ===
 

diff  --git a/llvm/include/llvm/IR/DebugLoc.h b/llvm/include/llvm/IR/DebugLoc.h
index 9f1dafa8b71d9..c22d3e9b10d27 100644
--- a/llvm/include/llvm/IR/DebugLoc.h
+++ b/llvm/include/llvm/IR/DebugLoc.h
@@ -14,7 +14,6 @@
 #ifndef LLVM_IR_DEBUGLOC_H
 #define LLVM_IR_DEBUGLOC_H
 
-#include "llvm/Config/config.h"
 #include "llvm/IR/Trackin

[llvm-branch-commits] [libc] 8ce5c04 - update jmp_buf.h to always gate this by __linux__

2025-04-24 Thread via llvm-branch-commits

Author: Schrodinger ZHU Yifan
Date: 2025-04-24T19:32:58-04:00
New Revision: 8ce5c04559d3c66700fa8e62bcc47a6b02c29535

URL: 
https://github.com/llvm/llvm-project/commit/8ce5c04559d3c66700fa8e62bcc47a6b02c29535
DIFF: 
https://github.com/llvm/llvm-project/commit/8ce5c04559d3c66700fa8e62bcc47a6b02c29535.diff

LOG: update jmp_buf.h to always gate this by __linux__

Added: 


Modified: 
libc/include/llvm-libc-types/jmp_buf.h

Removed: 




diff  --git a/libc/include/llvm-libc-types/jmp_buf.h 
b/libc/include/llvm-libc-types/jmp_buf.h
index a6638b222138a..90bd60d741293 100644
--- a/libc/include/llvm-libc-types/jmp_buf.h
+++ b/libc/include/llvm-libc-types/jmp_buf.h
@@ -9,7 +9,13 @@
 #ifndef LLVM_LIBC_TYPES_JMP_BUF_H
 #define LLVM_LIBC_TYPES_JMP_BUF_H
 
-#if defined(__i386__) || defined(__x86_64__)
+// TODO: implement sigjmp_buf related functions for other architectures
+// Issue: https://github.com/llvm/llvm-project/issues/136358
+#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
+#define __LIBC_HAS_SIGJMP_BUF
+#endif
+
+#if defined(__LIBC_HAS_SIGJMP_BUF)
 #include "sigset_t.h"
 #endif
 
@@ -54,9 +60,7 @@ typedef struct {
 #else
 #error "__jmp_buf not available for your target architecture."
 #endif
-  // TODO: implement sigjmp_buf related functions for other architectures
-  // Issue: https://github.com/llvm/llvm-project/issues/136358
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__LIBC_HAS_SIGJMP_BUF)
   // return address
   void *sig_retaddr;
   // extra register buffer to avoid indefinite stack growth in sigsetjmp
@@ -68,7 +72,10 @@ typedef struct {
 
 typedef __jmp_buf jmp_buf[1];
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__LIBC_HAS_SIGJMP_BUF)
 typedef __jmp_buf sigjmp_buf[1];
 #endif
+
+#undef __LIBC_HAS_SIGJMP_BUF
+
 #endif // LLVM_LIBC_TYPES_JMP_BUF_H



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-04-24 Thread Paul Kirth via llvm-branch-commits


@@ -0,0 +1,50 @@
+;; Tests that we store the type identifiers in .callgraph section of the 
binary.
+
+; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
+; RUN: llvm-readelf -x .callgraph - | FileCheck %s

ilovepi wrote:

I don't see any tests under `tests/MC` though. I agree w/ @arsenm that we need 
to have dedicated tests for the assembler/disassembler, and not just for 
codegen. Those commonly find bugs in the implementation that are otherwise easy 
to miss.

https://github.com/llvm/llvm-project/pull/87576
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [RISCV][Driver] Add riscv emulation mode to linker job of BareMetal toolchain (PR #134442)

2025-04-24 Thread Petr Hosek via llvm-branch-commits


@@ -534,8 +534,18 @@ void baremetal::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
 
   CmdArgs.push_back("-Bstatic");
 
-  if (TC.getTriple().isRISCV() && Args.hasArg(options::OPT_mno_relax))
-CmdArgs.push_back("--no-relax");
+  if (Triple.isRISCV()) {
+CmdArgs.push_back("-X");
+if (Args.hasArg(options::OPT_mno_relax))
+  CmdArgs.push_back("--no-relax");
+if (const char *LDMOption = getLDMOption(TC.getTriple(), Args)) {
+  CmdArgs.push_back("-m");
+  CmdArgs.push_back(LDMOption);
+} else {
+  D.Diag(diag::err_target_unknown_triple) << Triple.str();
+  return;
+}

petrhosek wrote:

Can you also swap the order of the `-m` option to be the same as in the `Gnu` 
driver? 
```suggestion
if (const char *LDMOption = getLDMOption(TC.getTriple(), Args)) {
  CmdArgs.push_back("-m");
  CmdArgs.push_back(LDMOption);
} else {
  D.Diag(diag::err_target_unknown_triple) << Triple.str();
  return;
}

CmdArgs.push_back("-X");
if (Args.hasArg(options::OPT_mno_relax))
  CmdArgs.push_back("--no-relax");
```
In a follow up change, I'd like to move the `-m` out of this condition since 
it'd be also beneficial for other targets.

https://github.com/llvm/llvm-project/pull/134442
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [RISCV][Driver] Add riscv emulation mode to linker job of BareMetal toolchain (PR #134442)

2025-04-24 Thread Petr Hosek via llvm-branch-commits

https://github.com/petrhosek approved this pull request.


https://github.com/llvm/llvm-project/pull/134442
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm deleted 
https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
 return Invalid;
   break;
 case 16:
-  if (AccumEVT == MVT::i64)
-Cost *= 2;
-  else if (AccumEVT != MVT::i32)
+  if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

Why are you making this change?

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

sdesmalen-arm wrote:

What about either replacing `getPartialReductionExtendKind(Instruction *I);` 
with this one, rather than adding a new interface, Or otherwise changing the 
implementation of `getPartialReductionExtendKind(Instruction *I)` to use 
`getPartialReductionExtendKind(Instruction::CastOps)` ?

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
 
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

Would it be possible to make the change of `VPPartialReductionRecipe : public 
VPSingleDefRecipe` -> `VPPartialReductionRecipe : public VPReductionRecipe` as 
an NFC change? (For cases around VPMulAccumulateReductionRecipes you can 
initially add some asserts that the recipe isn't a partial reduction, because 
that won't be supported until this PR lands)

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Benjamin Maxwell via llvm-branch-commits


@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
 return Invalid;
   break;
 case 16:
-  if (AccumEVT == MVT::i64)
-Cost *= 2;
-  else if (AccumEVT != MVT::i32)
+  if (AccumEVT != MVT::i32)

MacDue wrote:

It's due to: 
https://github.com/llvm/llvm-project/pull/136173#discussion_r2053920360

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2438,14 +2438,14 @@ 
VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
 return Ctx.TTI.getPartialReductionCost(
 Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()),
 Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF,
-TTI::getPartialReductionExtendKind(getExtOpcode()),
-TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul);
+TTI::getPartialReductionExtendKind(getExt0Opcode()),
+TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul);
   }
 
   Type *RedTy = Ctx.Types.inferScalarType(this);
   auto *SrcVecTy =
   cast(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF));
-  return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy,
+  return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy,

sdesmalen-arm wrote:

The TTI hook also needs updating to reflect the separate extends.

https://github.com/llvm/llvm-project/pull/136997
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2496,6 +2501,9 @@ class VPMulAccumulateReductionRecipe : public 
VPReductionRecipe {
 
   Type *ResultTy;
 
+  /// If the reduction this is based on is a partial reduction.

sdesmalen-arm wrote:

This comment makes no sense.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132383

>From 61c28d2d564c63b986c6adfae26f17d868b53cd1 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:34:00 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and
 trunc

Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of VGPR to VCC is lowered as compare.
Extends of VCC are lowered using select.

For remaining types:
S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc
are again left as is to be combined away.
Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is
for instruction select to deal with them. This is because there are patterns
that check for S16 type.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   |   7 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  47 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |   3 +
 .../GlobalISel/regbankselect-and-s1.mir   | 105 +
 .../GlobalISel/regbankselect-anyext.mir   |  59 +-
 .../AMDGPU/GlobalISel/regbankselect-sext.mir  | 100 ++--
 .../AMDGPU/GlobalISel/regbankselect-trunc.mir |  22 +++-
 .../AMDGPU/GlobalISel/regbankselect-zext.mir  |  89 +--
 10 files changed, 357 insertions(+), 183 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index ad6a0772fe8b6..9544c9f43eeaf 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner {
   return;
 }
 
+if (DstTy == S64 && TruncSrcTy == S32) {
+  B.buildMergeLikeInstr(MI.getOperand(0).getReg(),
+{TruncSrc, B.buildUndef({SgprRB, S32})});
+  cleanUpAfterCombine(MI, Trunc);
+  return;
+}
+
 if (DstTy == S32 && TruncSrcTy == S16) {
   B.buildAnyExt(Dst, TruncSrc);
   cleanUpAfterCombine(MI, Trunc);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 37cdf0c926b09..8e71c172cd20d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, 
LLT WideTy,
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst);
+  Register Src = MI.getOperand(1).getReg();
+  unsigned Opc = MI.getOpcode();
+  if (Ty == S32 || Ty == S16) {
+auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
+auto False = B.buildConstant({VgprRB, Ty}, 0);
+B.buildSelect(Dst, Src, True, False);
+  }
+  if (Ty == S64) {
+auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
+auto False = B.buildConstant({VgprRB, S32}, 0);
+auto Lo = B.buildSelect({VgprRB, S32}, Src, True, False);
+MachineInstrBuilder Hi;
+switch (Opc) {
+case G_SEXT:
+  Hi = Lo;
+  break;
+case G_ZEXT:
+  Hi = False;
+  break;
+case G_ANYEXT:
+  Hi = B.buildUndef({VgprRB_S32});
+  break;
+default:
+  llvm_unreachable("Opcode not supported");
+}
+
+B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)});
+  }
+  MI.eraseFromParent();
+}
+
 static bool isSignedBFE(MachineInstr &MI) {
   if (isa(MI)) {
 if (MI.getOperand(1).getIntrinsicID() == Intrinsic::amdgcn_sbfe)
@@ -259,26 +293,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
   switch (Mapping.LoweringMethod) {
   case DoNotLower:
 return;
-  case VccExtToSel: {
-LLT Ty = MRI.getType(MI.getOperand(0).getReg());
-Register Src = MI.getOperand(1).getReg();
-unsigned Opc = MI.getOpcode();
-if (Ty == S32 || Ty == S16) {
-  auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
-  auto False = B.buildConstant({VgprRB, Ty}, 0);
-  B.buildSelect(MI.getOperand(0).getReg(), Src, True, False);
-}
-if (Ty == S64) {
-  auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
-  auto False = B.buildConstant({VgprRB, S32}, 0);
-  auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False);
-  B.buildMergeValues(
-  MI.getOperand(0).getReg(),
-  {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)});
-}
-MI.eraseFromParent();
-return;
-  }
+  case VccExtToSel:
+return lowerVccExtToSel(MI);
   case UniExtToSel: {
 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
 auto True = B.buildConstant({SgprRB, Ty},
@@ -295,13 +311,

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132383

>From 61c28d2d564c63b986c6adfae26f17d868b53cd1 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:34:00 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and
 trunc

Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of VGPR to VCC is lowered as compare.
Extends of VCC are lowered using select.

For remaining types:
S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc
are again left as is to be combined away.
Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is
for instruction select to deal with them. This is because there are patterns
that check for S16 type.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   |   7 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  47 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |   3 +
 .../GlobalISel/regbankselect-and-s1.mir   | 105 +
 .../GlobalISel/regbankselect-anyext.mir   |  59 +-
 .../AMDGPU/GlobalISel/regbankselect-sext.mir  | 100 ++--
 .../AMDGPU/GlobalISel/regbankselect-trunc.mir |  22 +++-
 .../AMDGPU/GlobalISel/regbankselect-zext.mir  |  89 +--
 10 files changed, 357 insertions(+), 183 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index ad6a0772fe8b6..9544c9f43eeaf 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner {
   return;
 }
 
+if (DstTy == S64 && TruncSrcTy == S32) {
+  B.buildMergeLikeInstr(MI.getOperand(0).getReg(),
+{TruncSrc, B.buildUndef({SgprRB, S32})});
+  cleanUpAfterCombine(MI, Trunc);
+  return;
+}
+
 if (DstTy == S32 && TruncSrcTy == S16) {
   B.buildAnyExt(Dst, TruncSrc);
   cleanUpAfterCombine(MI, Trunc);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 37cdf0c926b09..8e71c172cd20d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, 
LLT WideTy,
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst);
+  Register Src = MI.getOperand(1).getReg();
+  unsigned Opc = MI.getOpcode();
+  if (Ty == S32 || Ty == S16) {
+auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
+auto False = B.buildConstant({VgprRB, Ty}, 0);
+B.buildSelect(Dst, Src, True, False);
+  }
+  if (Ty == S64) {
+auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
+auto False = B.buildConstant({VgprRB, S32}, 0);
+auto Lo = B.buildSelect({VgprRB, S32}, Src, True, False);
+MachineInstrBuilder Hi;
+switch (Opc) {
+case G_SEXT:
+  Hi = Lo;
+  break;
+case G_ZEXT:
+  Hi = False;
+  break;
+case G_ANYEXT:
+  Hi = B.buildUndef({VgprRB_S32});
+  break;
+default:
+  llvm_unreachable("Opcode not supported");
+}
+
+B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)});
+  }
+  MI.eraseFromParent();
+}
+
 static bool isSignedBFE(MachineInstr &MI) {
   if (isa(MI)) {
 if (MI.getOperand(1).getIntrinsicID() == Intrinsic::amdgcn_sbfe)
@@ -259,26 +293,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
   switch (Mapping.LoweringMethod) {
   case DoNotLower:
 return;
-  case VccExtToSel: {
-LLT Ty = MRI.getType(MI.getOperand(0).getReg());
-Register Src = MI.getOperand(1).getReg();
-unsigned Opc = MI.getOpcode();
-if (Ty == S32 || Ty == S16) {
-  auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
-  auto False = B.buildConstant({VgprRB, Ty}, 0);
-  B.buildSelect(MI.getOperand(0).getReg(), Src, True, False);
-}
-if (Ty == S64) {
-  auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
-  auto False = B.buildConstant({VgprRB, S32}, 0);
-  auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False);
-  B.buildMergeValues(
-  MI.getOperand(0).getReg(),
-  {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)});
-}
-MI.eraseFromParent();
-return;
-  }
+  case VccExtToSel:
+return lowerVccExtToSel(MI);
   case UniExtToSel: {
 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
 auto True = B.buildConstant({SgprRB, Ty},
@@ -295,13 +311,

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg (PR #132385)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132385

>From 585448f8cf5f77c26c63c5b1dc126bec85a5ff53 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:35:19 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts
 and sext-inreg

Uniform S16 shifts have to be extended to S32 using appropriate Extend
before lowering to S32 instruction.
Uniform packed V2S16 are lowered to SGPR S32 instructions,
other option is to use VALU packed V2S16 and ReadAnyLane.
For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are
instructions available.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   |   2 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  43 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  11 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   | 187 +-
 .../AMDGPU/GlobalISel/regbankselect-ashr.mir  |   6 +-
 .../AMDGPU/GlobalISel/regbankselect-lshr.mir  |  17 +-
 .../GlobalISel/regbankselect-sext-inreg.mir   |  24 +--
 .../AMDGPU/GlobalISel/regbankselect-shl.mir   |   6 +-
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   |  34 ++--
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|  10 +-
 13 files changed, 311 insertions(+), 151 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index 9544c9f43eeaf..15584f16a0638 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -310,7 +310,7 @@ bool 
AMDGPURegBankLegalize::runOnMachineFunction(MachineFunction &MF) {
 // Opcodes that support pretty much all combinations of reg banks and LLTs
 // (except S1). There is no point in writing rules for them.
 if (Opc == AMDGPU::G_BUILD_VECTOR || Opc == AMDGPU::G_UNMERGE_VALUES ||
-Opc == AMDGPU::G_MERGE_VALUES) {
+Opc == AMDGPU::G_MERGE_VALUES || Opc == G_BITCAST) {
   RBLHelper.applyMappingTrivial(*MI);
   continue;
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 670ebc0474264..fa040684fc567 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -14,11 +14,13 @@
 #include "AMDGPURegBankLegalizeHelper.h"
 #include "AMDGPUGlobalISelUtils.h"
 #include "AMDGPUInstrInfo.h"
+#include "AMDGPURegBankLegalizeRules.h"
 #include "AMDGPURegisterBankInfo.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
+#include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
@@ -166,6 +168,59 @@ void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr 
&MI) {
   MI.eraseFromParent();
 }
 
+std::pair RegBankLegalizeHelper::unpackZExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Mask = B.buildConstant(SgprRB_S32, 0x);
+  auto Lo = B.buildAnd(SgprRB_S32, PackedS32, Mask);
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair RegBankLegalizeHelper::unpackSExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = B.buildSExtInReg(SgprRB_S32, PackedS32, 16);
+  auto Hi = B.buildAShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair RegBankLegalizeHelper::unpackAExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = PackedS32;
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+void RegBankLegalizeHelper::lowerUnpack(MachineInstr &MI) {
+  Register Lo, Hi;
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_SHL: {
+auto [Val0, Val1] = unpackAExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackAExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_LSHR: {
+auto [Val0, Val1] = unpackZExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackZExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_ASHR: {
+auto [Val0, Val1] = unpackSExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackSExt(MI.get

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132382

>From 5cc4f822aafd00eb4c88a76dadd07b716904ad97 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:32:49 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and
 XOR

Uniform S1 is lowered to S32.
Divergent S1 is selected as VCC(S1) instruction select will select
SALU instruction based on wavesize (S32 or S64).
S16 are selected as is. There are register classes for vgpr S16.
Since some isel patterns check for sgpr S16 we don't lower to S32.
For 32 and 64 bit types we use B32/B64 rules that cover scalar vector
and pointers types.
SALU B32 and B64 and VALU B32 instructions are available.
Divergent B64 is lowered to B32.
---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 31 ---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |  1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 10 ++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  2 +
 .../AMDGPU/GlobalISel/regbankselect-and.mir   | 33 ---
 .../AMDGPU/GlobalISel/regbankselect-or.mir| 85 +--
 .../AMDGPU/GlobalISel/regbankselect-xor.mir   | 84 +-
 7 files changed, 133 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index dffabe3932cc3..37cdf0c926b09 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -237,6 +237,21 @@ void RegBankLegalizeHelper::lowerS_BFE(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst) == V4S16 ? V2S16 : S32;
+  auto Op1 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(1).getReg());
+  auto Op2 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(2).getReg());
+  unsigned Opc = MI.getOpcode();
+  auto Flags = MI.getFlags();
+  auto Lo =
+  B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(0), Op2.getReg(0)}, Flags);
+  auto Hi =
+  B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(1), Op2.getReg(1)}, Flags);
+  B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
+  MI.eraseFromParent();
+}
+
 void RegBankLegalizeHelper::lower(MachineInstr &MI,
   const RegBankLLTMapping &Mapping,
   SmallSet &WaterfallSgprs) {
@@ -325,20 +340,12 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
 MI.eraseFromParent();
 return;
   }
-  case SplitTo32: {
-auto Op1 = B.buildUnmerge(VgprRB_S32, MI.getOperand(1).getReg());
-auto Op2 = B.buildUnmerge(VgprRB_S32, MI.getOperand(2).getReg());
-unsigned Opc = MI.getOpcode();
-auto Lo = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(0), Op2.getReg(0)});
-auto Hi = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(1), Op2.getReg(1)});
-B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
-MI.eraseFromParent();
-break;
-  }
   case V_BFE:
 return lowerV_BFE(MI);
   case S_BFE:
 return lowerS_BFE(MI);
+  case SplitTo32:
+return lowerSplitTo32(MI);
   case SplitLoad: {
 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
 unsigned Size = DstTy.getSizeInBits();
@@ -398,6 +405,7 @@ LLT 
RegBankLegalizeHelper::getTyFromID(RegBankLLTMappingApplyID ID) {
   case UniInVcc:
 return LLT::scalar(1);
   case Sgpr16:
+  case Vgpr16:
 return LLT::scalar(16);
   case Sgpr32:
   case Sgpr32Trunc:
@@ -517,6 +525,7 @@ 
RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) {
   case Sgpr32AExtBoolInReg:
   case Sgpr32SExt:
 return SgprRB;
+  case Vgpr16:
   case Vgpr32:
   case Vgpr64:
   case VgprP0:
@@ -560,6 +569,7 @@ void RegBankLegalizeHelper::applyMappingDst(
 case SgprP4:
 case SgprP5:
 case SgprV4S32:
+case Vgpr16:
 case Vgpr32:
 case Vgpr64:
 case VgprP0:
@@ -691,6 +701,7 @@ void RegBankLegalizeHelper::applyMappingSrc(
   break;
 }
 // vgpr scalars, pointers and vectors
+case Vgpr16:
 case Vgpr32:
 case Vgpr64:
 case VgprP0:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
index 2d4da4cc90ea7..bbfa7b3986fd2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
@@ -112,6 +112,7 @@ class RegBankLegalizeHelper {
 
   void lowerV_BFE(MachineInstr &MI);
   void lowerS_BFE(MachineInstr &MI);
+  void lowerSplitTo32(MachineInstr &MI);
 };
 
 } // end namespace AMDGPU
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index d13748c0ef390..2d987e0647eba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -106,6 +106,8 @@ bool matchUnifor

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg (PR #132385)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132385

>From 585448f8cf5f77c26c63c5b1dc126bec85a5ff53 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:35:19 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts
 and sext-inreg

Uniform S16 shifts have to be extended to S32 using appropriate Extend
before lowering to S32 instruction.
Uniform packed V2S16 are lowered to SGPR S32 instructions,
other option is to use VALU packed V2S16 and ReadAnyLane.
For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are
instructions available.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   |   2 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  43 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  11 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   | 187 +-
 .../AMDGPU/GlobalISel/regbankselect-ashr.mir  |   6 +-
 .../AMDGPU/GlobalISel/regbankselect-lshr.mir  |  17 +-
 .../GlobalISel/regbankselect-sext-inreg.mir   |  24 +--
 .../AMDGPU/GlobalISel/regbankselect-shl.mir   |   6 +-
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   |  34 ++--
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|  10 +-
 13 files changed, 311 insertions(+), 151 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index 9544c9f43eeaf..15584f16a0638 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -310,7 +310,7 @@ bool 
AMDGPURegBankLegalize::runOnMachineFunction(MachineFunction &MF) {
 // Opcodes that support pretty much all combinations of reg banks and LLTs
 // (except S1). There is no point in writing rules for them.
 if (Opc == AMDGPU::G_BUILD_VECTOR || Opc == AMDGPU::G_UNMERGE_VALUES ||
-Opc == AMDGPU::G_MERGE_VALUES) {
+Opc == AMDGPU::G_MERGE_VALUES || Opc == G_BITCAST) {
   RBLHelper.applyMappingTrivial(*MI);
   continue;
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 670ebc0474264..fa040684fc567 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -14,11 +14,13 @@
 #include "AMDGPURegBankLegalizeHelper.h"
 #include "AMDGPUGlobalISelUtils.h"
 #include "AMDGPUInstrInfo.h"
+#include "AMDGPURegBankLegalizeRules.h"
 #include "AMDGPURegisterBankInfo.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
+#include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
@@ -166,6 +168,59 @@ void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr 
&MI) {
   MI.eraseFromParent();
 }
 
+std::pair RegBankLegalizeHelper::unpackZExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Mask = B.buildConstant(SgprRB_S32, 0x);
+  auto Lo = B.buildAnd(SgprRB_S32, PackedS32, Mask);
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair RegBankLegalizeHelper::unpackSExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = B.buildSExtInReg(SgprRB_S32, PackedS32, 16);
+  auto Hi = B.buildAShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair RegBankLegalizeHelper::unpackAExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = PackedS32;
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 
16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+void RegBankLegalizeHelper::lowerUnpack(MachineInstr &MI) {
+  Register Lo, Hi;
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_SHL: {
+auto [Val0, Val1] = unpackAExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackAExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_LSHR: {
+auto [Val0, Val1] = unpackZExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackZExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_ASHR: {
+auto [Val0, Val1] = unpackSExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackSExt(MI.get

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)

2025-04-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132382

>From 5cc4f822aafd00eb4c88a76dadd07b716904ad97 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:32:49 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and
 XOR

Uniform S1 is lowered to S32.
Divergent S1 is selected as VCC(S1) instruction select will select
SALU instruction based on wavesize (S32 or S64).
S16 are selected as is. There are register classes for vgpr S16.
Since some isel patterns check for sgpr S16 we don't lower to S32.
For 32 and 64 bit types we use B32/B64 rules that cover scalar vector
and pointers types.
SALU B32 and B64 and VALU B32 instructions are available.
Divergent B64 is lowered to B32.
---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 31 ---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |  1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 10 ++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  2 +
 .../AMDGPU/GlobalISel/regbankselect-and.mir   | 33 ---
 .../AMDGPU/GlobalISel/regbankselect-or.mir| 85 +--
 .../AMDGPU/GlobalISel/regbankselect-xor.mir   | 84 +-
 7 files changed, 133 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index dffabe3932cc3..37cdf0c926b09 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -237,6 +237,21 @@ void RegBankLegalizeHelper::lowerS_BFE(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst) == V4S16 ? V2S16 : S32;
+  auto Op1 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(1).getReg());
+  auto Op2 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(2).getReg());
+  unsigned Opc = MI.getOpcode();
+  auto Flags = MI.getFlags();
+  auto Lo =
+  B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(0), Op2.getReg(0)}, Flags);
+  auto Hi =
+  B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(1), Op2.getReg(1)}, Flags);
+  B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
+  MI.eraseFromParent();
+}
+
 void RegBankLegalizeHelper::lower(MachineInstr &MI,
   const RegBankLLTMapping &Mapping,
   SmallSet &WaterfallSgprs) {
@@ -325,20 +340,12 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
 MI.eraseFromParent();
 return;
   }
-  case SplitTo32: {
-auto Op1 = B.buildUnmerge(VgprRB_S32, MI.getOperand(1).getReg());
-auto Op2 = B.buildUnmerge(VgprRB_S32, MI.getOperand(2).getReg());
-unsigned Opc = MI.getOpcode();
-auto Lo = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(0), Op2.getReg(0)});
-auto Hi = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(1), Op2.getReg(1)});
-B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
-MI.eraseFromParent();
-break;
-  }
   case V_BFE:
 return lowerV_BFE(MI);
   case S_BFE:
 return lowerS_BFE(MI);
+  case SplitTo32:
+return lowerSplitTo32(MI);
   case SplitLoad: {
 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
 unsigned Size = DstTy.getSizeInBits();
@@ -398,6 +405,7 @@ LLT 
RegBankLegalizeHelper::getTyFromID(RegBankLLTMappingApplyID ID) {
   case UniInVcc:
 return LLT::scalar(1);
   case Sgpr16:
+  case Vgpr16:
 return LLT::scalar(16);
   case Sgpr32:
   case Sgpr32Trunc:
@@ -517,6 +525,7 @@ 
RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) {
   case Sgpr32AExtBoolInReg:
   case Sgpr32SExt:
 return SgprRB;
+  case Vgpr16:
   case Vgpr32:
   case Vgpr64:
   case VgprP0:
@@ -560,6 +569,7 @@ void RegBankLegalizeHelper::applyMappingDst(
 case SgprP4:
 case SgprP5:
 case SgprV4S32:
+case Vgpr16:
 case Vgpr32:
 case Vgpr64:
 case VgprP0:
@@ -691,6 +701,7 @@ void RegBankLegalizeHelper::applyMappingSrc(
   break;
 }
 // vgpr scalars, pointers and vectors
+case Vgpr16:
 case Vgpr32:
 case Vgpr64:
 case VgprP0:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
index 2d4da4cc90ea7..bbfa7b3986fd2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
@@ -112,6 +112,7 @@ class RegBankLegalizeHelper {
 
   void lowerV_BFE(MachineInstr &MI);
   void lowerS_BFE(MachineInstr &MI);
+  void lowerSplitTo32(MachineInstr &MI);
 };
 
 } // end namespace AMDGPU
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index d13748c0ef390..2d987e0647eba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -106,6 +106,8 @@ bool matchUnifor

[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)


Changes

Backport 57530c23a53b5e003d389437637f61c5b9814e22

Requested by: @nikic

---
Full diff: https://github.com/llvm/llvm-project/pull/137179.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+4) 
- (added) llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll (+28) 


``diff
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp 
b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 9586fc97a39f7..236a531317678 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -719,10 +719,14 @@ static bool allUsesOfLoadedValueWillTrapIfNull(const 
GlobalVariable *GV) {
 const Value *P = Worklist.pop_back_val();
 for (const auto *U : P->users()) {
   if (auto *LI = dyn_cast(U)) {
+if (!LI->isSimple())
+  return false;
 SmallPtrSet PHIs;
 if (!AllUsesOfValueWillTrapIfNull(LI, PHIs))
   return false;
   } else if (auto *SI = dyn_cast(U)) {
+if (!SI->isSimple())
+  return false;
 // Ignore stores to the global.
 if (SI->getPointerOperand() != P)
   return false;
diff --git a/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll 
b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
new file mode 100644
index 0..0ecdf095efdd8
--- /dev/null
+++ b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
@@ -0,0 +1,28 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --version 5
+; RUN: opt -passes=globalopt -S < %s | FileCheck %s
+
+@g = internal global ptr null, align 8
+
+define void @init() {
+; CHECK-LABEL: define void @init() local_unnamed_addr {
+; CHECK-NEXT:[[ALLOC:%.*]] = call ptr @malloc(i64 48)
+; CHECK-NEXT:store atomic ptr [[ALLOC]], ptr @g seq_cst, align 8
+; CHECK-NEXT:ret void
+;
+  %alloc = call ptr @malloc(i64 48)
+  store atomic ptr %alloc, ptr @g seq_cst, align 8
+  ret void
+}
+
+define i1 @check() {
+; CHECK-LABEL: define i1 @check() local_unnamed_addr {
+; CHECK-NEXT:[[VAL:%.*]] = load atomic ptr, ptr @g seq_cst, align 8
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq ptr [[VAL]], null
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %val = load atomic ptr, ptr @g seq_cst, align 8
+  %cmp = icmp eq ptr %val, null
+  ret i1 %cmp
+}
+
+declare ptr @malloc(i64) allockind("alloc,uninitialized") allocsize(0)

``




https://github.com/llvm/llvm-project/pull/137179
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] 2304226 - fix build again

2025-04-24 Thread Schrodinger ZHU Yifan via llvm-branch-commits

Author: Schrodinger ZHU Yifan
Date: 2025-04-24T13:15:42-04:00
New Revision: 230422621aeb97c4839ba133c75c33496aa5a75a

URL: 
https://github.com/llvm/llvm-project/commit/230422621aeb97c4839ba133c75c33496aa5a75a
DIFF: 
https://github.com/llvm/llvm-project/commit/230422621aeb97c4839ba133c75c33496aa5a75a.diff

LOG: fix build again

Added: 


Modified: 
libc/src/setjmp/CMakeLists.txt
libc/src/setjmp/x86_64/CMakeLists.txt

Removed: 




diff  --git a/libc/src/setjmp/CMakeLists.txt b/libc/src/setjmp/CMakeLists.txt
index 2591319f15240..239254fa57dc6 100644
--- a/libc/src/setjmp/CMakeLists.txt
+++ b/libc/src/setjmp/CMakeLists.txt
@@ -26,19 +26,21 @@ add_entrypoint_object(
 .${LIBC_TARGET_ARCHITECTURE}.longjmp
 )
 
-add_entrypoint_object(
-  siglongjmp
-  SRCS
-siglongjmp.cpp
-  HDRS
-siglongjmp.h
-  DEPENDS
-.longjmp
-)
+if (TARGET libc.src.setjmp.sigsetjmp_epilogue)
+  add_entrypoint_object(
+siglongjmp
+SRCS
+  siglongjmp.cpp
+HDRS
+  siglongjmp.h
+DEPENDS
+  .longjmp
+  )
 
-add_entrypoint_object(
-  sigsetjmp
-  ALIAS
-  DEPENDS
-.${LIBC_TARGET_ARCHITECTURE}.sigsetjmp
-)
+  add_entrypoint_object(
+sigsetjmp
+ALIAS
+DEPENDS
+  .${LIBC_TARGET_ARCHITECTURE}.sigsetjmp
+  )
+endif()

diff  --git a/libc/src/setjmp/x86_64/CMakeLists.txt 
b/libc/src/setjmp/x86_64/CMakeLists.txt
index 0090e81655662..03ed5fb647084 100644
--- a/libc/src/setjmp/x86_64/CMakeLists.txt
+++ b/libc/src/setjmp/x86_64/CMakeLists.txt
@@ -8,20 +8,21 @@ add_entrypoint_object(
 libc.hdr.offsetof_macros
 libc.hdr.types.jmp_buf
 )
-
-add_entrypoint_object(
-  sigsetjmp
-  SRCS
-sigsetjmp.cpp
-  HDRS
-../sigsetjmp.h
-  DEPENDS
-libc.hdr.types.jmp_buf
-libc.hdr.types.sigset_t
-libc.hdr.offsetof_macros
-libc.src.setjmp.sigsetjmp_epilogue
-libc.src.setjmp.setjmp
-)
+if (TARGET libc.src.setjmp.sigsetjmp_epilogue)
+  add_entrypoint_object(
+sigsetjmp
+SRCS
+  sigsetjmp.cpp
+HDRS
+  ../sigsetjmp.h
+DEPENDS
+  libc.hdr.types.jmp_buf
+  libc.hdr.types.sigset_t
+  libc.hdr.offsetof_macros
+  libc.src.setjmp.sigsetjmp_epilogue
+  libc.src.setjmp.setjmp
+  )
+endif()
 
 add_entrypoint_object(
   longjmp



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Attributor] Use `getAssumedAddrSpace` to get address space for `AllocaInst` (PR #136865)

2025-04-24 Thread Johannes Doerfert via llvm-branch-commits

jdoerfert wrote:

FWIW, nvptx backend unfortunately works by "fixing stuff up" late. It 
shouldn't, but it does. I'd prefer to not to fix stuff up at all and maybe the 
best way is to have proper assertions in the creation of allocas/globals/... 
and/or the verifier. 

https://github.com/llvm/llvm-project/pull/136865
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Adding support for Root Descriptor in Obj2yaml/Yaml2Obj (PR #136732)

2025-04-24 Thread via llvm-branch-commits

https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/136732

>From 578faea764d630b8782ba53b5153fdbeda2c45f8 Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Thu, 24 Apr 2025 19:33:05 +
Subject: [PATCH 1/8] addressing pr comments

---
 .../llvm/MC/DXContainerRootSignature.h|   2 +-
 llvm/lib/Target/DirectX/DXILRootSignature.cpp | 105 +++---
 .../RootSignature-MultipleEntryFunctions.ll   |   4 +-
 .../ContainerData/RootSignature-Parameters.ll |  24 ++--
 4 files changed, 57 insertions(+), 78 deletions(-)

diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h 
b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 6d3329a2c6ce9..fee799249b255 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -25,7 +25,7 @@ struct RootSignatureDesc {
 
   uint32_t Version = 2U;
   uint32_t Flags = 0U;
-  uint32_t RootParameterOffset = 24U;
+  uint32_t RootParameterOffset = 0U;
   uint32_t StaticSamplersOffset = 0u;
   uint32_t NumStaticSamplers = 0u;
   SmallVector Parameters;
diff --git a/llvm/lib/Target/DirectX/DXILRootSignature.cpp 
b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
index 5e615461df4f3..fe30793aa9853 100644
--- a/llvm/lib/Target/DirectX/DXILRootSignature.cpp
+++ b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
@@ -40,21 +40,19 @@ static bool reportError(LLVMContext *Ctx, Twine Message,
   return true;
 }
 
-static bool reportValueError(LLVMContext *Ctx, Twine ParamName, uint32_t Value,
- DiagnosticSeverity Severity = DS_Error) {
+static bool reportValueError(LLVMContext *Ctx, Twine ParamName,
+ uint32_t Value) {
   Ctx->diagnose(DiagnosticInfoGeneric(
-  "Invalid value for " + ParamName + ": " + Twine(Value), Severity));
+  "Invalid value for " + ParamName + ": " + Twine(Value), DS_Error));
   return true;
 }
 
-static bool extractMdIntValue(uint32_t &Value, MDNode *Node,
-  unsigned int OpId) {
-  auto *CI = mdconst::dyn_extract(Node->getOperand(OpId).get());
-  if (CI == nullptr)
-return true;
-
-  Value = CI->getZExtValue();
-  return false;
+static std::optional extractMdIntValue(MDNode *Node,
+ unsigned int OpId) {
+  if (auto *CI =
+  mdconst::dyn_extract(Node->getOperand(OpId).get()))
+return CI->getZExtValue();
+  return std::nullopt;
 }
 
 static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
@@ -63,7 +61,9 @@ static bool parseRootFlags(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
   if (RootFlagNode->getNumOperands() != 2)
 return reportError(Ctx, "Invalid format for RootFlag Element");
 
-  if (extractMdIntValue(RSD.Flags, RootFlagNode, 1))
+  if (std::optional Val = extractMdIntValue(RootFlagNode, 1))
+RSD.Flags = *Val;
+  else
 return reportError(Ctx, "Invalid value for RootFlag");
 
   return false;
@@ -79,22 +79,24 @@ static bool parseRootConstants(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
   NewParameter.Header.ParameterType =
   llvm::to_underlying(dxbc::RootParameterType::Constants32Bit);
 
-  uint32_t SV;
-  if (extractMdIntValue(SV, RootConstantNode, 1))
+  if (std::optional Val = extractMdIntValue(RootConstantNode, 1))
+NewParameter.Header.ShaderVisibility = *Val;
+  else
 return reportError(Ctx, "Invalid value for ShaderVisibility");
 
-  NewParameter.Header.ShaderVisibility = SV;
-
-  if (extractMdIntValue(NewParameter.Constants.ShaderRegister, 
RootConstantNode,
-2))
+  if (std::optional Val = extractMdIntValue(RootConstantNode, 2))
+NewParameter.Constants.ShaderRegister = *Val;
+  else
 return reportError(Ctx, "Invalid value for ShaderRegister");
 
-  if (extractMdIntValue(NewParameter.Constants.RegisterSpace, RootConstantNode,
-3))
+  if (std::optional Val = extractMdIntValue(RootConstantNode, 3))
+NewParameter.Constants.RegisterSpace = *Val;
+  else
 return reportError(Ctx, "Invalid value for RegisterSpace");
 
-  if (extractMdIntValue(NewParameter.Constants.Num32BitValues, 
RootConstantNode,
-4))
+  if (std::optional Val = extractMdIntValue(RootConstantNode, 4))
+NewParameter.Constants.Num32BitValues = *Val;
+  else
 return reportError(Ctx, "Invalid value for Num32BitValues");
 
   RSD.Parameters.push_back(NewParameter);
@@ -148,32 +150,6 @@ static bool parse(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
-static bool verifyShaderVisibility(uint32_t Flags) {
-  switch (Flags) {
-
-  case llvm::to_underlying(dxbc::ShaderVisibility::All):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Vertex):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Hull):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Domain):
-  case llvm::to_underlying(dxbc::ShaderVisibilit

[llvm-branch-commits] [BOLT][NFCI] Switch heatmap to using parsed basic/branch events (PR #136531)

2025-04-24 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/136531


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT][NFCI] Switch heatmap to using parsed basic/branch events (PR #136531)

2025-04-24 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/136531


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)

2025-04-24 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- 
llvm/include/llvm/MC/DXContainerRootSignature.h 
llvm/include/llvm/ObjectYAML/DXContainerYAML.h 
llvm/lib/MC/DXContainerRootSignature.cpp 
llvm/lib/ObjectYAML/DXContainerEmitter.cpp 
llvm/lib/Target/DirectX/DXILRootSignature.cpp
``





View the diff from clang-format here.


``diff
diff --git a/llvm/include/llvm/ObjectYAML/DXContainerYAML.h 
b/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
index e86a869da..c54c995ac 100644
--- a/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
+++ b/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
@@ -95,7 +95,7 @@ struct RootParameterYamlDesc {
   uint32_t Type;
   uint32_t Visibility;
   uint32_t Offset;
-  RootParameterYamlDesc(){};
+  RootParameterYamlDesc() {};
   RootParameterYamlDesc(uint32_t T) : Type(T) {
 switch (T) {
 

``




https://github.com/llvm/llvm-project/pull/137284
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)

2025-04-24 Thread via llvm-branch-commits

https://github.com/joaosaffran created 
https://github.com/llvm/llvm-project/pull/137284

None

>From 7219ed4328aff2929f021c5efbd6901bc4bd2e20 Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Fri, 25 Apr 2025 05:09:08 +
Subject: [PATCH] refactoring mcdxbc struct to store root parameters out of
 order

---
 .../llvm/MC/DXContainerRootSignature.h| 138 +-
 .../include/llvm/ObjectYAML/DXContainerYAML.h |   2 +-
 llvm/lib/MC/DXContainerRootSignature.cpp  |  78 +-
 llvm/lib/ObjectYAML/DXContainerEmitter.cpp|  36 ++---
 llvm/lib/Target/DirectX/DXILRootSignature.cpp |  45 +++---
 5 files changed, 208 insertions(+), 91 deletions(-)

diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h 
b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 1f421d726bf38..e1f4abbcebf8f 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -6,22 +6,146 @@
 //
 
//===--===//
 
+#include "llvm/ADT/STLForwardCompat.h"
 #include "llvm/BinaryFormat/DXContainer.h"
+#include "llvm/Support/ErrorHandling.h"
+#include 
 #include 
-#include 
+#include 
 
 namespace llvm {
 
 class raw_ostream;
 namespace mcdxbc {
 
+struct RootParameterHeader : public dxbc::RootParameterHeader {
+
+  size_t Location;
+
+  RootParameterHeader() = default;
+
+  RootParameterHeader(dxbc::RootParameterHeader H, size_t L)
+  : dxbc::RootParameterHeader(H), Location(L) {}
+};
+
+using RootDescriptor = std::variant;
+using ParametersView =
+std::variant;
 struct RootParameter {
-  dxbc::RootParameterHeader Header;
-  union {
-dxbc::RootConstants Constants;
-dxbc::RST0::v0::RootDescriptor Descriptor_V10;
-dxbc::RST0::v1::RootDescriptor Descriptor_V11;
+  SmallVector Headers;
+
+  SmallVector Constants;
+  SmallVector Descriptors;
+
+  void addHeader(dxbc::RootParameterHeader H, size_t L) {
+Headers.push_back(RootParameterHeader(H, L));
+  }
+
+  void addParameter(dxbc::RootParameterHeader H, dxbc::RootConstants C) {
+addHeader(H, Constants.size());
+Constants.push_back(C);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+dxbc::RST0::v0::RootDescriptor D) {
+addHeader(H, Descriptors.size());
+Descriptors.push_back(D);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+dxbc::RST0::v1::RootDescriptor D) {
+addHeader(H, Descriptors.size());
+Descriptors.push_back(D);
+  }
+
+  ParametersView get(const RootParameterHeader &H) const {
+switch (H.ParameterType) {
+case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+  return Constants[H.Location];
+case llvm::to_underlying(dxbc::RootParameterType::CBV):
+case llvm::to_underlying(dxbc::RootParameterType::SRV):
+case llvm::to_underlying(dxbc::RootParameterType::UAV):
+  RootDescriptor VersionedParam = Descriptors[H.Location];
+  if (std::holds_alternative(
+  VersionedParam))
+return std::get(VersionedParam);
+  return std::get(VersionedParam);
+}
+
+llvm_unreachable("Unimplemented parameter type");
+  }
+
+  struct iterator {
+const RootParameter &Parameters;
+SmallVector::const_iterator Current;
+
+// Changed parameter type to match member variable (removed const)
+iterator(const RootParameter &P,
+ SmallVector::const_iterator C)
+: Parameters(P), Current(C) {}
+iterator(const iterator &) = default;
+
+ParametersView operator*() {
+  ParametersView Val;
+  switch (Current->ParameterType) {
+  case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+Val = Parameters.Constants[Current->Location];
+break;
+
+  case llvm::to_underlying(dxbc::RootParameterType::CBV):
+  case llvm::to_underlying(dxbc::RootParameterType::SRV):
+  case llvm::to_underlying(dxbc::RootParameterType::UAV):
+RootDescriptor VersionedParam =
+Parameters.Descriptors[Current->Location];
+if (std::holds_alternative(
+VersionedParam))
+  Val = std::get(VersionedParam);
+else
+  Val = std::get(VersionedParam);
+break;
+  }
+  return Val;
+}
+
+iterator operator++() {
+  Current++;
+  return *this;
+}
+
+iterator operator++(int) {
+  iterator Tmp = *this;
+  ++*this;
+  return Tmp;
+}
+
+iterator operator--() {
+  Current--;
+  return *this;
+}
+
+iterator operator--(int) {
+  iterator Tmp = *this;
+  --*this;
+  return Tmp;
+}
+
+bool operator==(const iterator I) { return I.Current == Current; }
+bool operator!=(const iterator I) { return !(*this == I); }
   };
+
+  iterator begin() const { return iterator(*this, Headers.begin()); }
+
+  iterator end() const { return iterator(*this, Headers.end()); }
+
+  size_t size() const { return Headers

[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)

2025-04-24 Thread via llvm-branch-commits

https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/137284

>From 7219ed4328aff2929f021c5efbd6901bc4bd2e20 Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Fri, 25 Apr 2025 05:09:08 +
Subject: [PATCH 1/2] refactoring mcdxbc struct to store root parameters out of
 order

---
 .../llvm/MC/DXContainerRootSignature.h| 138 +-
 .../include/llvm/ObjectYAML/DXContainerYAML.h |   2 +-
 llvm/lib/MC/DXContainerRootSignature.cpp  |  78 +-
 llvm/lib/ObjectYAML/DXContainerEmitter.cpp|  36 ++---
 llvm/lib/Target/DirectX/DXILRootSignature.cpp |  45 +++---
 5 files changed, 208 insertions(+), 91 deletions(-)

diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h 
b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 1f421d726bf38..e1f4abbcebf8f 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -6,22 +6,146 @@
 //
 
//===--===//
 
+#include "llvm/ADT/STLForwardCompat.h"
 #include "llvm/BinaryFormat/DXContainer.h"
+#include "llvm/Support/ErrorHandling.h"
+#include 
 #include 
-#include 
+#include 
 
 namespace llvm {
 
 class raw_ostream;
 namespace mcdxbc {
 
+struct RootParameterHeader : public dxbc::RootParameterHeader {
+
+  size_t Location;
+
+  RootParameterHeader() = default;
+
+  RootParameterHeader(dxbc::RootParameterHeader H, size_t L)
+  : dxbc::RootParameterHeader(H), Location(L) {}
+};
+
+using RootDescriptor = std::variant;
+using ParametersView =
+std::variant;
 struct RootParameter {
-  dxbc::RootParameterHeader Header;
-  union {
-dxbc::RootConstants Constants;
-dxbc::RST0::v0::RootDescriptor Descriptor_V10;
-dxbc::RST0::v1::RootDescriptor Descriptor_V11;
+  SmallVector Headers;
+
+  SmallVector Constants;
+  SmallVector Descriptors;
+
+  void addHeader(dxbc::RootParameterHeader H, size_t L) {
+Headers.push_back(RootParameterHeader(H, L));
+  }
+
+  void addParameter(dxbc::RootParameterHeader H, dxbc::RootConstants C) {
+addHeader(H, Constants.size());
+Constants.push_back(C);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+dxbc::RST0::v0::RootDescriptor D) {
+addHeader(H, Descriptors.size());
+Descriptors.push_back(D);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+dxbc::RST0::v1::RootDescriptor D) {
+addHeader(H, Descriptors.size());
+Descriptors.push_back(D);
+  }
+
+  ParametersView get(const RootParameterHeader &H) const {
+switch (H.ParameterType) {
+case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+  return Constants[H.Location];
+case llvm::to_underlying(dxbc::RootParameterType::CBV):
+case llvm::to_underlying(dxbc::RootParameterType::SRV):
+case llvm::to_underlying(dxbc::RootParameterType::UAV):
+  RootDescriptor VersionedParam = Descriptors[H.Location];
+  if (std::holds_alternative(
+  VersionedParam))
+return std::get(VersionedParam);
+  return std::get(VersionedParam);
+}
+
+llvm_unreachable("Unimplemented parameter type");
+  }
+
+  struct iterator {
+const RootParameter &Parameters;
+SmallVector::const_iterator Current;
+
+// Changed parameter type to match member variable (removed const)
+iterator(const RootParameter &P,
+ SmallVector::const_iterator C)
+: Parameters(P), Current(C) {}
+iterator(const iterator &) = default;
+
+ParametersView operator*() {
+  ParametersView Val;
+  switch (Current->ParameterType) {
+  case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+Val = Parameters.Constants[Current->Location];
+break;
+
+  case llvm::to_underlying(dxbc::RootParameterType::CBV):
+  case llvm::to_underlying(dxbc::RootParameterType::SRV):
+  case llvm::to_underlying(dxbc::RootParameterType::UAV):
+RootDescriptor VersionedParam =
+Parameters.Descriptors[Current->Location];
+if (std::holds_alternative(
+VersionedParam))
+  Val = std::get(VersionedParam);
+else
+  Val = std::get(VersionedParam);
+break;
+  }
+  return Val;
+}
+
+iterator operator++() {
+  Current++;
+  return *this;
+}
+
+iterator operator++(int) {
+  iterator Tmp = *this;
+  ++*this;
+  return Tmp;
+}
+
+iterator operator--() {
+  Current--;
+  return *this;
+}
+
+iterator operator--(int) {
+  iterator Tmp = *this;
+  --*this;
+  return Tmp;
+}
+
+bool operator==(const iterator I) { return I.Current == Current; }
+bool operator!=(const iterator I) { return !(*this == I); }
   };
+
+  iterator begin() const { return iterator(*this, Headers.begin()); }
+
+  iterator end() const { return iterator(*this, Headers.end()); }
+
+  size_t size() const { return Headers.s

[llvm-branch-commits] [llvm] dd4b43f - Revert "[DebugInfo][DWARF] Emit DW_AT_abstract_origin for concrete/inlined DW…"

2025-04-24 Thread via llvm-branch-commits

Author: Vladislav Dzhidzhoev
Date: 2025-04-24T19:54:52+02:00
New Revision: dd4b43f1ffd419bbec1554ebb529038066d4eae5

URL: 
https://github.com/llvm/llvm-project/commit/dd4b43f1ffd419bbec1554ebb529038066d4eae5
DIFF: 
https://github.com/llvm/llvm-project/commit/dd4b43f1ffd419bbec1554ebb529038066d4eae5.diff

LOG: Revert "[DebugInfo][DWARF] Emit DW_AT_abstract_origin for concrete/inlined 
DW…"

This reverts commit 1143a04f349c4081a1a2d2503046f6ca422aa338.

Added: 


Modified: 
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
llvm/test/DebugInfo/Generic/inline-scopes.ll
llvm/test/DebugInfo/X86/lexical-block-file-inline.ll
llvm/test/DebugInfo/X86/missing-abstract-variable.ll

Removed: 
llvm/test/DebugInfo/Generic/lexical-block-abstract-origin.ll



diff  --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp 
b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index a20c374e08935..3939dae81841f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -782,8 +782,6 @@ DIE 
*DwarfCompileUnit::constructLexicalScopeDIE(LexicalScope *Scope) {
 assert(!LexicalBlockDIEs.count(DS) &&
"Concrete out-of-line DIE for this scope exists!");
 LexicalBlockDIEs[DS] = ScopeDIE;
-  } else {
-InlinedLocalScopeDIEs[DS].push_back(ScopeDIE);
   }
 
   attachRangesOrLowHighPC(*ScopeDIE, Scope->getRanges());
@@ -1493,19 +1491,6 @@ void DwarfCompileUnit::finishEntityDefinition(const 
DbgEntity *Entity) {
 getDwarfDebug().addAccelName(*this, CUNode->getNameTableKind(), Name, 
*Die);
 }
 
-void DwarfCompileUnit::attachLexicalScopesAbstractOrigins() {
-  auto AttachAO = [&](const DILocalScope *LS, DIE *ScopeDIE) {
-if (auto *AbsLSDie = getAbstractScopeDIEs().lookup(LS))
-  addDIEEntry(*ScopeDIE, dwarf::DW_AT_abstract_origin, *AbsLSDie);
-  };
-
-  for (auto [LScope, ScopeDIE] : LexicalBlockDIEs)
-AttachAO(LScope, ScopeDIE);
-  for (auto &[LScope, ScopeDIEs] : InlinedLocalScopeDIEs)
-for (auto *ScopeDIE : ScopeDIEs)
-  AttachAO(LScope, ScopeDIE);
-}
-
 DbgEntity *DwarfCompileUnit::getExistingAbstractEntity(const DINode *Node) {
   auto &AbstractEntities = getAbstractEntities();
   auto I = AbstractEntities.find(Node);

diff  --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h 
b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
index 09be22ce35e36..104039db03c7c 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
@@ -82,10 +82,6 @@ class DwarfCompileUnit final : public DwarfUnit {
   // List of abstract local scopes (either DISubprogram or DILexicalBlock).
   DenseMap AbstractLocalScopeDIEs;
 
-  // List of inlined lexical block scopes that belong to subprograms within 
this
-  // CU.
-  DenseMap> InlinedLocalScopeDIEs;
-
   DenseMap> AbstractEntities;
 
   /// DWO ID for correlating skeleton and split units.
@@ -303,7 +299,6 @@ class DwarfCompileUnit final : public DwarfUnit {
 
   void finishSubprogramDefinition(const DISubprogram *SP);
   void finishEntityDefinition(const DbgEntity *Entity);
-  void attachLexicalScopesAbstractOrigins();
 
   /// Find abstract variable associated with Var.
   using InlinedEntity = DbgValueHistoryMap::InlinedEntity;

diff  --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp 
b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 6c932651750ee..39f1299a24e81 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -1262,7 +1262,6 @@ void DwarfDebug::finalizeModuleInfo() {
 auto &TheCU = *P.second;
 if (TheCU.getCUNode()->isDebugDirectivesOnly())
   continue;
-TheCU.attachLexicalScopesAbstractOrigins();
 // Emit DW_AT_containing_type attribute to connect types with their
 // vtable holding type.
 TheCU.constructContainingTypeDIEs();

diff  --git a/llvm/test/DebugInfo/Generic/inline-scopes.ll 
b/llvm/test/DebugInfo/Generic/inline-scopes.ll
index 45ecdd0594f64..8e7543eb16e69 100644
--- a/llvm/test/DebugInfo/Generic/inline-scopes.ll
+++ b/llvm/test/DebugInfo/Generic/inline-scopes.ll
@@ -20,29 +20,16 @@
 ; }
 
 ; Ensure that lexical_blocks within inlined_subroutines are preserved/emitted.
-; CHECK:  DW_TAG_subprogram
-; CHECK-NEXT: DW_AT_linkage_name ("_Z2f1v")
-; CHECK:  [[ADDR1:0x[0-9a-f]+]]: DW_TAG_lexical_block
-; CHECK:  DW_TAG_subprogram
-; CHECK-NEXT: DW_AT_linkage_name ("_Z2f2v")
-; CHECK:  [[ADDR2:0x[0-9a-f]+]]: DW_TAG_lexical_block
 ; CHECK: DW_TAG_inlined_subroutine
 ; CHECK-NOT: DW_TAG
 ; CHECK-NOT: NULL
-; CHECK:  DW_TAG_lexical_block
-; CHECK-NOT: {{DW_TAG|NULL}}
-; CHECK:  DW_AT_abstract_origin ([[ADDR1]]
+; CHECK: DW_TAG_lexical_block
 ; CHECK-NOT: DW_TAG
 ; CHECK-NOT: NULL
 ; CHECK: DW_TAG_variable
 ; Ensure that file changes don't 

[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

tblah wrote:

PR Stack:
- Cancel parallel https://github.com/llvm/llvm-project/pull/137192
- Cancel sections https://github.com/llvm/llvm-project/pull/137193
- Cancel wsloop https://github.com/llvm/llvm-project/pull/137194
- Cancellation point (TODO)
- Cancel(lation point) taskgroup (TODO)

https://github.com/llvm/llvm-project/pull/137194
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits


@@ -339,6 +369,183 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 }
   }
 
+  std::optional>
+  getAuthCheckedReg(BinaryBasicBlock &BB) const override {
+// Match several possible hard-coded sequences of instructions which can be
+// emitted by LLVM backend to check that the authenticated pointer is
+// correct (see AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue).
+//
+// This function only matches sequences involving branch instructions.
+// All these sequences have the form:
+//
+// (0) ... regular code that authenticates a pointer in Xn ...
+// (1) analyze Xn
+// (2) branch to .Lon_success if the pointer is correct
+// (3) BRK #imm (fall-through basic block)
+//
+// In the above pseudocode, (1) + (2) is one of the following sequences:
+//
+// - eor Xtmp, Xn, Xn, lsl #1
+//   tbz Xtmp, #62, .Lon_success
+//
+// - mov Xtmp, Xn
+//   xpac(i|d) Xn (or xpaclri if Xn is LR)
+//   cmp Xtmp, Xn
+//   b.eq .Lon_success
+//
+// Note that any branch destination operand is accepted as .Lon_success -
+// it is the responsibility of the caller of getAuthCheckedReg to inspect
+// the list of successors of this basic block as appropriate.
+
+// Any of the above code sequences assume the fall-through basic block
+// is a dead-end BRK instruction (any immediate operand is accepted).
+const BinaryBasicBlock *BreakBB = BB.getFallthrough();
+if (!BreakBB || BreakBB->empty() ||
+BreakBB->front().getOpcode() != AArch64::BRK)
+  return std::nullopt;
+
+// Iterate over the instructions of BB in reverse order, matching opcodes
+// and operands.
+MCPhysReg TestedReg = 0;
+MCPhysReg ScratchReg = 0;
+auto It = BB.end();
+auto StepAndGetOpcode = [&It, &BB]() -> int {
+  if (It == BB.begin())
+return -1;
+  --It;
+  return It->getOpcode();
+};
+
+switch (StepAndGetOpcode()) {
+default:
+  // Not matched the branch instruction.
+  return std::nullopt;
+case AArch64::Bcc:
+  // Bcc EQ, .Lon_success
+  if (It->getOperand(0).getImm() != AArch64CC::EQ)
+return std::nullopt;
+  // Not checking .Lon_success (see above).
+
+  // SUBSXrs XZR, TestedReg, ScratchReg, 0 (used by "CMP reg, reg" alias)
+  if (StepAndGetOpcode() != AArch64::SUBSXrs ||
+  It->getOperand(0).getReg() != AArch64::XZR ||
+  It->getOperand(3).getImm() != 0)
+return std::nullopt;
+  TestedReg = It->getOperand(1).getReg();
+  ScratchReg = It->getOperand(2).getReg();
+
+  // Either XPAC(I|D) ScratchReg, ScratchReg
+  // or XPACLRI
+  switch (StepAndGetOpcode()) {
+  default:
+return std::nullopt;
+  case AArch64::XPACLRI:
+// No operands to check, but using XPACLRI forces TestedReg to be X30.
+if (TestedReg != AArch64::LR)
+  return std::nullopt;
+break;
+  case AArch64::XPACI:
+  case AArch64::XPACD:
+if (It->getOperand(0).getReg() != ScratchReg ||
+It->getOperand(1).getReg() != ScratchReg)
+  return std::nullopt;
+break;
+  }
+
+  // ORRXrs ScratchReg, XZR, TestedReg, 0 (used by "MOV reg, reg" alias)
+  if (StepAndGetOpcode() != AArch64::ORRXrs)
+return std::nullopt;
+  if (It->getOperand(0).getReg() != ScratchReg ||
+  It->getOperand(1).getReg() != AArch64::XZR ||
+  It->getOperand(2).getReg() != TestedReg ||
+  It->getOperand(3).getImm() != 0)
+return std::nullopt;
+
+  return std::make_pair(TestedReg, &*It);
+
+case AArch64::TBZX:
+  // TBZX ScratchReg, 62, .Lon_success
+  ScratchReg = It->getOperand(0).getReg();
+  if (It->getOperand(1).getImm() != 62)
+return std::nullopt;
+  // Not checking .Lon_success (see above).
+
+  // EORXrs ScratchReg, TestedReg, TestedReg, 1
+  if (StepAndGetOpcode() != AArch64::EORXrs)
+return std::nullopt;
+  TestedReg = It->getOperand(1).getReg();
+  if (It->getOperand(0).getReg() != ScratchReg ||
+  It->getOperand(2).getReg() != TestedReg ||
+  It->getOperand(3).getImm() != 1)
+return std::nullopt;
+
+  return std::make_pair(TestedReg, &*It);
+}
+  }
+
+  MCPhysReg getAuthCheckedReg(const MCInst &Inst,
+  bool MayOverwrite) const override {
+// Cannot trivially reuse AArch64InstrInfo::getMemOperandWithOffsetWidth()
+// method as it accepts an instance of MachineInstr, not MCInst.
+const MCInstrDesc &Desc = Info->get(Inst.getOpcode());
+
+// If signing oracles are considered, the particular value left in the base
+// register after this instruction is important. This function checks that
+// if the base register was overwritten, it is due to address write-back.
+//
+// Note that this function is not needed for authentication oracles, as the
+  

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/137224

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.

>From 5ccb98144a2625908c6abf3aab9fb6d0c226d80b Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++-
 .../AArch64/gs-pacret-autiasp.s   |  31 +-
 .../AArch64/gs-pauth-debug-output.s   |  30 +-
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 4 files changed, 730 insertions(+), 40 deletions(-)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 0ce9f51c44af4..d67f10a311396 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -655,8 +655,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -667,6 +668,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -677,6 +702,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -691,7 +717,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1226,6 +1252,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(stati

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136151

>From e22ae5ebfe066cec1335f2b4080b013560c5e844 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 15 Apr 2025 21:47:18 +0300
Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI
 instructions

Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++
 .../AArch64/gs-pauth-debug-output.s   | 32 +++
 2 files changed, 48 insertions(+)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 849272cac73d2..f7ac0b67d00da 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -431,6 +431,9 @@ class SrcSafetyAnalysis {
   }
 
   SrcState computeNext(const MCInst &Point, const SrcState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 SrcStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  SrcSafetyAnalysis::ComputeNext(";
@@ -670,6 +673,8 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If there is a label before this instruction, it is possible that it
   // can be jumped-to, thus conservatively resetting S. As an exception,
@@ -947,6 +952,9 @@ class DstSafetyAnalysis {
   }
 
   DstState computeNext(const MCInst &Point, const DstState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 DstStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  DstSafetyAnalysis::ComputeNext(";
@@ -1123,6 +1131,8 @@ class CFGUnawareDstSafetyAnalysis : public 
DstSafetyAnalysis {
 DstState S = createUnsafeState();
 for (auto &I : llvm::reverse(BF.instrs())) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If Inst can change the control flow, we cannot be sure that the next
   // instruction (to be executed in analyzed program) is the one processed
@@ -1319,6 +1329,9 @@ void FunctionAnalysis::findUnsafeUses(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const SrcState &S = Analysis->getStateBefore(Inst);
 
 // If non-empty state was never propagated from the entry basic block
@@ -1382,6 +1395,9 @@ void FunctionAnalysis::findUnsafeDefs(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const DstState &S = Analysis->getStateAfter(Inst);
 
 if (auto Report = shouldReportAuthOracle(BC, Inst, S))
diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s 
b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
index fd55880921d06..07b61bea77e94 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
@@ -329,6 +329,38 @@ auth_oracle:
 // PAUTH-EMPTY:
 // PAUTH-NEXT:   Attaching leakage info to: :  autia   x0, x1 
# DataflowDstSafetyAnalysis: dst-state
 
+// Gadget scanner should not crash on CFI instructions, including when 
debug-printing them.
+// Note that the particular debug output is not checked, but BOLT should be
+// compiled with assertions enabled to support -debug-only argument.
+
+.globl  cfi_inst_df
+.type   cfi_inst_df,@function
+cfi_inst_df:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_df, .-cfi_inst_df
+.cfi_endproc
+
+.globl  cfi_inst_nocfg
+.type   cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+
+adr x0, 1f
+br  x0
+1:
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_nocfg, .-cfi_inst_nocfg
+.cfi_endproc
+
 // CHECK-LABEL:Analyzing function main, AllocatorId = 1
 .globl  main
 .type   main,@function

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/134146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits


@@ -339,6 +369,198 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 }
   }
 
+  std::optional>
+  getAuthCheckedReg(BinaryBasicBlock &BB) const override {
+// Match several possible hard-coded sequences of instructions which can be
+// emitted by LLVM backend to check that the authenticated pointer is
+// correct (see AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue).
+//
+// This function only matches sequences involving branch instructions.
+// All these sequences have the form:
+//
+// (0) ... regular code that authenticates a pointer in Xn ...
+// (1) analyze Xn
+// (2) branch to .Lon_success if the pointer is correct
+// (3) BRK #imm (fall-through basic block)
+//
+// In the above pseudocode, (1) + (2) is one of the following sequences:
+//
+// - eor Xtmp, Xn, Xn, lsl #1
+//   tbz Xtmp, #62, .Lon_success
+//
+// - mov Xtmp, Xn
+//   xpac(i|d) Xn (or xpaclri if Xn is LR)
+//   cmp Xtmp, Xn
+//   b.eq .Lon_success
+//
+// Note that any branch destination operand is accepted as .Lon_success -
+// it is the responsibility of the caller of getAuthCheckedReg to inspect
+// the list of successors of this basic block as appropriate.
+
+// Any of the above code sequences assume the fall-through basic block
+// is a dead-end BRK instruction (any immediate operand is accepted).
+const BinaryBasicBlock *BreakBB = BB.getFallthrough();
+if (!BreakBB || BreakBB->empty() ||
+BreakBB->front().getOpcode() != AArch64::BRK)
+  return std::nullopt;
+
+// Iterate over the instructions of BB in reverse order, matching opcodes
+// and operands.
+MCPhysReg TestedReg = 0;
+MCPhysReg ScratchReg = 0;
+auto It = BB.end();
+auto StepAndGetOpcode = [&It, &BB]() -> int {
+  if (It == BB.begin())
+return -1;
+  --It;
+  return It->getOpcode();
+};
+
+switch (StepAndGetOpcode()) {
+default:
+  // Not matched the branch instruction.
+  return std::nullopt;
+case AArch64::Bcc:
+  // Bcc EQ, .Lon_success
+  if (It->getOperand(0).getImm() != AArch64CC::EQ)
+return std::nullopt;
+  // Not checking .Lon_success (see above).
+
+  // SUBSXrs XZR, TestedReg, ScratchReg, 0 (used by "CMP reg, reg" alias)
+  if (StepAndGetOpcode() != AArch64::SUBSXrs ||
+  It->getOperand(0).getReg() != AArch64::XZR ||
+  It->getOperand(3).getImm() != 0)
+return std::nullopt;
+  TestedReg = It->getOperand(1).getReg();
+  ScratchReg = It->getOperand(2).getReg();
+
+  // Either XPAC(I|D) ScratchReg, ScratchReg
+  // or XPACLRI
+  switch (StepAndGetOpcode()) {
+  default:
+return std::nullopt;
+  case AArch64::XPACLRI:
+// No operands to check, but using XPACLRI forces TestedReg to be X30.
+if (TestedReg != AArch64::LR)
+  return std::nullopt;
+break;
+  case AArch64::XPACI:
+  case AArch64::XPACD:
+if (It->getOperand(0).getReg() != ScratchReg ||
+It->getOperand(1).getReg() != ScratchReg)
+  return std::nullopt;
+break;
+  }
+
+  // ORRXrs ScratchReg, XZR, TestedReg, 0 (used by "MOV reg, reg" alias)
+  if (StepAndGetOpcode() != AArch64::ORRXrs)
+return std::nullopt;
+  if (It->getOperand(0).getReg() != ScratchReg ||
+  It->getOperand(1).getReg() != AArch64::XZR ||
+  It->getOperand(2).getReg() != TestedReg ||
+  It->getOperand(3).getImm() != 0)
+return std::nullopt;
+
+  return std::make_pair(TestedReg, &*It);
+
+case AArch64::TBZX:
+  // TBZX ScratchReg, 62, .Lon_success
+  ScratchReg = It->getOperand(0).getReg();
+  if (It->getOperand(1).getImm() != 62)
+return std::nullopt;
+  // Not checking .Lon_success (see above).
+
+  // EORXrs ScratchReg, TestedReg, TestedReg, 1
+  if (StepAndGetOpcode() != AArch64::EORXrs)
+return std::nullopt;
+  TestedReg = It->getOperand(1).getReg();
+  if (It->getOperand(0).getReg() != ScratchReg ||
+  It->getOperand(2).getReg() != TestedReg ||
+  It->getOperand(3).getImm() != 1)
+return std::nullopt;
+
+  return std::make_pair(TestedReg, &*It);
+}
+  }
+
+  MCPhysReg getAuthCheckedReg(const MCInst &Inst,
+  bool MayOverwrite) const override {
+// Cannot trivially reuse AArch64InstrInfo::getMemOperandWithOffsetWidth()
+// method as it accepts an instance of MachineInstr, not MCInst.
+const MCInstrDesc &Desc = Info->get(Inst.getOpcode());
+
+// If signing oracles are considered, the particular value left in the base
+// register after this instruction is important. This function checks that
+// if the base register was overwritten, it is due to address write-back:
+//
+// ; good:
+// autdza  x1   ; x1 is authenticated (may fail

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#137224** https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#136183** https://app.graphite.dev/github/pr/llvm/llvm-project/136183?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#136151** https://app.graphite.dev/github/pr/llvm/llvm-project/136151?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135663** https://app.graphite.dev/github/pr/llvm/llvm-project/135663?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#136147** https://app.graphite.dev/github/pr/llvm/llvm-project/136147?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135662** https://app.graphite.dev/github/pr/llvm/llvm-project/135662?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135661** https://app.graphite.dev/github/pr/llvm/llvm-project/135661?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#134146** https://app.graphite.dev/github/pr/llvm/llvm-project/134146?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#133461** https://app.graphite.dev/github/pr/llvm/llvm-project/133461?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135073** https://app.graphite.dev/github/pr/llvm/llvm-project/135073?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/137224
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko commented:

@kbeyls Thank you for the comments! I have updated the comments accordingly.

https://github.com/llvm/llvm-project/pull/134146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/134146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/134146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/134146
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136147

>From 4a7d095c971e9ae8a3957847eebc56cae8ed6cad Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 15:40:05 +0300
Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks
 interface

Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write
an authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.

Document a few existing methods, add information about preconditions.
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 61 ++-
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 64 +---
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 76 ---
 .../AArch64/gs-pauth-debug-output.s   |  3 -
 .../AArch64/gs-pauth-signing-oracles.s| 20 +
 5 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..83ad70ea97076 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -562,30 +562,50 @@ class MCPlusBuilder {
 return {};
   }
 
-  virtual ErrorOr getAuthenticatedReg(const MCInst &Inst) const {
-llvm_unreachable("not implemented");
-return getNoRegister();
-  }
-
-  virtual bool isAuthenticationOfReg(const MCInst &Inst,
- MCPhysReg AuthenticatedReg) const {
+  /// Returns the register where an authenticated pointer is written to by 
Inst,
+  /// or std::nullopt if not authenticating any register.
+  ///
+  /// Sets IsChecked if the instruction always checks authenticated pointer,
+  /// i.e. it either returns a successfully authenticated pointer or terminates
+  /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which 
crashes
+  /// on authentication failure even if FEAT_FPAC is not implemented).
+  virtual std::optional
+  getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const {
 llvm_unreachable("not implemented");
-return false;
+return std::nullopt;
   }
 
-  virtual MCPhysReg getSignedReg(const MCInst &Inst) const {
+  /// Returns the register signed by Inst, or std::nullopt if not signing any
+  /// register.
+  ///
+  /// The returned register is assumed to be both input and output operand,
+  /// as it is done on AArch64.
+  virtual std::optional getSignedReg(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
-  virtual ErrorOr getRegUsedAsRetDest(const MCInst &Inst) const {
+  /// Returns the register used as a return address. Returns std::nullopt if
+  /// not applicable, such as reading the return address from a system register
+  /// or from the stack.
+  ///
+  /// Sets IsAuthenticatedInternally if the instruction accepts a signed
+  /// pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called when isReturn(Inst) is true.
+  virtual std::optional
+  getRegUsedAsRetDest(const MCInst &Inst,
+  bool &IsAuthenticatedInternally) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Returns the register used as the destination of an indirect branch or 
call
   /// instruction. Sets IsAuthenticatedInternally if the instruction accepts
   /// a signed pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst)
+  /// returns true.
   virtual MCPhysReg
   getRegUsedAsIndirectBranchDest(const MCInst &Inst,
  bool &IsAuthenticatedInternally) const {
@@ -602,14 +622,14 @@ class MCPlusBuilder {
   ///controlled, under the Pointer Authentication threat model.
   ///
   /// If the instruction does not write to any register satisfying the above
-  /// two conditions, NoRegister is returned.
+  /// two conditions, std::nullopt is returned.
   ///
   /// The Pointer Authentication threat model assumes an attacker is able to
   /// modify any writable memory, but not executable code (due to W^X).
-  virtual MCPhysReg
+  virtual std::optional
   getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Analyzes if this instruction can safely perform address arithmetics
@@ -622,10 +642,13 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executa

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135663

>From cdaa1549c3a4804bc5a3788f1fc0c2bfc910168e Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 542 +
 .../AArch64/gs-pauth-authentication-oracles.s | 723 ++
 .../AArch64/gs-pauth-debug-output.s   |  78 ++
 4 files changed, 1355 insertions(+)
 create mode 100644 
bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 9a4e9598c9f98..7c51ba42c3294 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -260,6 +260,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };
 
+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) 
{}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// It is common for a particular type of gadget detector to be tied to some
@@ -301,6 +310,9 @@ class FunctionAnalysis {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(ArrayRef> Reports);
 
+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(const ArrayRef> Reports);
+
   /// Process the reports which do not have to be augmented, and remove them
   /// from Reports.
   void handleSimpleReports(SmallVector> &Reports);
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 0bd9a1ed5941c..fff52de8b9c73 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -712,6 +712,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
RegsToTrackInstsFor);
 }
 
+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an 
instruction
+/// Inst is safe if at least one of the following is true:
+/// * Inst checks if Xn contains the result of a successful authentication and
+///   terminates the program on failure. Note that Inst can either naturally
+///   dereference Xn (load, branch, return, etc. instructions) or be the first
+///   instruction of an explicit checking sequence.
+/// * Inst performs safe address arithmetic AND both source and result
+///   registers, as well as any temporary registers, must be safe after
+///   execution of Inst (temporaries are not used on AArch64 and thus not
+///   currently supported/allowed).
+///   See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details.
+/// * Inst fully overwrites Xn with an unrelated value.
+struct DstState {
+  /// The set of registers whose values cannot be inspected by an attacker in
+  /// a way usable as an authentication oracle. The results of authentication
+  /// instructions should be written to such registers.
+  BitVector CannotEscapeUnchecked;
+
+  std::vector> FirstInstLeakingReg;
+
+  /// Construct an empty state.
+  DstState() {}
+
+  DstState(unsigned NumRegs, unsigned NumRegsToTrack)
+  : CannotE

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136147

>From 4a7d095c971e9ae8a3957847eebc56cae8ed6cad Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 15:40:05 +0300
Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks
 interface

Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write
an authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.

Document a few existing methods, add information about preconditions.
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 61 ++-
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 64 +---
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 76 ---
 .../AArch64/gs-pauth-debug-output.s   |  3 -
 .../AArch64/gs-pauth-signing-oracles.s| 20 +
 5 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..83ad70ea97076 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -562,30 +562,50 @@ class MCPlusBuilder {
 return {};
   }
 
-  virtual ErrorOr getAuthenticatedReg(const MCInst &Inst) const {
-llvm_unreachable("not implemented");
-return getNoRegister();
-  }
-
-  virtual bool isAuthenticationOfReg(const MCInst &Inst,
- MCPhysReg AuthenticatedReg) const {
+  /// Returns the register where an authenticated pointer is written to by 
Inst,
+  /// or std::nullopt if not authenticating any register.
+  ///
+  /// Sets IsChecked if the instruction always checks authenticated pointer,
+  /// i.e. it either returns a successfully authenticated pointer or terminates
+  /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which 
crashes
+  /// on authentication failure even if FEAT_FPAC is not implemented).
+  virtual std::optional
+  getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const {
 llvm_unreachable("not implemented");
-return false;
+return std::nullopt;
   }
 
-  virtual MCPhysReg getSignedReg(const MCInst &Inst) const {
+  /// Returns the register signed by Inst, or std::nullopt if not signing any
+  /// register.
+  ///
+  /// The returned register is assumed to be both input and output operand,
+  /// as it is done on AArch64.
+  virtual std::optional getSignedReg(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
-  virtual ErrorOr getRegUsedAsRetDest(const MCInst &Inst) const {
+  /// Returns the register used as a return address. Returns std::nullopt if
+  /// not applicable, such as reading the return address from a system register
+  /// or from the stack.
+  ///
+  /// Sets IsAuthenticatedInternally if the instruction accepts a signed
+  /// pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called when isReturn(Inst) is true.
+  virtual std::optional
+  getRegUsedAsRetDest(const MCInst &Inst,
+  bool &IsAuthenticatedInternally) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Returns the register used as the destination of an indirect branch or 
call
   /// instruction. Sets IsAuthenticatedInternally if the instruction accepts
   /// a signed pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst)
+  /// returns true.
   virtual MCPhysReg
   getRegUsedAsIndirectBranchDest(const MCInst &Inst,
  bool &IsAuthenticatedInternally) const {
@@ -602,14 +622,14 @@ class MCPlusBuilder {
   ///controlled, under the Pointer Authentication threat model.
   ///
   /// If the instruction does not write to any register satisfying the above
-  /// two conditions, NoRegister is returned.
+  /// two conditions, std::nullopt is returned.
   ///
   /// The Pointer Authentication threat model assumes an attacker is able to
   /// modify any writable memory, but not executable code (due to W^X).
-  virtual MCPhysReg
+  virtual std::optional
   getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Analyzes if this instruction can safely perform address arithmetics
@@ -622,10 +642,13 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executa

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135662

>From 499d3297fb86db41061e7371d419a0c05e98302c Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++
 .../AArch64/gs-pauth-debug-output.s   |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 4c1bef3d2265f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in 
different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  ArrayRef getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(ArrayRef Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 
0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template  struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+  const std::optional RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+ std::shared_ptr Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy 
AllocatorId,
+ 

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135663

>From cdaa1549c3a4804bc5a3788f1fc0c2bfc910168e Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 542 +
 .../AArch64/gs-pauth-authentication-oracles.s | 723 ++
 .../AArch64/gs-pauth-debug-output.s   |  78 ++
 4 files changed, 1355 insertions(+)
 create mode 100644 
bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 9a4e9598c9f98..7c51ba42c3294 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -260,6 +260,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };
 
+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) 
{}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// It is common for a particular type of gadget detector to be tied to some
@@ -301,6 +310,9 @@ class FunctionAnalysis {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(ArrayRef> Reports);
 
+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(const ArrayRef> Reports);
+
   /// Process the reports which do not have to be augmented, and remove them
   /// from Reports.
   void handleSimpleReports(SmallVector> &Reports);
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 0bd9a1ed5941c..fff52de8b9c73 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -712,6 +712,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
RegsToTrackInstsFor);
 }
 
+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an 
instruction
+/// Inst is safe if at least one of the following is true:
+/// * Inst checks if Xn contains the result of a successful authentication and
+///   terminates the program on failure. Note that Inst can either naturally
+///   dereference Xn (load, branch, return, etc. instructions) or be the first
+///   instruction of an explicit checking sequence.
+/// * Inst performs safe address arithmetic AND both source and result
+///   registers, as well as any temporary registers, must be safe after
+///   execution of Inst (temporaries are not used on AArch64 and thus not
+///   currently supported/allowed).
+///   See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details.
+/// * Inst fully overwrites Xn with an unrelated value.
+struct DstState {
+  /// The set of registers whose values cannot be inspected by an attacker in
+  /// a way usable as an authentication oracle. The results of authentication
+  /// instructions should be written to such registers.
+  BitVector CannotEscapeUnchecked;
+
+  std::vector> FirstInstLeakingReg;
+
+  /// Construct an empty state.
+  DstState() {}
+
+  DstState(unsigned NumRegs, unsigned NumRegsToTrack)
+  : CannotE

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From b03ffd01b9d634f604f39ab5225452e97b45479b Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH] [BOLT] Gadget scanner: improve handling of unreachable basic
 blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index f7ac0b67d00da..0ce9f51c44af4 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -581,6 +587,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -654,12 +667,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1328,19 +1335,30 @@ void FunctionAnalysis::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructions that wr

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From b03ffd01b9d634f604f39ab5225452e97b45479b Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH] [BOLT] Gadget scanner: improve handling of unreachable basic
 blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index f7ac0b67d00da..0ce9f51c44af4 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -581,6 +587,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -654,12 +667,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1328,19 +1335,30 @@ void FunctionAnalysis::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructions that wr

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135661

>From fa2f2aeb7624d57b25915aeb23d62a9df3a11044 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 14:35:56 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC)

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  8 +---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 ---
 2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const 
MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional 
information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 4f601558dec4e..339673b600765 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState 
&S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -347,7 +346,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a good chance that callee-saved registers will be
 // saved on the stack at some point during execution of the callee.
@@ -408,8 +407,7 @@ class SrcSafetyAnalysis {
 
   // FirstCheckerInst should belong to the same basic block, meaning
   // it was deterministically processed a few steps before this 
ins

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135661

>From fa2f2aeb7624d57b25915aeb23d62a9df3a11044 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 14:35:56 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC)

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  8 +---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 ---
 2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const 
MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional 
information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 4f601558dec4e..339673b600765 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState 
&S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -347,7 +346,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a good chance that callee-saved registers will be
 // saved on the stack at some point during execution of the callee.
@@ -408,8 +407,7 @@ class SrcSafetyAnalysis {
 
   // FirstCheckerInst should belong to the same basic block, meaning
   // it was deterministically processed a few steps before this 
ins

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136151

>From e22ae5ebfe066cec1335f2b4080b013560c5e844 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 15 Apr 2025 21:47:18 +0300
Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI
 instructions

Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++
 .../AArch64/gs-pauth-debug-output.s   | 32 +++
 2 files changed, 48 insertions(+)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 849272cac73d2..f7ac0b67d00da 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -431,6 +431,9 @@ class SrcSafetyAnalysis {
   }
 
   SrcState computeNext(const MCInst &Point, const SrcState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 SrcStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  SrcSafetyAnalysis::ComputeNext(";
@@ -670,6 +673,8 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If there is a label before this instruction, it is possible that it
   // can be jumped-to, thus conservatively resetting S. As an exception,
@@ -947,6 +952,9 @@ class DstSafetyAnalysis {
   }
 
   DstState computeNext(const MCInst &Point, const DstState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 DstStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  DstSafetyAnalysis::ComputeNext(";
@@ -1123,6 +1131,8 @@ class CFGUnawareDstSafetyAnalysis : public 
DstSafetyAnalysis {
 DstState S = createUnsafeState();
 for (auto &I : llvm::reverse(BF.instrs())) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If Inst can change the control flow, we cannot be sure that the next
   // instruction (to be executed in analyzed program) is the one processed
@@ -1319,6 +1329,9 @@ void FunctionAnalysis::findUnsafeUses(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const SrcState &S = Analysis->getStateBefore(Inst);
 
 // If non-empty state was never propagated from the entry basic block
@@ -1382,6 +1395,9 @@ void FunctionAnalysis::findUnsafeDefs(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const DstState &S = Analysis->getStateAfter(Inst);
 
 if (auto Report = shouldReportAuthOracle(BC, Inst, S))
diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s 
b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
index fd55880921d06..07b61bea77e94 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
@@ -329,6 +329,38 @@ auth_oracle:
 // PAUTH-EMPTY:
 // PAUTH-NEXT:   Attaching leakage info to: :  autia   x0, x1 
# DataflowDstSafetyAnalysis: dst-state
 
+// Gadget scanner should not crash on CFI instructions, including when 
debug-printing them.
+// Note that the particular debug output is not checked, but BOLT should be
+// compiled with assertions enabled to support -debug-only argument.
+
+.globl  cfi_inst_df
+.type   cfi_inst_df,@function
+cfi_inst_df:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_df, .-cfi_inst_df
+.cfi_endproc
+
+.globl  cfi_inst_nocfg
+.type   cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+
+adr x0, 1f
+br  x0
+1:
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_nocfg, .-cfi_inst_nocfg
+.cfi_endproc
+
 // CHECK-LABEL:Analyzing function main, AllocatorId = 1
 .globl  main
 .type   main,@function

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)


Changes

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, detection of this one involves some amount
of guessing which branch instructions should be checked as tail calls.

---

Patch is 41.46 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/137224.diff


4 Files Affected:

- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+109-3) 
- (modified) bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s (+22-9) 
- (modified) bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s (+2-28) 
- (added) bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s (+597) 


``diff
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 0ce9f51c44af4..d67f10a311396 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -655,8 +655,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -667,6 +668,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -677,6 +702,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -691,7 +717,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1226,6 +1252,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return true;
+
+  // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the
+  // below is a simplified condition 

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/137224
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135662

>From 499d3297fb86db41061e7371d419a0c05e98302c Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++
 .../AArch64/gs-pauth-debug-output.s   |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 4c1bef3d2265f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in 
different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  ArrayRef getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(ArrayRef Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 
0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template  struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+  const std::optional RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+ std::shared_ptr Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy 
AllocatorId,
+ 

[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah created 
https://github.com/llvm/llvm-project/pull/137205

This is basically identical to cancel except without the if clause.

taskgroup will be implemented in a followup PR.

>From 88fc39d0b2a3a846006889d4320b9f29ced025b7 Mon Sep 17 00:00:00 2001
From: Tom Eccles 
Date: Tue, 15 Apr 2025 15:40:39 +
Subject: [PATCH] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR

This is basically identical to cancel except without the if clause.

taskgroup will be implemented in a followup PR.
---
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  10 +
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  51 +
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  37 +++-
 .../LLVMIR/openmp-cancellation-point.mlir | 188 ++
 mlir/test/Target/LLVMIR/openmp-todo.mlir  |  16 +-
 5 files changed, 293 insertions(+), 9 deletions(-)
 create mode 100644 mlir/test/Target/LLVMIR/openmp-cancellation-point.mlir

diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h 
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 10d69e561a987..14ad8629537f7 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -686,6 +686,16 @@ class OpenMPIRBuilder {
 Value *IfCondition,
 omp::Directive CanceledDirective);
 
+  /// Generator for '#omp cancellation point'
+  ///
+  /// \param Loc The location where the directive was encountered.
+  /// \param CanceledDirective The kind of directive that is cancled.
+  ///
+  /// \returns The insertion point after the barrier.
+  InsertPointOrErrorTy
+  createCancellationPoint(const LocationDescription &Loc,
+  omp::Directive CanceledDirective);
+
   /// Generator for '#omp parallel'
   ///
   /// \param Loc The insert and source location description.
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3f19088e6c73d..06aa61adcd739 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1118,6 +1118,57 @@ OpenMPIRBuilder::createCancel(const LocationDescription 
&Loc,
   return Builder.saveIP();
 }
 
+OpenMPIRBuilder::InsertPointOrErrorTy
+OpenMPIRBuilder::createCancellationPoint(const LocationDescription &Loc,
+ omp::Directive CanceledDirective) {
+  if (!updateToLocation(Loc))
+return Loc.IP;
+
+  // LLVM utilities like blocks with terminators.
+  auto *UI = Builder.CreateUnreachable();
+  Builder.SetInsertPoint(UI);
+
+  Value *CancelKind = nullptr;
+  switch (CanceledDirective) {
+#define OMP_CANCEL_KIND(Enum, Str, DirectiveEnum, Value)   
\
+  case DirectiveEnum:  
\
+CancelKind = Builder.getInt32(Value);  
\
+break;
+#include "llvm/Frontend/OpenMP/OMPKinds.def"
+  default:
+llvm_unreachable("Unknown cancel kind!");
+  }
+
+  uint32_t SrcLocStrSize;
+  Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize);
+  Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
+  Value *Args[] = {Ident, getOrCreateThreadID(Ident), CancelKind};
+  Value *Result = Builder.CreateCall(
+  getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_cancellationpoint), Args);
+  auto ExitCB = [this, CanceledDirective, Loc](InsertPointTy IP) -> Error {
+if (CanceledDirective == OMPD_parallel) {
+  IRBuilder<>::InsertPointGuard IPG(Builder);
+  Builder.restoreIP(IP);
+  return createBarrier(LocationDescription(Builder.saveIP(), Loc.DL),
+   omp::Directive::OMPD_unknown,
+   /* ForceSimpleCall */ false,
+   /* CheckCancelFlag */ false)
+  .takeError();
+}
+return Error::success();
+  };
+
+  // The actual cancel logic is shared with others, e.g., cancel_barriers.
+  if (Error Err = emitCancelationCheckImpl(Result, CanceledDirective, ExitCB))
+return Err;
+
+  // Update the insertion point and remove the terminator we introduced.
+  Builder.SetInsertPoint(UI->getParent());
+  UI->eraseFromParent();
+
+  return Builder.saveIP();
+}
+
 OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetKernel(
 const LocationDescription &Loc, InsertPointTy AllocaIP, Value *&Return,
 Value *Ident, Value *DeviceID, Value *NumTeams, Value *NumThreads,
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7d8a7ccb6e4ac..afae41f001736 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -255,6 +255,9 @@ static LogicalResult checkImplementationStatus(Operation 
&op) {
   LogicalResult result = success();
   llvm::TypeSwitch(op)
   .

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-24 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko ready_for_review 
https://github.com/llvm/llvm-project/pull/137224
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Adding support for Root Descriptor in Obj2yaml/Yaml2Obj (PR #136732)

2025-04-24 Thread via llvm-branch-commits

https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/136732

>From 156975528eefad5be199165fcfce6398f6a79ab8 Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Fri, 18 Apr 2025 00:41:37 +
Subject: [PATCH 1/6] adding support for root descriptor

---
 llvm/include/llvm/BinaryFormat/DXContainer.h  | 25 
 .../BinaryFormat/DXContainerConstants.def | 15 +
 .../llvm/MC/DXContainerRootSignature.h|  2 +
 llvm/include/llvm/Object/DXContainer.h| 39 
 .../include/llvm/ObjectYAML/DXContainerYAML.h | 36 ++-
 llvm/lib/MC/DXContainerRootSignature.cpp  | 26 
 llvm/lib/ObjectYAML/DXContainerEmitter.cpp| 19 +-
 llvm/lib/ObjectYAML/DXContainerYAML.cpp   | 63 +--
 .../RootSignature-MultipleParameters.yaml | 20 --
 9 files changed, 233 insertions(+), 12 deletions(-)

diff --git a/llvm/include/llvm/BinaryFormat/DXContainer.h 
b/llvm/include/llvm/BinaryFormat/DXContainer.h
index 455657980bf40..d6e585c94fed1 100644
--- a/llvm/include/llvm/BinaryFormat/DXContainer.h
+++ b/llvm/include/llvm/BinaryFormat/DXContainer.h
@@ -18,6 +18,7 @@
 #include "llvm/Support/SwapByteOrder.h"
 #include "llvm/TargetParser/Triple.h"
 
+#include 
 #include 
 
 namespace llvm {
@@ -158,6 +159,11 @@ enum class RootElementFlag : uint32_t {
 #include "DXContainerConstants.def"
 };
 
+#define ROOT_DESCRIPTOR_FLAG(Num, Val) Val = 1ull << Num,
+enum class RootDescriptorFlag : uint32_t {
+#include "DXContainerConstants.def"
+};
+
 #define ROOT_PARAMETER(Val, Enum) Enum = Val,
 enum class RootParameterType : uint32_t {
 #include "DXContainerConstants.def"
@@ -594,6 +600,25 @@ struct RootConstants {
 sys::swapByteOrder(Num32BitValues);
   }
 };
+struct RootDescriptor_V1_0 {
+  uint32_t ShaderRegister;
+  uint32_t RegisterSpace;
+  void swapBytes() {
+sys::swapByteOrder(ShaderRegister);
+sys::swapByteOrder(RegisterSpace);
+  }
+};
+
+struct RootDescriptor_V1_1 {
+  uint32_t ShaderRegister;
+  uint32_t RegisterSpace;
+  uint32_t Flags;
+  void swapBytes() {
+sys::swapByteOrder(ShaderRegister);
+sys::swapByteOrder(RegisterSpace);
+sys::swapByteOrder(Flags);
+  }
+};
 
 struct RootParameterHeader {
   uint32_t ParameterType;
diff --git a/llvm/include/llvm/BinaryFormat/DXContainerConstants.def 
b/llvm/include/llvm/BinaryFormat/DXContainerConstants.def
index 590ded5e8c899..6840901460ced 100644
--- a/llvm/include/llvm/BinaryFormat/DXContainerConstants.def
+++ b/llvm/include/llvm/BinaryFormat/DXContainerConstants.def
@@ -72,9 +72,24 @@ ROOT_ELEMENT_FLAG(11, SamplerHeapDirectlyIndexed)
 #undef ROOT_ELEMENT_FLAG
 #endif // ROOT_ELEMENT_FLAG
 
+ 
+// ROOT_ELEMENT_FLAG(bit offset for the flag, name).
+#ifdef ROOT_DESCRIPTOR_FLAG
+
+ROOT_DESCRIPTOR_FLAG(0, NONE)
+ROOT_DESCRIPTOR_FLAG(2, DATA_VOLATILE)
+ROOT_DESCRIPTOR_FLAG(4, DATA_STATIC_WHILE_SET_AT_EXECUTE)
+ROOT_DESCRIPTOR_FLAG(8, DATA_STATIC)
+#undef ROOT_DESCRIPTOR_FLAG
+#endif // ROOT_DESCRIPTOR_FLAG
+
+
 #ifdef ROOT_PARAMETER
 
 ROOT_PARAMETER(1, Constants32Bit)
+ROOT_PARAMETER(2, CBV)
+ROOT_PARAMETER(3, SRV)
+ROOT_PARAMETER(4, UAV)
 #undef ROOT_PARAMETER
 #endif // ROOT_PARAMETER
 
diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h 
b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 6d3329a2c6ce9..5e3bc9b873f82 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -19,6 +19,8 @@ struct RootParameter {
   dxbc::RootParameterHeader Header;
   union {
 dxbc::RootConstants Constants;
+dxbc::RootDescriptor_V1_0 Descriptor_V10;
+dxbc::RootDescriptor_V1_1 Descriptor_V11;
   };
 };
 struct RootSignatureDesc {
diff --git a/llvm/include/llvm/Object/DXContainer.h 
b/llvm/include/llvm/Object/DXContainer.h
index e8287ce078365..7812906700fe3 100644
--- a/llvm/include/llvm/Object/DXContainer.h
+++ b/llvm/include/llvm/Object/DXContainer.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_OBJECT_DXCONTAINER_H
 #define LLVM_OBJECT_DXCONTAINER_H
 
+#include "llvm/ADT/STLForwardCompat.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/BinaryFormat/DXContainer.h"
@@ -149,6 +150,36 @@ struct RootConstantView : RootParameterView {
   }
 };
 
+struct RootDescriptorView_V1_0 : RootParameterView {
+  static bool classof(const RootParameterView *V) {
+return (V->Header.ParameterType ==
+llvm::to_underlying(dxbc::RootParameterType::CBV) ||
+V->Header.ParameterType ==
+llvm::to_underlying(dxbc::RootParameterType::SRV) ||
+V->Header.ParameterType ==
+llvm::to_underlying(dxbc::RootParameterType::UAV));
+  }
+
+  llvm::Expected read() {
+return readParameter();
+  }
+};
+
+struct RootDescriptorView_V1_1 : RootParameterView {
+  static bool classof(const RootParameterView *V) {
+return (V->Header.ParameterType ==
+llvm::to_underlying(dxbc::RootParameterType::CBV) ||
+V->Header.Par

[llvm-branch-commits] [llvm] [llvm] Extract and propagate indirect call type id (PR #87575)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87575

>From 1a8d810d352fbe84c0521c7614689b60ade693c8 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Tue, 19 Nov 2024 15:25:34 -0800
Subject: [PATCH 1/5] Fixed the tests and addressed most of the review
 comments.

Created using spr 1.3.6-beta.1
---
 llvm/include/llvm/CodeGen/MachineFunction.h   | 15 +++--
 .../CodeGen/AArch64/call-site-info-typeid.ll  | 28 +++--
 .../test/CodeGen/ARM/call-site-info-typeid.ll | 28 +++--
 .../CodeGen/MIR/X86/call-site-info-typeid.ll  | 58 ---
 .../CodeGen/MIR/X86/call-site-info-typeid.mir | 13 ++---
 .../CodeGen/Mips/call-site-info-typeid.ll | 28 +++--
 .../test/CodeGen/X86/call-site-info-typeid.ll | 28 +++--
 7 files changed, 71 insertions(+), 127 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h 
b/llvm/include/llvm/CodeGen/MachineFunction.h
index bb0b87a3a04a3..44633df38a651 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -493,7 +493,7 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
 /// Callee type id.
 ConstantInt *TypeId = nullptr;
 
-CallSiteInfo() {}
+CallSiteInfo() = default;
 
 /// Extracts the numeric type id from the CallBase's type operand bundle,
 /// and sets TypeId. This is used as type id for the indirect call in the
@@ -503,12 +503,11 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
   if (!CB.isIndirectCall())
 return;
 
-  auto Opt = CB.getOperandBundle(LLVMContext::OB_type);
-  if (!Opt.has_value()) {
-errs() << "warning: cannot find indirect call type operand bundle for  
"
-  "call graph section\n";
+  std::optional Opt =
+  CB.getOperandBundle(LLVMContext::OB_type);
+  // Return if the operand bundle for call graph section cannot be found.
+  if (!Opt.has_value())
 return;
-  }
 
   // Get generalized type id string
   auto OB = Opt.value();
@@ -520,9 +519,9 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
  "invalid type identifier");
 
   // Compute numeric type id from generalized type id string
-  uint64_t TypeIdVal = llvm::MD5Hash(TypeIdStr->getString());
+  uint64_t TypeIdVal = MD5Hash(TypeIdStr->getString());
   IntegerType *Int64Ty = Type::getInt64Ty(CB.getContext());
-  TypeId = llvm::ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
+  TypeId = ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
 }
   };
 
diff --git a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll 
b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
index f0a6b44755c5c..f3b98c2c7a395 100644
--- a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that call site type ids can be extracted and set from type operand
+;; bundles.
 
-; Verify the exact typeId value to ensure it is not garbage but the value
-; computed as the type id from the type operand bundle.
-; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu %s 
-stop-before=finalize-isel -o - | FileCheck %s
-
-; ModuleID = 'test.c'
-source_filename = "test.c"
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
-target triple = "aarch64-unknown-linux-gnu"
+;; Verify the exact typeId value to ensure it is not garbage but the value
+;; computed as the type id from the type operand bundle.
+; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu < %s 
-stop-before=finalize-isel -o - | FileCheck %s
 
 define dso_local void @foo(i8 signext %a) !type !3 {
 entry:
@@ -19,10 +14,10 @@ entry:
 define dso_local i32 @main() !type !4 {
 entry:
   %retval = alloca i32, align 4
-  %fp = alloca void (i8)*, align 8
-  store i32 0, i32* %retval, align 4
-  store void (i8)* @foo, void (i8)** %fp, align 8
-  %0 = load void (i8)*, void (i8)** %fp, align 8
+  %fp = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store ptr @foo, ptr %fp, align 8
+  %0 = load ptr, ptr %fp, align 8
   ; CHECK: callSites:
   ; CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId:
   ; CHECK-NEXT: 7854600665770582568 }
@@ -30,10 +25,5 @@ entry:
   ret i32 0
 }
 
-!llvm.module.flags = !{!0, !1, !2}
-
-!0 = !{i32 1, !"wchar_size", i32 4}
-!1 = !{i32 7, !"uwtable", i32 1}
-!2 = !{i32 7, !"frame-pointer", i32 2}
 !3 = !{i64 0, !"_ZTSFvcE.generalized"}
 !4 = !{i64 0, !"_ZTSFiE.generalized"}
diff --git a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll 
b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
index ec7f8a425051b..9feeef9a564cc 100644
--- a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that ca

[llvm-branch-commits] [llvm] [llvm] Extract and propagate indirect call type id (PR #87575)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87575

>From 1a8d810d352fbe84c0521c7614689b60ade693c8 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Tue, 19 Nov 2024 15:25:34 -0800
Subject: [PATCH 1/5] Fixed the tests and addressed most of the review
 comments.

Created using spr 1.3.6-beta.1
---
 llvm/include/llvm/CodeGen/MachineFunction.h   | 15 +++--
 .../CodeGen/AArch64/call-site-info-typeid.ll  | 28 +++--
 .../test/CodeGen/ARM/call-site-info-typeid.ll | 28 +++--
 .../CodeGen/MIR/X86/call-site-info-typeid.ll  | 58 ---
 .../CodeGen/MIR/X86/call-site-info-typeid.mir | 13 ++---
 .../CodeGen/Mips/call-site-info-typeid.ll | 28 +++--
 .../test/CodeGen/X86/call-site-info-typeid.ll | 28 +++--
 7 files changed, 71 insertions(+), 127 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h 
b/llvm/include/llvm/CodeGen/MachineFunction.h
index bb0b87a3a04a3..44633df38a651 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -493,7 +493,7 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
 /// Callee type id.
 ConstantInt *TypeId = nullptr;
 
-CallSiteInfo() {}
+CallSiteInfo() = default;
 
 /// Extracts the numeric type id from the CallBase's type operand bundle,
 /// and sets TypeId. This is used as type id for the indirect call in the
@@ -503,12 +503,11 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
   if (!CB.isIndirectCall())
 return;
 
-  auto Opt = CB.getOperandBundle(LLVMContext::OB_type);
-  if (!Opt.has_value()) {
-errs() << "warning: cannot find indirect call type operand bundle for  
"
-  "call graph section\n";
+  std::optional Opt =
+  CB.getOperandBundle(LLVMContext::OB_type);
+  // Return if the operand bundle for call graph section cannot be found.
+  if (!Opt.has_value())
 return;
-  }
 
   // Get generalized type id string
   auto OB = Opt.value();
@@ -520,9 +519,9 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
  "invalid type identifier");
 
   // Compute numeric type id from generalized type id string
-  uint64_t TypeIdVal = llvm::MD5Hash(TypeIdStr->getString());
+  uint64_t TypeIdVal = MD5Hash(TypeIdStr->getString());
   IntegerType *Int64Ty = Type::getInt64Ty(CB.getContext());
-  TypeId = llvm::ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
+  TypeId = ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
 }
   };
 
diff --git a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll 
b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
index f0a6b44755c5c..f3b98c2c7a395 100644
--- a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that call site type ids can be extracted and set from type operand
+;; bundles.
 
-; Verify the exact typeId value to ensure it is not garbage but the value
-; computed as the type id from the type operand bundle.
-; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu %s 
-stop-before=finalize-isel -o - | FileCheck %s
-
-; ModuleID = 'test.c'
-source_filename = "test.c"
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
-target triple = "aarch64-unknown-linux-gnu"
+;; Verify the exact typeId value to ensure it is not garbage but the value
+;; computed as the type id from the type operand bundle.
+; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu < %s 
-stop-before=finalize-isel -o - | FileCheck %s
 
 define dso_local void @foo(i8 signext %a) !type !3 {
 entry:
@@ -19,10 +14,10 @@ entry:
 define dso_local i32 @main() !type !4 {
 entry:
   %retval = alloca i32, align 4
-  %fp = alloca void (i8)*, align 8
-  store i32 0, i32* %retval, align 4
-  store void (i8)* @foo, void (i8)** %fp, align 8
-  %0 = load void (i8)*, void (i8)** %fp, align 8
+  %fp = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store ptr @foo, ptr %fp, align 8
+  %0 = load ptr, ptr %fp, align 8
   ; CHECK: callSites:
   ; CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId:
   ; CHECK-NEXT: 7854600665770582568 }
@@ -30,10 +25,5 @@ entry:
   ret i32 0
 }
 
-!llvm.module.flags = !{!0, !1, !2}
-
-!0 = !{i32 1, !"wchar_size", i32 4}
-!1 = !{i32 7, !"uwtable", i32 1}
-!2 = !{i32 7, !"frame-pointer", i32 2}
 !3 = !{i64 0, !"_ZTSFvcE.generalized"}
 !4 = !{i64 0, !"_ZTSFiE.generalized"}
diff --git a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll 
b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
index ec7f8a425051b..9feeef9a564cc 100644
--- a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that ca

[llvm-branch-commits] [llvm] [llvm] Add option to emit `callgraph` section (PR #87574)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87574

>From 1d7ee612e408ee7e64e984eb08e6d7089a435d09 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Sun, 2 Feb 2025 00:58:49 +
Subject: [PATCH 1/6] Simplify MIR test.

Created using spr 1.3.6-beta.1
---
 .../CodeGen/MIR/X86/call-site-info-typeid.mir | 21 ++-
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir 
b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
index 5ab797bfcc18f..a99ee50a608fb 100644
--- a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
+++ b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
@@ -8,11 +8,6 @@
 # CHECK-NEXT: 123456789 }
 
 --- |
-  ; ModuleID = 'test.ll'
-  source_filename = "test.ll"
-  target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
-  target triple = "x86_64-unknown-linux-gnu"
-  
   define dso_local void @foo(i8 signext %a) {
   entry:
 ret void
@@ -21,10 +16,10 @@
   define dso_local i32 @main() {
   entry:
 %retval = alloca i32, align 4
-%fp = alloca void (i8)*, align 8
-store i32 0, i32* %retval, align 4
-store void (i8)* @foo, void (i8)** %fp, align 8
-%0 = load void (i8)*, void (i8)** %fp, align 8
+%fp = alloca ptr, align 8
+store i32 0, ptr %retval, align 4
+store ptr @foo, ptr %fp, align 8
+%0 = load ptr, ptr %fp, align 8
 call void %0(i8 signext 97)
 ret i32 0
   }
@@ -42,12 +37,8 @@ body: |
 name:main
 tracksRegLiveness: true
 stack:
-  - { id: 0, name: retval, type: default, offset: 0, size: 4, alignment: 4, 
-  stack-id: default, callee-saved-register: '', callee-saved-restored: 
true, 
-  debug-info-variable: '', debug-info-expression: '', debug-info-location: 
'' }
-  - { id: 1, name: fp, type: default, offset: 0, size: 8, alignment: 8, 
-  stack-id: default, callee-saved-register: '', callee-saved-restored: 
true, 
-  debug-info-variable: '', debug-info-expression: '', debug-info-location: 
'' }
+  - { id: 0, name: retval, size: 4, alignment: 4 }
+  - { id: 1, name: fp, size: 8, alignment: 8 }
 callSites:
   - { bb: 0, offset: 6, fwdArgRegs: [], typeId: 
 123456789 }

>From 86e2c9dc37170499252ed50c6bbef2931e106fbb Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Thu, 13 Mar 2025 01:03:40 +
Subject: [PATCH 2/6] Add requested tests part 1.

Created using spr 1.3.6-beta.1
---
 ...te-info-ambiguous-indirect-call-typeid.mir | 145 ++
 .../call-site-info-direct-calls-typeid.mir| 145 ++
 2 files changed, 290 insertions(+)
 create mode 100644 
llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
 create mode 100644 
llvm/test/CodeGen/MIR/X86/call-site-info-direct-calls-typeid.mir

diff --git 
a/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir 
b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
new file mode 100644
index 0..9d1b099cc9093
--- /dev/null
+++ 
b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
@@ -0,0 +1,145 @@
+# Test MIR printer and parser for type id field in callSites. It is used
+# for propogating call site type identifiers to emit in the call graph section.
+
+# RUN: llc --call-graph-section %s -run-pass=none -o - | FileCheck %s
+# CHECK: name: main
+# CHECK: callSites:
+# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: []
+# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId:
+# CHECK-NEXT: 1234567890 }
+
+--- |  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef i32 @_Z3addii(i32 noundef %a, i32 noundef %b) #0 
!type !6 !type !6 {
+  entry:
+%a.addr = alloca i32, align 4
+%b.addr = alloca i32, align 4
+store i32 %a, ptr %a.addr, align 4
+store i32 %b, ptr %b.addr, align 4
+%0 = load i32, ptr %a.addr, align 4
+%1 = load i32, ptr %b.addr, align 4
+%add = add nsw i32 %0, %1
+ret i32 %add
+  }
+  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef i32 @_Z8multiplyii(i32 noundef %a, i32 noundef %b) 
#0 !type !6 !type !6 {
+  entry:
+%a.addr = alloca i32, align 4
+%b.addr = alloca i32, align 4
+store i32 %a, ptr %a.addr, align 4
+store i32 %b, ptr %b.addr, align 4
+%0 = load i32, ptr %a.addr, align 4
+%1 = load i32, ptr %b.addr, align 4
+%mul = mul nsw i32 %0, %1
+ret i32 %mul
+  }
+  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef ptr @_Z13get_operationb(i1 noundef zeroext 
%is_addition) #0 !type !7 !type !7 {
+  entry:
+%is_addition.addr = alloca i8, align 1
+%storedv = zext i1 %is_addition to i8
+store i8 %storedv, ptr %is_addition.addr, align 1
+%0 = load i8, ptr %is_addition.addr, align 1
+%loadedv = trunc i8 %0 to i1
+br i1 %loade

[llvm-branch-commits] [llvm] [llvm] Add option to emit `callgraph` section (PR #87574)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87574

>From 1d7ee612e408ee7e64e984eb08e6d7089a435d09 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Sun, 2 Feb 2025 00:58:49 +
Subject: [PATCH 1/6] Simplify MIR test.

Created using spr 1.3.6-beta.1
---
 .../CodeGen/MIR/X86/call-site-info-typeid.mir | 21 ++-
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir 
b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
index 5ab797bfcc18f..a99ee50a608fb 100644
--- a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
+++ b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir
@@ -8,11 +8,6 @@
 # CHECK-NEXT: 123456789 }
 
 --- |
-  ; ModuleID = 'test.ll'
-  source_filename = "test.ll"
-  target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
-  target triple = "x86_64-unknown-linux-gnu"
-  
   define dso_local void @foo(i8 signext %a) {
   entry:
 ret void
@@ -21,10 +16,10 @@
   define dso_local i32 @main() {
   entry:
 %retval = alloca i32, align 4
-%fp = alloca void (i8)*, align 8
-store i32 0, i32* %retval, align 4
-store void (i8)* @foo, void (i8)** %fp, align 8
-%0 = load void (i8)*, void (i8)** %fp, align 8
+%fp = alloca ptr, align 8
+store i32 0, ptr %retval, align 4
+store ptr @foo, ptr %fp, align 8
+%0 = load ptr, ptr %fp, align 8
 call void %0(i8 signext 97)
 ret i32 0
   }
@@ -42,12 +37,8 @@ body: |
 name:main
 tracksRegLiveness: true
 stack:
-  - { id: 0, name: retval, type: default, offset: 0, size: 4, alignment: 4, 
-  stack-id: default, callee-saved-register: '', callee-saved-restored: 
true, 
-  debug-info-variable: '', debug-info-expression: '', debug-info-location: 
'' }
-  - { id: 1, name: fp, type: default, offset: 0, size: 8, alignment: 8, 
-  stack-id: default, callee-saved-register: '', callee-saved-restored: 
true, 
-  debug-info-variable: '', debug-info-expression: '', debug-info-location: 
'' }
+  - { id: 0, name: retval, size: 4, alignment: 4 }
+  - { id: 1, name: fp, size: 8, alignment: 8 }
 callSites:
   - { bb: 0, offset: 6, fwdArgRegs: [], typeId: 
 123456789 }

>From 86e2c9dc37170499252ed50c6bbef2931e106fbb Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Thu, 13 Mar 2025 01:03:40 +
Subject: [PATCH 2/6] Add requested tests part 1.

Created using spr 1.3.6-beta.1
---
 ...te-info-ambiguous-indirect-call-typeid.mir | 145 ++
 .../call-site-info-direct-calls-typeid.mir| 145 ++
 2 files changed, 290 insertions(+)
 create mode 100644 
llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
 create mode 100644 
llvm/test/CodeGen/MIR/X86/call-site-info-direct-calls-typeid.mir

diff --git 
a/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir 
b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
new file mode 100644
index 0..9d1b099cc9093
--- /dev/null
+++ 
b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir
@@ -0,0 +1,145 @@
+# Test MIR printer and parser for type id field in callSites. It is used
+# for propogating call site type identifiers to emit in the call graph section.
+
+# RUN: llc --call-graph-section %s -run-pass=none -o - | FileCheck %s
+# CHECK: name: main
+# CHECK: callSites:
+# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: []
+# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId:
+# CHECK-NEXT: 1234567890 }
+
+--- |  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef i32 @_Z3addii(i32 noundef %a, i32 noundef %b) #0 
!type !6 !type !6 {
+  entry:
+%a.addr = alloca i32, align 4
+%b.addr = alloca i32, align 4
+store i32 %a, ptr %a.addr, align 4
+store i32 %b, ptr %b.addr, align 4
+%0 = load i32, ptr %a.addr, align 4
+%1 = load i32, ptr %b.addr, align 4
+%add = add nsw i32 %0, %1
+ret i32 %add
+  }
+  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef i32 @_Z8multiplyii(i32 noundef %a, i32 noundef %b) 
#0 !type !6 !type !6 {
+  entry:
+%a.addr = alloca i32, align 4
+%b.addr = alloca i32, align 4
+store i32 %a, ptr %a.addr, align 4
+store i32 %b, ptr %b.addr, align 4
+%0 = load i32, ptr %a.addr, align 4
+%1 = load i32, ptr %b.addr, align 4
+%mul = mul nsw i32 %0, %1
+ret i32 %mul
+  }
+  
+  ; Function Attrs: mustprogress noinline nounwind optnone uwtable
+  define dso_local noundef ptr @_Z13get_operationb(i1 noundef zeroext 
%is_addition) #0 !type !7 !type !7 {
+  entry:
+%is_addition.addr = alloca i8, align 1
+%storedv = zext i1 %is_addition to i8
+store i8 %storedv, ptr %is_addition.addr, align 1
+%0 = load i8, ptr %is_addition.addr, align 1
+%loadedv = trunc i8 %0 to i1
+br i1 %loade

[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87576

>From 6b67376bd5e1f21606017c83cc67f2186ba36a33 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Thu, 13 Mar 2025 01:41:04 +
Subject: [PATCH 1/4] Updated the test as reviewers suggested.

Created using spr 1.3.6-beta.1
---
 llvm/test/CodeGen/X86/call-graph-section.ll | 66 +++
 llvm/test/CodeGen/call-graph-section.ll | 73 -
 2 files changed, 66 insertions(+), 73 deletions(-)
 create mode 100644 llvm/test/CodeGen/X86/call-graph-section.ll
 delete mode 100644 llvm/test/CodeGen/call-graph-section.ll

diff --git a/llvm/test/CodeGen/X86/call-graph-section.ll 
b/llvm/test/CodeGen/X86/call-graph-section.ll
new file mode 100644
index 0..a77a2b8051ed3
--- /dev/null
+++ b/llvm/test/CodeGen/X86/call-graph-section.ll
@@ -0,0 +1,66 @@
+;; Tests that we store the type identifiers in .callgraph section of the 
binary.
+
+; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
+; RUN: llvm-readelf -x .callgraph - | FileCheck %s
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local void @foo() #0 !type !4 {
+entry:
+  ret void
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local i32 @bar(i8 signext %a) #0 !type !5 {
+entry:
+  %a.addr = alloca i8, align 1
+  store i8 %a, ptr %a.addr, align 1
+  ret i32 0
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local ptr @baz(ptr %a) #0 !type !6 {
+entry:
+  %a.addr = alloca ptr, align 8
+  store ptr %a, ptr %a.addr, align 8
+  ret ptr null
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local void @main() #0 !type !7 {
+entry:
+  %retval = alloca i32, align 4
+  %fp_foo = alloca ptr, align 8
+  %a = alloca i8, align 1
+  %fp_bar = alloca ptr, align 8
+  %fp_baz = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store ptr @foo, ptr %fp_foo, align 8
+  %0 = load ptr, ptr %fp_foo, align 8
+  call void (...) %0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
+  store ptr @bar, ptr %fp_bar, align 8
+  %1 = load ptr, ptr %fp_bar, align 8
+  %2 = load i8, ptr %a, align 1
+  %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
+  store ptr @baz, ptr %fp_baz, align 8
+  %3 = load ptr, ptr %fp_baz, align 8
+  %call1 = call ptr %3(ptr %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
+  call void @foo() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
+  %4 = load i8, ptr %a, align 1
+  %call2 = call i32 @bar(i8 signext %4) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
+  %call3 = call ptr @baz(ptr %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
+  ret void
+}
+
+;; Check that the numeric type id (md5 hash) for the below type ids are emitted
+;; to the callgraph section.
+
+; CHECK: Hex dump of section '.callgraph':
+
+; CHECK-DAG: 2444f731 f5eecb3e
+!4 = !{i64 0, !"_ZTSFvE.generalized"}
+; CHECK-DAG: 5486bc59 814b8e30
+!5 = !{i64 0, !"_ZTSFicE.generalized"}
+; CHECK-DAG: 7ade6814 f897fd77
+!6 = !{i64 0, !"_ZTSFPvS_E.generalized"}
+; CHECK-DAG: caaf769a 600968fa
+!7 = !{i64 0, !"_ZTSFiE.generalized"}
diff --git a/llvm/test/CodeGen/call-graph-section.ll 
b/llvm/test/CodeGen/call-graph-section.ll
deleted file mode 100644
index bb158d11e82c9..0
--- a/llvm/test/CodeGen/call-graph-section.ll
+++ /dev/null
@@ -1,73 +0,0 @@
-; Tests that we store the type identifiers in .callgraph section of the binary.
-
-; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
-; RUN: llvm-readelf -x .callgraph - | FileCheck %s
-
-target triple = "x86_64-unknown-linux-gnu"
-
-define dso_local void @foo() #0 !type !4 {
-entry:
-  ret void
-}
-
-define dso_local i32 @bar(i8 signext %a) #0 !type !5 {
-entry:
-  %a.addr = alloca i8, align 1
-  store i8 %a, i8* %a.addr, align 1
-  ret i32 0
-}
-
-define dso_local i32* @baz(i8* %a) #0 !type !6 {
-entry:
-  %a.addr = alloca i8*, align 8
-  store i8* %a, i8** %a.addr, align 8
-  ret i32* null
-}
-
-define dso_local i32 @main() #0 !type !7 {
-entry:
-  %retval = alloca i32, align 4
-  %fp_foo = alloca void (...)*, align 8
-  %a = alloca i8, align 1
-  %fp_bar = alloca i32 (i8)*, align 8
-  %fp_baz = alloca i32* (i8*)*, align 8
-  store i32 0, i32* %retval, align 4
-  store void (...)* bitcast (void ()* @foo to void (...)*), void (...)** 
%fp_foo, align 8
-  %0 = load void (...)*, void (...)** %fp_foo, align 8
-  call void (...) %0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
-  store i32 (i8)* @bar, i32 (i8)** %fp_bar, align 8
-  %1 = load i32 (i8)*, i32 (i8)** %fp_bar, align 8
-  %2 = load i8, i8* %a, align 1
-  %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
-  store i32* (i8*)* @baz, i32* (i8*)** %fp_baz, align 8
-  %3 = load i32* (i8*)*, i32* (i8*)** %fp_baz, align 8
-  %call1 = call i32* %3(i8* %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
-  call void @foo() [ "callee_type"(meta

[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c 
b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | 
FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87576

>From 6b67376bd5e1f21606017c83cc67f2186ba36a33 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Thu, 13 Mar 2025 01:41:04 +
Subject: [PATCH 1/4] Updated the test as reviewers suggested.

Created using spr 1.3.6-beta.1
---
 llvm/test/CodeGen/X86/call-graph-section.ll | 66 +++
 llvm/test/CodeGen/call-graph-section.ll | 73 -
 2 files changed, 66 insertions(+), 73 deletions(-)
 create mode 100644 llvm/test/CodeGen/X86/call-graph-section.ll
 delete mode 100644 llvm/test/CodeGen/call-graph-section.ll

diff --git a/llvm/test/CodeGen/X86/call-graph-section.ll 
b/llvm/test/CodeGen/X86/call-graph-section.ll
new file mode 100644
index 0..a77a2b8051ed3
--- /dev/null
+++ b/llvm/test/CodeGen/X86/call-graph-section.ll
@@ -0,0 +1,66 @@
+;; Tests that we store the type identifiers in .callgraph section of the 
binary.
+
+; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
+; RUN: llvm-readelf -x .callgraph - | FileCheck %s
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local void @foo() #0 !type !4 {
+entry:
+  ret void
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local i32 @bar(i8 signext %a) #0 !type !5 {
+entry:
+  %a.addr = alloca i8, align 1
+  store i8 %a, ptr %a.addr, align 1
+  ret i32 0
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local ptr @baz(ptr %a) #0 !type !6 {
+entry:
+  %a.addr = alloca ptr, align 8
+  store ptr %a, ptr %a.addr, align 8
+  ret ptr null
+}
+
+; Function Attrs: noinline nounwind optnone uwtable
+define dso_local void @main() #0 !type !7 {
+entry:
+  %retval = alloca i32, align 4
+  %fp_foo = alloca ptr, align 8
+  %a = alloca i8, align 1
+  %fp_bar = alloca ptr, align 8
+  %fp_baz = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store ptr @foo, ptr %fp_foo, align 8
+  %0 = load ptr, ptr %fp_foo, align 8
+  call void (...) %0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
+  store ptr @bar, ptr %fp_bar, align 8
+  %1 = load ptr, ptr %fp_bar, align 8
+  %2 = load i8, ptr %a, align 1
+  %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
+  store ptr @baz, ptr %fp_baz, align 8
+  %3 = load ptr, ptr %fp_baz, align 8
+  %call1 = call ptr %3(ptr %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
+  call void @foo() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
+  %4 = load i8, ptr %a, align 1
+  %call2 = call i32 @bar(i8 signext %4) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
+  %call3 = call ptr @baz(ptr %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
+  ret void
+}
+
+;; Check that the numeric type id (md5 hash) for the below type ids are emitted
+;; to the callgraph section.
+
+; CHECK: Hex dump of section '.callgraph':
+
+; CHECK-DAG: 2444f731 f5eecb3e
+!4 = !{i64 0, !"_ZTSFvE.generalized"}
+; CHECK-DAG: 5486bc59 814b8e30
+!5 = !{i64 0, !"_ZTSFicE.generalized"}
+; CHECK-DAG: 7ade6814 f897fd77
+!6 = !{i64 0, !"_ZTSFPvS_E.generalized"}
+; CHECK-DAG: caaf769a 600968fa
+!7 = !{i64 0, !"_ZTSFiE.generalized"}
diff --git a/llvm/test/CodeGen/call-graph-section.ll 
b/llvm/test/CodeGen/call-graph-section.ll
deleted file mode 100644
index bb158d11e82c9..0
--- a/llvm/test/CodeGen/call-graph-section.ll
+++ /dev/null
@@ -1,73 +0,0 @@
-; Tests that we store the type identifiers in .callgraph section of the binary.
-
-; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
-; RUN: llvm-readelf -x .callgraph - | FileCheck %s
-
-target triple = "x86_64-unknown-linux-gnu"
-
-define dso_local void @foo() #0 !type !4 {
-entry:
-  ret void
-}
-
-define dso_local i32 @bar(i8 signext %a) #0 !type !5 {
-entry:
-  %a.addr = alloca i8, align 1
-  store i8 %a, i8* %a.addr, align 1
-  ret i32 0
-}
-
-define dso_local i32* @baz(i8* %a) #0 !type !6 {
-entry:
-  %a.addr = alloca i8*, align 8
-  store i8* %a, i8** %a.addr, align 8
-  ret i32* null
-}
-
-define dso_local i32 @main() #0 !type !7 {
-entry:
-  %retval = alloca i32, align 4
-  %fp_foo = alloca void (...)*, align 8
-  %a = alloca i8, align 1
-  %fp_bar = alloca i32 (i8)*, align 8
-  %fp_baz = alloca i32* (i8*)*, align 8
-  store i32 0, i32* %retval, align 4
-  store void (...)* bitcast (void ()* @foo to void (...)*), void (...)** 
%fp_foo, align 8
-  %0 = load void (...)*, void (...)** %fp_foo, align 8
-  call void (...) %0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]
-  store i32 (i8)* @bar, i32 (i8)** %fp_bar, align 8
-  %1 = load i32 (i8)*, i32 (i8)** %fp_bar, align 8
-  %2 = load i8, i8* %a, align 1
-  %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata 
!"_ZTSFicE.generalized") ]
-  store i32* (i8*)* @baz, i32* (i8*)** %fp_baz, align 8
-  %3 = load i32* (i8*)*, i32* (i8*)** %fp_baz, align 8
-  %call1 = call i32* %3(i8* %a) [ "callee_type"(metadata 
!"_ZTSFPvS_E.generalized") ]
-  call void @foo() [ "callee_type"(meta

[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c 
b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | 
FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] callee_type metadata for indirect calls (PR #117036)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117036

>From b7fbe09b32ff02d4f7c52d82fbf8b5cd28138852 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 23 Apr 2025 04:05:47 +
Subject: [PATCH] Address review comments.

Created using spr 1.3.6-beta.1
---
 clang/lib/CodeGen/CGCall.cpp|  8 
 clang/lib/CodeGen/CodeGenModule.cpp | 10 +-
 clang/lib/CodeGen/CodeGenModule.h   |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 185ee1a970aac..d8ab7140f7943 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5780,19 +5780,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
   if (callOrInvoke) {
 *callOrInvoke = CI;
 if (CGM.getCodeGenOpts().CallGraphSection) {
-  assert((TargetDecl && TargetDecl->getFunctionType() ||
-  Callee.getAbstractInfo().getCalleeFunctionProtoType()) &&
- "cannot find callsite type");
   QualType CST;
   if (TargetDecl && TargetDecl->getFunctionType())
 CST = QualType(TargetDecl->getFunctionType(), 0);
   else if (const auto *FPT =
Callee.getAbstractInfo().getCalleeFunctionProtoType())
 CST = QualType(FPT, 0);
+  else
+llvm_unreachable(
+"Cannot find the callee type to generate callee_type metadata.");
 
   // Set type identifier metadata of indirect calls for call graph section.
   if (!CST.isNull())
-CGM.CreateCalleeTypeMetadataForIcall(CST, *callOrInvoke);
+CGM.createCalleeTypeMetadataForIcall(CST, *callOrInvoke);
 }
   }
 
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 43cd2405571cf..2fc99639a75cb 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2654,7 +2654,7 @@ void 
CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D,
   // Skip available_externally functions. They won't be codegen'ed in the
   // current module anyway.
   if (getContext().GetGVALinkageForFunction(FD) != GVA_AvailableExternally)
-CreateFunctionTypeMetadataForIcall(FD, F);
+createFunctionTypeMetadataForIcall(FD, F);
 }
   }
 
@@ -2868,7 +2868,7 @@ static bool hasExistingGeneralizedTypeMD(llvm::Function 
*F) {
   return MD->hasGeneralizedMDString();
 }
 
-void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
+void CodeGenModule::createFunctionTypeMetadataForIcall(const FunctionDecl *FD,
llvm::Function *F) {
   if (CodeGenOpts.CallGraphSection && !hasExistingGeneralizedTypeMD(F) &&
   (!F->hasLocalLinkage() ||
@@ -2898,7 +2898,7 @@ void 
CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
   F->addTypeMetadata(0, llvm::ConstantAsMetadata::get(CrossDsoTypeId));
 }
 
-void CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT,
+void CodeGenModule::createCalleeTypeMetadataForIcall(const QualType &QT,
  llvm::CallBase *CB) {
   // Only if needed for call graph section and only for indirect calls.
   if (!CodeGenOpts.CallGraphSection || !CB->isIndirectCall())
@@ -2909,7 +2909,7 @@ void 
CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT,
   getLLVMContext(), {llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
  llvm::Type::getInt64Ty(getLLVMContext()), 0)),
  TypeIdMD});
-  llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), { TypeTuple });
+  llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), {TypeTuple});
   CB->setMetadata(llvm::LLVMContext::MD_callee_type, MDN);
 }
 
@@ -3041,7 +3041,7 @@ void CodeGenModule::SetFunctionAttributes(GlobalDecl GD, 
llvm::Function *F,
   // jump table.
   if (!CodeGenOpts.SanitizeCfiCrossDso ||
   !CodeGenOpts.SanitizeCfiCanonicalJumpTables)
-CreateFunctionTypeMetadataForIcall(FD, F);
+createFunctionTypeMetadataForIcall(FD, F);
 
   if (LangOpts.Sanitize.has(SanitizerKind::KCFI))
 setKCFIType(FD, F);
diff --git a/clang/lib/CodeGen/CodeGenModule.h 
b/clang/lib/CodeGen/CodeGenModule.h
index dfbe4388349dd..4b53f0f241b52 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1619,11 +1619,11 @@ class CodeGenModule : public CodeGenTypeCache {
   llvm::Metadata *CreateMetadataIdentifierGeneralized(QualType T);
 
   /// Create and attach type metadata to the given function.
-  void CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
+  void createFunctionTypeMetadataForIcall(const FunctionDecl *FD,
   llvm::Function *F);
 
   /// Create and attach type metadata to the given call.
-  void CreateCalleeTypeMetadataForIcall(const QualType &QT, llvm::CallBase 
*CB);
+  void createCa

[llvm-branch-commits] [clang] [clang] callee_type metadata for indirect calls (PR #117036)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117036

>From b7fbe09b32ff02d4f7c52d82fbf8b5cd28138852 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 23 Apr 2025 04:05:47 +
Subject: [PATCH] Address review comments.

Created using spr 1.3.6-beta.1
---
 clang/lib/CodeGen/CGCall.cpp|  8 
 clang/lib/CodeGen/CodeGenModule.cpp | 10 +-
 clang/lib/CodeGen/CodeGenModule.h   |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 185ee1a970aac..d8ab7140f7943 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5780,19 +5780,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
   if (callOrInvoke) {
 *callOrInvoke = CI;
 if (CGM.getCodeGenOpts().CallGraphSection) {
-  assert((TargetDecl && TargetDecl->getFunctionType() ||
-  Callee.getAbstractInfo().getCalleeFunctionProtoType()) &&
- "cannot find callsite type");
   QualType CST;
   if (TargetDecl && TargetDecl->getFunctionType())
 CST = QualType(TargetDecl->getFunctionType(), 0);
   else if (const auto *FPT =
Callee.getAbstractInfo().getCalleeFunctionProtoType())
 CST = QualType(FPT, 0);
+  else
+llvm_unreachable(
+"Cannot find the callee type to generate callee_type metadata.");
 
   // Set type identifier metadata of indirect calls for call graph section.
   if (!CST.isNull())
-CGM.CreateCalleeTypeMetadataForIcall(CST, *callOrInvoke);
+CGM.createCalleeTypeMetadataForIcall(CST, *callOrInvoke);
 }
   }
 
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 43cd2405571cf..2fc99639a75cb 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2654,7 +2654,7 @@ void 
CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D,
   // Skip available_externally functions. They won't be codegen'ed in the
   // current module anyway.
   if (getContext().GetGVALinkageForFunction(FD) != GVA_AvailableExternally)
-CreateFunctionTypeMetadataForIcall(FD, F);
+createFunctionTypeMetadataForIcall(FD, F);
 }
   }
 
@@ -2868,7 +2868,7 @@ static bool hasExistingGeneralizedTypeMD(llvm::Function 
*F) {
   return MD->hasGeneralizedMDString();
 }
 
-void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
+void CodeGenModule::createFunctionTypeMetadataForIcall(const FunctionDecl *FD,
llvm::Function *F) {
   if (CodeGenOpts.CallGraphSection && !hasExistingGeneralizedTypeMD(F) &&
   (!F->hasLocalLinkage() ||
@@ -2898,7 +2898,7 @@ void 
CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
   F->addTypeMetadata(0, llvm::ConstantAsMetadata::get(CrossDsoTypeId));
 }
 
-void CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT,
+void CodeGenModule::createCalleeTypeMetadataForIcall(const QualType &QT,
  llvm::CallBase *CB) {
   // Only if needed for call graph section and only for indirect calls.
   if (!CodeGenOpts.CallGraphSection || !CB->isIndirectCall())
@@ -2909,7 +2909,7 @@ void 
CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT,
   getLLVMContext(), {llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
  llvm::Type::getInt64Ty(getLLVMContext()), 0)),
  TypeIdMD});
-  llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), { TypeTuple });
+  llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), {TypeTuple});
   CB->setMetadata(llvm::LLVMContext::MD_callee_type, MDN);
 }
 
@@ -3041,7 +3041,7 @@ void CodeGenModule::SetFunctionAttributes(GlobalDecl GD, 
llvm::Function *F,
   // jump table.
   if (!CodeGenOpts.SanitizeCfiCrossDso ||
   !CodeGenOpts.SanitizeCfiCanonicalJumpTables)
-CreateFunctionTypeMetadataForIcall(FD, F);
+createFunctionTypeMetadataForIcall(FD, F);
 
   if (LangOpts.Sanitize.has(SanitizerKind::KCFI))
 setKCFIType(FD, F);
diff --git a/clang/lib/CodeGen/CodeGenModule.h 
b/clang/lib/CodeGen/CodeGenModule.h
index dfbe4388349dd..4b53f0f241b52 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1619,11 +1619,11 @@ class CodeGenModule : public CodeGenTypeCache {
   llvm::Metadata *CreateMetadataIdentifierGeneralized(QualType T);
 
   /// Create and attach type metadata to the given function.
-  void CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
+  void createFunctionTypeMetadataForIcall(const FunctionDecl *FD,
   llvm::Function *F);
 
   /// Create and attach type metadata to the given call.
-  void CreateCalleeTypeMetadataForIcall(const QualType &QT, llvm::CallBase 
*CB);
+  void createCa

[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-04-24 Thread Prabhu Rajasekaran via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87573

>From a8a5848885e12c771f12cfa33b4dbc6a0272e925 Mon Sep 17 00:00:00 2001
From: Prabhuk 
Date: Mon, 22 Apr 2024 11:34:04 -0700
Subject: [PATCH 01/12] Update clang/lib/CodeGen/CodeGenModule.cpp

Cleaner if checks.

Co-authored-by: Matt Arsenault 
---
 clang/lib/CodeGen/CodeGenModule.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index e19bbee996f58..ff1586d2fa8ab 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2711,7 +2711,7 @@ void 
CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
 void CodeGenModule::CreateFunctionTypeMetadataForIcall(const QualType &QT,
llvm::CallBase *CB) {
   // Only if needed for call graph section and only for indirect calls.
-  if (!(CodeGenOpts.CallGraphSection && CB && CB->isIndirectCall()))
+  if (!CodeGenOpts.CallGraphSection || !CB || !CB->isIndirectCall())
 return;
 
   auto *MD = CreateMetadataIdentifierGeneralized(QT);

>From 019b2ca5e1c263183ed114e0b967b4e77b4a17a8 Mon Sep 17 00:00:00 2001
From: Prabhuk 
Date: Mon, 22 Apr 2024 11:34:31 -0700
Subject: [PATCH 02/12] Update clang/lib/CodeGen/CodeGenModule.cpp

Update the comments as suggested.

Co-authored-by: Matt Arsenault 
---
 clang/lib/CodeGen/CodeGenModule.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index ff1586d2fa8ab..5635a87d2358a 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2680,9 +2680,9 @@ void 
CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD,
   bool EmittedMDIdGeneralized = false;
   if (CodeGenOpts.CallGraphSection &&
   (!F->hasLocalLinkage() ||
-   F->getFunction().hasAddressTaken(nullptr, /* IgnoreCallbackUses */ true,
-/* IgnoreAssumeLikeCalls */ true,
-/* IgnoreLLVMUsed */ false))) {
+   F->getFunction().hasAddressTaken(nullptr, /*IgnoreCallbackUses=*/ true,
+/*IgnoreAssumeLikeCalls=*/ true,
+/*IgnoreLLVMUsed=*/ false))) {
 F->addTypeMetadata(0, CreateMetadataIdentifierGeneralized(FD->getType()));
 EmittedMDIdGeneralized = true;
   }

>From 99242900c51778abd4b7e7f4361b09202b7abcda Mon Sep 17 00:00:00 2001
From: Prabhuk 
Date: Mon, 29 Apr 2024 11:53:40 -0700
Subject: [PATCH 03/12] dyn_cast to isa

Created using spr 1.3.6-beta.1
---
 clang/lib/CodeGen/CGCall.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 526a63b24ff83..45033ced1d834 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5713,8 +5713,8 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
 if (callOrInvoke && *callOrInvoke && (*callOrInvoke)->isIndirectCall()) {
   if (const FunctionDecl *FD = dyn_cast_or_null(TargetDecl)) 
{
 // Type id metadata is set only for C/C++ contexts.
-if (dyn_cast(FD) || dyn_cast(FD) ||
-dyn_cast(FD)) {
+if (isa(FD) || isa(FD) ||
+isa(FD)) {
   CGM.CreateFunctionTypeMetadataForIcall(FD->getType(), *callOrInvoke);
 }
   }

>From 24882b15939b781bcf28d87fdf4f6e8834b6cfde Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Tue, 10 Dec 2024 14:54:27 -0800
Subject: [PATCH 04/12] Address review comments. Break llvm and clang patches.

Created using spr 1.3.6-beta.1
---
 llvm/lib/IR/Verifier.cpp  | 7 +++
 llvm/test/Verifier/operand-bundles.ll | 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 0ad7ba555bfad..b72672e7b8e56 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -3707,10 +3707,9 @@ void Verifier::visitCallBase(CallBase &Call) {
 if (Intrinsic::ID ID = (Intrinsic::ID)F->getIntrinsicID())
   visitIntrinsicCall(ID, Call);
 
-  // Verify that a callsite has at most one "deopt", at most one "funclet", at
-  // most one "gc-transition", at most one "cfguardtarget", at most one "type",
-  // at most one "preallocated" operand bundle, and at most one "ptrauth"
-  // operand bundle.
+  // Verify that a callsite has at most one operand bundle for each of the
+  // following: "deopt", "funclet", "gc-transition", "cfguardtarget", "type",
+  // "preallocated", and "ptrauth".
   bool FoundDeoptBundle = false, FoundFuncletBundle = false,
FoundGCTransitionBundle = false, FoundCFGuardTargetBundle = false,
FoundPreallocatedBundle = false, FoundGCLiveBundle = false,
diff --git a/llvm/test/Verifier/operand-bundles.ll 
b/llvm/t

[llvm-branch-commits] AArch64: Relax x16/x17 constraint on AUT in certain cases. (PR #132857)

2025-04-24 Thread Peter Collingbourne via llvm-branch-commits

pcc wrote:

@asl Ping

https://github.com/llvm/llvm-project/pull/132857
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

skatrak wrote:

This PR is based on the following analysis of the current state of OpenMP map 
translation to LLVM IR. If there are any issues with this PR, they are likely 
easier to spot on this graph:
![MapInfoData 
uses](https://github.com/user-attachments/assets/f9f56688-51e5-4ab4-be7c-39b3de805e03)

https://github.com/llvm/llvm-project/pull/137199
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-04-24 Thread Sergio Afonso via llvm-branch-commits

skatrak wrote:

PR stack:
- #137198
- #137199
- #137200
- #137201

https://github.com/llvm/llvm-project/pull/137201
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)

2025-04-24 Thread Tom Eccles via llvm-branch-commits

tblah wrote:

PR Stack:

Cancel parallel 

https://github.com/llvm/llvm-project/pull/137192
Cancel sections
https://github.com/llvm/llvm-project/pull/137193
Cancel wsloop
https://github.com/llvm/llvm-project/pull/137194
Cancellation point (TODO)
Cancel(lation point) taskgroup (TODO)

https://github.com/llvm/llvm-project/pull/137193
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)

2025-04-24 Thread Serge Pavlov via llvm-branch-commits

https://github.com/spavloff updated 
https://github.com/llvm/llvm-project/pull/136501

>From 22742e24c1eef3ecc0fb4294dac9f42c9d160019 Mon Sep 17 00:00:00 2001
From: Serge Pavlov 
Date: Thu, 17 Apr 2025 18:42:15 +0700
Subject: [PATCH 1/2] Bundle operands to specify denormal modes

Two new operands are now supported in the "fp.control" operand bundle:

* "denorm.in=xxx"   - specifies the inpot denormal mode.
* "denorm.out=xxx"  - specifies the output denormal mode.

Here xxx must be one of the following values:

* "ieee"  - preserve denormals.
* "zero"  - flush to zero preserving sign.
* "pzero" - flush to positive zero.
* "dyn"   - mode is dynamically read from a control register.

These values align those permitted in the "denormal-fp-math" function
attribute.
---
 llvm/docs/LangRef.rst |  18 +-
 llvm/include/llvm/ADT/FloatingPointMode.h |  33 
 llvm/include/llvm/IR/InstrTypes.h |  21 +++
 llvm/lib/Analysis/ConstantFolding.cpp |  24 ++-
 llvm/lib/IR/Instructions.cpp  | 168 +-
 llvm/lib/IR/Verifier.cpp  |  14 ++
 .../constant-fold-fp-denormal-strict.ll   |  91 ++
 llvm/test/Verifier/fp-intrinsics.ll   |  36 
 8 files changed, 394 insertions(+), 11 deletions(-)
 create mode 100644 
llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8252971aa8a58..954f0e96b46f6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3084,7 +3084,10 @@ floating-point control modes and the treatment of status 
bits respectively.
 
 An operand bundle tagged with "fp.control" contains information about the
 control modes used for the operation execution. Operands specified in this
-bundle represent particular options. Currently, only rounding mode is 
supported.
+bundle represent particular options. The following modes are supported:
+
+* rounding mode,
+* denormal behavior.
 
 Rounding mode is represented by a metadata string value, which specifies the
 the mode used for the operation evaluation. Possible values are:
@@ -3103,6 +3106,19 @@ rounding rounding mode is taken from the control 
register (dynamic rounding).
 In the particular case of :ref:`default floating-point environment `,
 the operation uses rounding to nearest, ties to even.
 
+Denormal behavior defines whether denormal values are flushed to zero during
+the call's execution. This behavior is specified separately for input and
+output values. Such specification is a string, which starts with
+"denorm.in=" or "denorm.out=" respectively. The remainder of the string should
+be one of the values:
+
+::
+
+``"ieee"`` - preserve denormals,
+``"zero"`` - flush to +0.0 or -0.0 depending on value sign,
+``"pzero"`` - flush to +0.0,
+``"dyn"`` - concrete mode is read from some register.
+
 An operand bundle tagged with "fp.except" may be associated with operations
 that can read or write floating-point exception flags. It contains a single
 metadata string value, which can have one of the following values:
diff --git a/llvm/include/llvm/ADT/FloatingPointMode.h 
b/llvm/include/llvm/ADT/FloatingPointMode.h
index 639d931ef88fe..5fceccfd1d0bf 100644
--- a/llvm/include/llvm/ADT/FloatingPointMode.h
+++ b/llvm/include/llvm/ADT/FloatingPointMode.h
@@ -234,6 +234,39 @@ void DenormalMode::print(raw_ostream &OS) const {
   OS << denormalModeKindName(Output) << ',' << denormalModeKindName(Input);
 }
 
+/// If the specified string represents denormal mode as used in operand 
bundles,
+/// returns the corresponding mode.
+inline std::optional
+parseDenormalKindFromOperandBundle(StringRef Str) {
+  if (Str == "ieee")
+return DenormalMode::IEEE;
+  if (Str == "zero")
+return DenormalMode::PreserveSign;
+  if (Str == "pzero")
+return DenormalMode::PositiveZero;
+  if (Str == "dyn")
+return DenormalMode::Dynamic;
+  return std::nullopt;
+}
+
+/// Converts the specified denormal mode into string suitable for use in an
+/// operand bundle.
+inline std::optional
+printDenormalForOperandBundle(DenormalMode::DenormalModeKind Mode) {
+  switch (Mode) {
+  case DenormalMode::IEEE:
+return "ieee";
+  case DenormalMode::PreserveSign:
+return "zero";
+  case DenormalMode::PositiveZero:
+return "pzero";
+  case DenormalMode::Dynamic:
+return "dyn";
+  default:
+return std::nullopt;
+  }
+}
+
 /// Floating-point class tests, supported by 'is_fpclass' intrinsic. Actual
 /// test may be an OR combination of basic tests.
 enum FPClassTest : unsigned {
diff --git a/llvm/include/llvm/IR/InstrTypes.h 
b/llvm/include/llvm/IR/InstrTypes.h
index 8425243e5efe9..8492c911ffc6a 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1092,12 +1092,24 @@ template  class OperandBundleDefT {
 using OperandBundleDef = OperandBundleDefT;
 using ConstOperandBundleDef = OperandBundleDefT;
 
+s

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir-openmp

Author: Sergio Afonso (skatrak)


Changes

This patch adds assertions to map-related MLIR to LLVM IR translation functions 
and utils to explicitly document whether they are intended for host or device 
compilation only.

Over time, map-related handling has increased in complexity. This is compounded 
by the fact that some handling is device-specific and some is host-specific. By 
explicitly asserting on these functions on the expected compilation pass, the 
flow should become slighlty easier to follow.

---
Full diff: https://github.com/llvm/llvm-project/pull/137199.diff


1 Files Affected:

- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+22-2) 


``diff
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 52aa1fbfab2c1..6d80c66e3596e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
 LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder,
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // Map the first segment of our structure
   combinedInfo.Types.emplace_back(
   isTargetParams
@@ -3671,6 +3674,8 @@ static void processMapMembersWithParent(
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex,
 llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
 
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
@@ -3784,6 +3789,9 @@ static void 
processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, uint64_t 
mapDataIndex,
 bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   auto parentClause =
   llvm::cast(mapData.MapClause[mapDataIndex]);
 
@@ -3825,6 +3833,8 @@ static void
 createAlteredByCaptureMap(MapInfoData &mapData,
   LLVM::ModuleTranslation &moduleTranslation,
   llvm::IRBuilderBase &builder) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
   for (size_t i = 0; i < mapData.MapClause.size(); ++i) {
 // if it's declare target, skip it, it's handled separately.
 if (!mapData.IsDeclareTarget[i]) {
@@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
 LLVM::ModuleTranslation &moduleTranslation,
 DataLayout &dl, MapInfosTy &combinedInfo,
 MapInfoData &mapData, bool isTargetParams = false) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
+
   // We wish to modify some of the methods in which arguments are
   // passed based on their capture type by the target region, this can
   // involve generating new loads and stores, which changes the
@@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
   // kernel arg structure. It primarily becomes relevant in cases like
   // bycopy, or byref range'd arrays. In the default case, we simply
   // pass thee pointer byref as both basePointer and pointer.
-  if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice())
-createAlteredByCaptureMap(mapData, moduleTranslation, builder);
+  createAlteredByCaptureMap(mapData, moduleTranslation, builder);
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
@@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, 
llvm::IRBuilderBase &builder,
 static llvm::Expected
 getOrCreateUserDefinedMapperFunc(Operation *op, llvm::IRBuilderBase &builder,
  LLVM::ModuleTranslation &moduleTranslation) {
+  assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() &&
+ "function only supported for host device codegen");
   auto declMapperOp = cast(op);
   std::string mapperFuncName =
   moduleTranslation.getOpenMPBuilder()->createPlatformSpecificName(
@@ -3951,6 +3965,8 @@ static llvm::Expected
 emitUserDefinedMapper(Operation *op, llvm::IRBuilderBase &builder,

[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)

2025-04-24 Thread Serge Pavlov via llvm-branch-commits


@@ -678,6 +682,71 @@ fp::ExceptionBehavior CallBase::getExceptionBehavior() 
const {
   return fp::ebIgnore;
 }
 
+DenormalMode::DenormalModeKind CallBase::getInputDenormMode() const {
+  if (auto InDenormBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
+auto DenormOperand =
+getBundleOperandByPrefix(*InDenormBundle, "denorm.in=");
+if (DenormOperand) {
+  if (auto Mode = parseDenormalKindFromOperandBundle(*DenormOperand))
+return *Mode;
+} else {
+  return DenormalMode::IEEE;
+}
+  }
+
+  if (!getParent())
+return DenormalMode::IEEE;
+  const Function *F = getFunction();
+  if (!F)
+return DenormalMode::IEEE;
+
+  Type *Ty = nullptr;
+  for (auto &A : args())
+if (auto *T = A.get()->getType(); T->isFPOrFPVectorTy()) {
+  Ty = T;
+  break;
+}
+  assert(Ty && "Some input argument must be of floating-point type");

spavloff wrote:

Introduced "denorm.f32.in=" and "denorm.f32.out=" bundle operands.

https://github.com/llvm/llvm-project/pull/136501
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)

2025-04-24 Thread Serge Pavlov via llvm-branch-commits


@@ -678,6 +682,71 @@ fp::ExceptionBehavior CallBase::getExceptionBehavior() 
const {
   return fp::ebIgnore;
 }
 
+DenormalMode::DenormalModeKind CallBase::getInputDenormMode() const {
+  if (auto InDenormBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
+auto DenormOperand =
+getBundleOperandByPrefix(*InDenormBundle, "denorm.in=");
+if (DenormOperand) {
+  if (auto Mode = parseDenormalKindFromOperandBundle(*DenormOperand))
+return *Mode;
+} else {
+  return DenormalMode::IEEE;
+}
+  }
+
+  if (!getParent())
+return DenormalMode::IEEE;
+  const Function *F = getFunction();
+  if (!F)
+return DenormalMode::IEEE;

spavloff wrote:

Removed this search.

https://github.com/llvm/llvm-project/pull/136501
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


  1   2   >