[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/137201 After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations. From 174401a11e313c6b1ebe80552c59ea59a0c6a4de Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 22 Apr 2025 12:04:45 +0100 Subject: [PATCH] [MLIR][OpenMP] Simplify OpenMP device codegen After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations. 
--- .../OpenMP/target-nesting-in-host-ops.f90 | 87 .../Integration/OpenMP/task-target-device.f90 | 37 ++ .../OpenMP/threadprivate-target-device.f90| 40 ++ .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 423 +++--- ...arget-constant-indexing-device-region.mlir | 25 +- .../Target/LLVMIR/omptarget-debug-var-1.mlir | 17 +- .../omptarget-memcpy-align-metadata.mlir | 61 +-- .../LLVMIR/omptarget-target-inside-task.mlir | 40 -- ...ptarget-threadprivate-device-lowering.mlir | 30 -- .../Target/LLVMIR/openmp-llvm-invalid.mlir| 45 ++ .../openmp-target-nesting-in-host-ops.mlir| 156 --- .../LLVMIR/openmp-task-target-device.mlir | 26 -- 12 files changed, 408 insertions(+), 579 deletions(-) create mode 100644 flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 create mode 100644 flang/test/Integration/OpenMP/task-target-device.f90 create mode 100644 flang/test/Integration/OpenMP/threadprivate-target-device.f90 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir delete mode 100644 mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir delete mode 100644 mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir delete mode 100644 mlir/test/Target/LLVMIR/openmp-task-target-device.mlir diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 new file mode 100644 index 0..8c85a3c1784ed --- /dev/null +++ b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 @@ -0,0 +1,87 @@ +!===--===! +! This directory can be used to add Integration tests involving multiple +! stages of the compiler (for eg. from Fortran to LLVM IR). It should not +! contain executable tests. We should only add tests here sparingly and only +! if there is no other way to test. Repeat this message in each test that is +! added to this directory and sub-directories. +!===--===! 
+ +!REQUIRES: amdgpu-registered-target +!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s + +! CHECK-NOT: define void @nested_target_in_parallel +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel(v) + implicit none + integer, intent(inout) :: v(10) + + !$omp parallel +!$omp target map(tofrom: v) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_wsloop +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_wsloop(v) + implicit none + integer, intent(inout) :: v(10) + integer :: i + + !$omp do + do i=1, 10 +!$omp target map(tofrom: v) +!$omp end target + end do +end subroutine + +! CHECK-NOT: define void @nested_target_in_parallel_with_private +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp parallel firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: defi
[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir Author: Sergio Afonso (skatrak) Changes After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations. --- Patch is 52.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137201.diff 12 Files Affected: - (added) flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 (+87) - (added) flang/test/Integration/OpenMP/task-target-device.f90 (+37) - (added) flang/test/Integration/OpenMP/threadprivate-target-device.f90 (+40) - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+159-264) - (modified) mlir/test/Target/LLVMIR/omptarget-constant-indexing-device-region.mlir (+10-15) - (modified) mlir/test/Target/LLVMIR/omptarget-debug-var-1.mlir (+6-11) - (modified) mlir/test/Target/LLVMIR/omptarget-memcpy-align-metadata.mlir (+24-37) - (removed) mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir (-40) - (removed) mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir (-30) - (modified) mlir/test/Target/LLVMIR/openmp-llvm-invalid.mlir (+45) - (removed) mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir (-156) - (removed) mlir/test/Target/LLVMIR/openmp-task-target-device.mlir (-26) ``diff diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 new file mode 100644 index 0..8c85a3c1784ed --- /dev/null +++ 
b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 @@ -0,0 +1,87 @@ +!===--===! +! This directory can be used to add Integration tests involving multiple +! stages of the compiler (for eg. from Fortran to LLVM IR). It should not +! contain executable tests. We should only add tests here sparingly and only +! if there is no other way to test. Repeat this message in each test that is +! added to this directory and sub-directories. +!===--===! + +!REQUIRES: amdgpu-registered-target +!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s + +! CHECK-NOT: define void @nested_target_in_parallel +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel(v) + implicit none + integer, intent(inout) :: v(10) + + !$omp parallel +!$omp target map(tofrom: v) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_wsloop +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_wsloop(v) + implicit none + integer, intent(inout) :: v(10) + integer :: i + + !$omp do + do i=1, 10 +!$omp target map(tofrom: v) +!$omp end target + end do +end subroutine + +! CHECK-NOT: define void @nested_target_in_parallel_with_private +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp parallel firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_task_with_private +! 
CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_task_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_task_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp task firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end task +end subroutine + +! CHECK-NOT: define void @target_and_atomic_update +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_target_and_atomic_update_{{.*}}(ptr %{{.*}}) +subroutine target_and_atomic_update(x, expr) + implicit none + integer, intent(inout) :: x, expr + + !$omp target + !$omp end target + + !$omp atomic update + x = x + expr +end subroutine + +! CHECK-NOT: define
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir Author: Sergio Afonso (skatrak) Changes This patch adds assertions to map-related MLIR to LLVM IR translation functions and utils to explicitly document whether they are intended for host or device compilation only. Over time, map-related handling has increased in complexity. This is compounded by the fact that some handling is device-specific and some is host-specific. By explicitly asserting on these functions on the expected compilation pass, the flow should become slightly easier to follow. --- Full diff: https://github.com/llvm/llvm-project/pull/137199.diff 1 Files Affected: - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+22-2) ``diff diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 52aa1fbfab2c1..6d80c66e3596e 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder, llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + // Map the first segment of our structure combinedInfo.Types.emplace_back( isTargetParams @@ -3671,6 +3674,8 @@ static void processMapMembersWithParent( llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); auto parentClause = 
llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3784,6 +3789,9 @@ static void processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + auto parentClause = llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3825,6 +3833,8 @@ static void createAlteredByCaptureMap(MapInfoData &mapData, LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); for (size_t i = 0; i < mapData.MapClause.size(); ++i) { // if it's declare target, skip it, it's handled separately. if (!mapData.IsDeclareTarget[i]) { @@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, bool isTargetParams = false) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); + // We wish to modify some of the methods in which arguments are // passed based on their capture type by the target region, this can // involve generating new loads and stores, which changes the @@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder, // kernel arg structure. It primarily becomes relevant in cases like // bycopy, or byref range'd arrays. In the default case, we simply // pass thee pointer byref as both basePointer and pointer. 
- if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice()) -createAlteredByCaptureMap(mapData, moduleTranslation, builder); + createAlteredByCaptureMap(mapData, moduleTranslation, builder); llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); @@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, llvm::IRBuilderBase &builder, static llvm::Expected getOrCreateUserDefinedMapperFunc(Operation *op, llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); auto declMapperOp = cast(op); std::string mapperFuncName = moduleTranslation.getOpenMPBuilder()->createPlatformSpecificName( @@ -3951,6 +3965,8 @@ static llvm::Expected emitUserDefinedMapper(Operation *op, llvm::IRBui
[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)
llvmbot wrote: @llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp Author: Sergio Afonso (skatrak) Changes This patch updates the function filtering OpenMP pass intended to remove host functions from the MLIR module created by Flang lowering when targeting an OpenMP target device. Host functions holding target regions must be kept, so that the target regions within them can be translated for the device. The issue is that non-target operations inside these functions cannot be discarded because some of them hold information that is also relevant during target device codegen. Specifically, mapping information resides outside of `omp.target` regions. This patch updates the previous behavior where all host operations were preserved to then ignore all of those that are not actually needed by target device codegen. This, in practice, means only keeping target regions and mapping information needed by the device. Arguments for some of these remaining operations are replaced by placeholder allocations and `fir.undefined`, since they are only actually defined inside of the target regions themselves. As a result, this set of changes makes it possible to later simplify target device codegen, as it is no longer necessary to handle host operations differently to avoid issues. 
--- Patch is 50.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137200.diff 7 Files Affected: - (modified) flang/include/flang/Optimizer/OpenMP/Passes.td (+2-1) - (modified) flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp (+288) - (modified) flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90 (+10-9) - (modified) flang/test/Lower/OpenMP/host-eval.f90 (+37-18) - (modified) flang/test/Lower/OpenMP/real10.f90 (+1-4) - (added) flang/test/Transforms/OpenMP/function-filtering-host-ops.mlir (+400) - (renamed) flang/test/Transforms/OpenMP/function-filtering.mlir () ``diff diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td index fcc7a4ca31fef..dcc97122efdf7 100644 --- a/flang/include/flang/Optimizer/OpenMP/Passes.td +++ b/flang/include/flang/Optimizer/OpenMP/Passes.td @@ -46,7 +46,8 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> { "for the target device."; let dependentDialects = [ "mlir::func::FuncDialect", -"fir::FIROpsDialect" +"fir::FIROpsDialect", +"mlir::omp::OpenMPDialect" ]; } diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp index 9554808824ac3..9e11df77506d6 100644 --- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp +++ b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp @@ -13,12 +13,14 @@ #include "flang/Optimizer/Dialect/FIRDialect.h" #include "flang/Optimizer/Dialect/FIROpsSupport.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "mlir/Dialect/Func/IR/FuncOps.h" #include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h" #include "mlir/IR/BuiltinOps.h" +#include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallVector.h" namespace flangomp { @@ -94,6 +96,12 @@ class FunctionFilteringPass funcOp.erase(); return WalkResult::skip(); } + +if (failed(rewriteHostRegion(funcOp.getRegion( { + 
funcOp.emitOpError() << "could not be rewritten for target device"; + return WalkResult::interrupt(); +} + if (declareTargetOp) declareTargetOp.setDeclareTarget(declareType, omp::DeclareTargetCaptureClause::to); @@ -101,5 +109,285 @@ class FunctionFilteringPass return WalkResult::advance(); }); } + +private: + /// Add the given \c omp.map.info to a sorted set while taking into account + /// its dependencies. + static void collectMapInfos(omp::MapInfoOp mapOp, Region ®ion, + llvm::SetVector &mapInfos) { +for (Value member : mapOp.getMembers()) + collectMapInfos(cast(member.getDefiningOp()), region, + mapInfos); + +if (region.isAncestor(mapOp->getParentRegion())) + mapInfos.insert(mapOp); + } + + /// Add the given value to a sorted set if it should be replaced by a + /// placeholder when used as a pointer-like argument to an operation + /// participating in the initialization of an \c omp.map.info. + static void markPtrOperandForRewrite(Value value, + llvm::SetVector &rewriteValues) { +// We don't need to rewrite operands if they are defined by block arguments +// of operations that will still remain after the region is rewritten. +if (isa(value) && +isa( +cast(value).getOwner()->getParentOp())) + return; + +rewriteValues.insert(value); + } + + /// Rewrite the given host
[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)
skatrak wrote: PR stack: - #137198 - #137199 - #137200 - #137201 https://github.com/llvm/llvm-project/pull/137200 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
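Note on the `collectMapInfos` helper quoted in the #137200 diff: it recurses into a map's members before inserting the map itself, so the resulting set lists dependencies ahead of the entries that use them, and it only records maps located inside the region being rewritten. The following is a minimal standalone sketch of that ordering only. The `MapInfo` struct and its `insideRegion` flag are hypothetical stand-ins for `omp::MapInfoOp` and the `region.isAncestor(...)` check, and a plain `std::vector` with a duplicate check stands in for `llvm::SetVector`; this is not the pass's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an omp.map.info operation.
struct MapInfo {
  std::string name;
  std::vector<const MapInfo *> members; // maps this one depends on
  bool insideRegion; // stand-in for "region is an ancestor of the op"
};

// Collect mapOp and its members into an insertion-ordered set, visiting
// members first so that dependencies precede the maps that use them.
void collectMapInfos(const MapInfo &mapOp,
                     std::vector<const MapInfo *> &mapInfos) {
  for (const MapInfo *member : mapOp.members) {
    assert(member != nullptr && "members must point at valid map infos");
    collectMapInfos(*member, mapInfos);
  }

  // Maps defined outside the region of interest are skipped.
  if (!mapOp.insideRegion)
    return;

  // Emulate set semantics: only insert if not already present.
  if (std::find(mapInfos.begin(), mapInfos.end(), &mapOp) == mapInfos.end())
    mapInfos.push_back(&mapOp);
}
```

Calling the function on a parent with two members, one of them outside the region, yields the in-region member followed by the parent; calling it again on the same member leaves the set unchanged.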
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
skatrak wrote: PR stack: - #137198 - #137199 - #137200 - #137201 https://github.com/llvm/llvm-project/pull/137199
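Note on the pattern #137199 applies: instead of the caller checking `isTargetDevice()` before invoking a host-only helper, the helper itself asserts the precondition, which documents the intended compilation pass at the point of definition. A minimal standalone sketch of that idea follows. `CodegenConfig` and `genHostMapInfos` are hypothetical names used for illustration; only the assertion message is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the OpenMP builder configuration.
struct CodegenConfig {
  bool targetDevice;
  bool isTargetDevice() const { return targetDevice; }
};

// Hypothetical host-only helper. The assertion states the contract up
// front, so callers no longer need their own `if (!isTargetDevice())`.
std::size_t genHostMapInfos(const CodegenConfig &config,
                            const std::vector<int> &mapClauses) {
  assert(!config.isTargetDevice() &&
         "function only supported for host device codegen");
  // Placeholder for the real work: report how many clauses were handled.
  return mapClauses.size();
}
```

With assertions enabled, calling `genHostMapInfos` with a device configuration aborts immediately rather than silently producing host-only output during device compilation. In builds with `NDEBUG` defined the check compiles away, which is consistent with the patch being marked NFC.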
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/137199 This patch adds assertions to map-related MLIR to LLVM IR translation functions and utils to explicitly document whether they are intended for host or device compilation only. Over time, map-related handling has increased in complexity. This is compounded by the fact that some handling is device-specific and some is host-specific. By explicitly asserting on these functions on the expected compilation pass, the flow should become slightly easier to follow. From b70cbfaa049fc215b467d325918570afabda4939 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Fri, 11 Apr 2025 13:40:14 +0100 Subject: [PATCH] [MLIR][OpenMP] Assert on map translation functions, NFC This patch adds assertions to map-related MLIR to LLVM IR translation functions and utils to explicitly document whether they are intended for host or device compilation only. Over time, map-related handling has increased in complexity. This is compounded by the fact that some handling is device-specific and some is host-specific. By explicitly asserting on these functions on the expected compilation pass, the flow should become slightly easier to follow. 
--- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 24 +-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 52aa1fbfab2c1..6d80c66e3596e 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder, llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + // Map the first segment of our structure combinedInfo.Types.emplace_back( isTargetParams @@ -3671,6 +3674,8 @@ static void processMapMembersWithParent( llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); auto parentClause = llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3784,6 +3789,9 @@ static void processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + auto parentClause = llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3825,6 +3833,8 @@ static void createAlteredByCaptureMap(MapInfoData &mapData, LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host 
device codegen"); for (size_t i = 0; i < mapData.MapClause.size(); ++i) { // if it's declare target, skip it, it's handled separately. if (!mapData.IsDeclareTarget[i]) { @@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, bool isTargetParams = false) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); + // We wish to modify some of the methods in which arguments are // passed based on their capture type by the target region, this can // involve generating new loads and stores, which changes the @@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder, // kernel arg structure. It primarily becomes relevant in cases like // bycopy, or byref range'd arrays. In the default case, we simply // pass thee pointer byref as both basePointer and pointer. - if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice()) -createAlteredByCaptureMap(mapData, moduleTranslation, builder); + createAlteredByCaptureMap(mapData, moduleTranslation, builder); llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); @@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, llvm::IRBuilderBase &bu
[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/137200 This patch updates the function filtering OpenMP pass intended to remove host functions from the MLIR module created by Flang lowering when targeting an OpenMP target device. Host functions holding target regions must be kept, so that the target regions within them can be translated for the device. The issue is that non-target operations inside these functions cannot be discarded because some of them hold information that is also relevant during target device codegen. Specifically, mapping information resides outside of `omp.target` regions. This patch updates the previous behavior where all host operations were preserved to then ignore all of those that are not actually needed by target device codegen. This, in practice, means only keeping target regions and mapping information needed by the device. Arguments for some of these remaining operations are replaced by placeholder allocations and `fir.undefined`, since they are only actually defined inside of the target regions themselves. As a result, this set of changes makes it possible to later simplify target device codegen, as it is no longer necessary to handle host operations differently to avoid issues. From c6c8443710c59e765e37c8a21267fe619f9f792a Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 15 Apr 2025 16:59:18 +0100 Subject: [PATCH] [Flang][OpenMP] Minimize host ops remaining in device compilation This patch updates the function filtering OpenMP pass intended to remove host functions from the MLIR module created by Flang lowering when targeting an OpenMP target device. Host functions holding target regions must be kept, so that the target regions within them can be translated for the device. The issue is that non-target operations inside these functions cannot be discarded because some of them hold information that is also relevant during target device codegen. 
Specifically, mapping information resides outside of `omp.target` regions. This patch updates the previous behavior where all host operations were preserved to then ignore all of those that are not actually needed by target device codegen. This, in practice, means only keeping target regions and mapping information needed by the device. Arguments for some of these remaining operations are replaced by placeholder allocations and `fir.undefined`, since they are only actually defined inside of the target regions themselves. As a result, this set of changes makes it possible to later simplify target device codegen, as it is no longer necessary to handle host operations differently to avoid issues. --- .../include/flang/Optimizer/OpenMP/Passes.td | 3 +- .../Optimizer/OpenMP/FunctionFiltering.cpp| 288 + .../OpenMP/declare-target-link-tarop-cap.f90 | 19 +- flang/test/Lower/OpenMP/host-eval.f90 | 55 ++- flang/test/Lower/OpenMP/real10.f90| 5 +- .../OpenMP/function-filtering-host-ops.mlir | 400 ++ .../function-filtering.mlir} | 0 7 files changed, 738 insertions(+), 32 deletions(-) create mode 100644 flang/test/Transforms/OpenMP/function-filtering-host-ops.mlir rename flang/test/Transforms/{omp-function-filtering.mlir => OpenMP/function-filtering.mlir} (100%) diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td index fcc7a4ca31fef..dcc97122efdf7 100644 --- a/flang/include/flang/Optimizer/OpenMP/Passes.td +++ b/flang/include/flang/Optimizer/OpenMP/Passes.td @@ -46,7 +46,8 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> { "for the target device."; let dependentDialects = [ "mlir::func::FuncDialect", -"fir::FIROpsDialect" +"fir::FIROpsDialect", +"mlir::omp::OpenMPDialect" ]; } diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp index 9554808824ac3..9e11df77506d6 100644 --- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp +++ 
b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp @@ -13,12 +13,14 @@ #include "flang/Optimizer/Dialect/FIRDialect.h" #include "flang/Optimizer/Dialect/FIROpsSupport.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "mlir/Dialect/Func/IR/FuncOps.h" #include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h" #include "mlir/IR/BuiltinOps.h" +#include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallVector.h" namespace flangomp { @@ -94,6 +96,12 @@ class FunctionFilteringPass funcOp.erase(); return WalkResult::skip(); } + +if (failed(rewriteHostRegion(funcOp.getRegion( { + funcOp.emitOpError() << "could not be rewritten for target device"; + return WalkResult::interrupt(); +} + if (declareTargetOp) declareTargetOp.s
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)
llvmbot wrote: @llvm/pr-subscribers-mlir-llvm Author: Tom Eccles (tblah) Changes This is basically identical to cancel except without the if clause. taskgroup will be implemented in a followup PR. --- Full diff: https://github.com/llvm/llvm-project/pull/137205.diff 5 Files Affected: - (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+10) - (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+51) - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+34-3) - (added) mlir/test/Target/LLVMIR/openmp-cancellation-point.mlir (+188) - (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (+10-6) ``diff diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 10d69e561a987..14ad8629537f7 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -686,6 +686,16 @@ class OpenMPIRBuilder { Value *IfCondition, omp::Directive CanceledDirective); + /// Generator for '#omp cancellation point' + /// + /// \param Loc The location where the directive was encountered. + /// \param CanceledDirective The kind of directive that is cancled. + /// + /// \returns The insertion point after the barrier. + InsertPointOrErrorTy + createCancellationPoint(const LocationDescription &Loc, + omp::Directive CanceledDirective); + /// Generator for '#omp parallel' /// /// \param Loc The insert and source location description. 
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3f19088e6c73d..06aa61adcd739 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -1118,6 +1118,57 @@ OpenMPIRBuilder::createCancel(const LocationDescription &Loc, return Builder.saveIP(); } +OpenMPIRBuilder::InsertPointOrErrorTy +OpenMPIRBuilder::createCancellationPoint(const LocationDescription &Loc, + omp::Directive CanceledDirective) { + if (!updateToLocation(Loc)) +return Loc.IP; + + // LLVM utilities like blocks with terminators. + auto *UI = Builder.CreateUnreachable(); + Builder.SetInsertPoint(UI); + + Value *CancelKind = nullptr; + switch (CanceledDirective) { +#define OMP_CANCEL_KIND(Enum, Str, DirectiveEnum, Value) \ + case DirectiveEnum: \ +CancelKind = Builder.getInt32(Value); \ +break; +#include "llvm/Frontend/OpenMP/OMPKinds.def" + default: +llvm_unreachable("Unknown cancel kind!"); + } + + uint32_t SrcLocStrSize; + Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize); + Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize); + Value *Args[] = {Ident, getOrCreateThreadID(Ident), CancelKind}; + Value *Result = Builder.CreateCall( + getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_cancellationpoint), Args); + auto ExitCB = [this, CanceledDirective, Loc](InsertPointTy IP) -> Error { +if (CanceledDirective == OMPD_parallel) { + IRBuilder<>::InsertPointGuard IPG(Builder); + Builder.restoreIP(IP); + return createBarrier(LocationDescription(Builder.saveIP(), Loc.DL), + omp::Directive::OMPD_unknown, + /* ForceSimpleCall */ false, + /* CheckCancelFlag */ false) + .takeError(); +} +return Error::success(); + }; + + // The actual cancel logic is shared with others, e.g., cancel_barriers. + if (Error Err = emitCancelationCheckImpl(Result, CanceledDirective, ExitCB)) +return Err; + + // Update the insertion point and remove the terminator we introduced. 
+ Builder.SetInsertPoint(UI->getParent()); + UI->eraseFromParent(); + + return Builder.saveIP(); +} + OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetKernel( const LocationDescription &Loc, InsertPointTy AllocaIP, Value *&Return, Value *Ident, Value *DeviceID, Value *NumTeams, Value *NumThreads, diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 7d8a7ccb6e4ac..afae41f001736 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -255,6 +255,9 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) .Case([&](omp::CancelOp op) { checkCancelDirective(op, result); }) + .Case([&](omp::CancellationPointOp op) { +checkCancelDirective(op, result); + }) .Case([&](omp::DistributeOp op) { checkAllocate(op, result); checkDistSchedule(op, result
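The switch in createCancellationPoint is generated by expanding OMP_CANCEL_KIND entries from OMPKinds.def. As a rough, self-contained sketch of that X-macro technique (not the LLVM code itself): the `Directive` enum, the `CANCEL_KINDS` list, the numeric values and `cancelKindFor` below are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for llvm::omp::Directive (assumption, not the real enum).
enum class Directive { Parallel, For, Sections, Taskgroup, Unknown };

// Hypothetical stand-in for the OMP_CANCEL_KIND entries in OMPKinds.def.
// The numeric values are assumptions made for this sketch.
#define CANCEL_KINDS(X)                                                        \
  X(Directive::Parallel, 1)                                                    \
  X(Directive::For, 2)                                                         \
  X(Directive::Sections, 3)                                                    \
  X(Directive::Taskgroup, 4)

// Maps a directive to its cancel-kind value by expanding the list into
// one `case` per entry; directives not in the list yield std::nullopt.
inline std::optional<std::int32_t> cancelKindFor(Directive D) {
  switch (D) {
#define CANCEL_CASE(DirectiveEnum, Value)                                      \
  case DirectiveEnum:                                                          \
    return Value;
    CANCEL_KINDS(CANCEL_CASE)
#undef CANCEL_CASE
  default:
    return std::nullopt;
  }
}
```

Keeping the list in one macro means a new cancellable kind only has to be added in one place, which is presumably why the patch reuses the existing .def file rather than writing the cases by hand.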
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)
tblah wrote:

PR Stack:
- Cancel parallel https://github.com/llvm/llvm-project/pull/137192
- Cancel sections https://github.com/llvm/llvm-project/pull/137193
- Cancel wsloop https://github.com/llvm/llvm-project/pull/137194
- Cancellation point https://github.com/llvm/llvm-project/pull/137205
- Cancel(lation point) taskgroup (TODO) https://github.com/llvm/llvm-project/pull/137205

___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp Author: Tom Eccles (tblah) Changes

This is quite ugly but it is the best I could think of. The old FiniCBWrapper was far too brittle: it depended on the exact block structure inside of the section, and could be confused by any control flow in the section (e.g. an if clause on cancel). The wording in the comment and the variable names also didn't seem to match where it was actually branching to.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches to a block containing __kmpc_for_static_fini. This was hard to achieve here because sometimes the FiniCBWrapper has to run before the worksharing loop finalization has been created. To get around this ordering issue I created a dummy branch to a dummy block, which is then fixed later once all of the information is available.

--- Full diff: https://github.com/llvm/llvm-project/pull/137193.diff

4 Files Affected:
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+17-10)
- (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+4-2)
- (modified) mlir/test/Target/LLVMIR/openmp-cancel.mlir (+76)
- (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-16)

``diff diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index be05f01c94603..3f19088e6c73d 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -2172,6 +2172,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( if (!updateToLocation(Loc)) return Loc.IP; + // FiniCBWrapper needs to create a branch to the loop finalization block, but + // this has not been created yet at some times when this callback runs.
+ SmallVector CancellationBranches; auto FiniCBWrapper = [&](InsertPointTy IP) { if (IP.getBlock()->end() != IP.getPoint()) return FiniCB(IP); @@ -2179,16 +2182,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( // will fail because that function requires the Finalization Basic Block to // have a terminator, which is already removed by EmitOMPRegionBody. // IP is currently at cancelation block. -// We need to backtrack to the condition block to fetch -// the exit block and create a branch from cancelation -// to exit block. -IRBuilder<>::InsertPointGuard IPG(Builder); -Builder.restoreIP(IP); -auto *CaseBB = IP.getBlock()->getSinglePredecessor(); -auto *CondBB = CaseBB->getSinglePredecessor()->getSinglePredecessor(); -auto *ExitBB = CondBB->getTerminator()->getSuccessor(1); -Instruction *I = Builder.CreateBr(ExitBB); -IP = InsertPointTy(I->getParent(), I->getIterator()); +BranchInst *DummyBranch = Builder.CreateBr(IP.getBlock()); +IP = InsertPointTy(DummyBranch->getParent(), DummyBranch->getIterator()); +CancellationBranches.push_back(DummyBranch); return FiniCB(IP); }; @@ -2251,6 +2247,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( return WsloopIP.takeError(); InsertPointTy AfterIP = *WsloopIP; + BasicBlock *LoopFini = AfterIP.getBlock()->getSinglePredecessor(); + assert(LoopFini && "Bad structure of static workshare loop finalization"); + // Apply the finalization callback in LoopAfterBB auto FiniInfo = FinalizationStack.pop_back_val(); assert(FiniInfo.DK == OMPD_sections && @@ -2264,6 +2263,14 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( AfterIP = {FiniBB, FiniBB->begin()}; } + // Now we can fix the dummy branch to point to the right place + if (!CancellationBranches.empty()) { +for (BranchInst *DummyBranch : CancellationBranches) { + assert(DummyBranch->getNumSuccessors() == 1); + DummyBranch->setSuccessor(0, LoopFini); +} + } + return AfterIP; } diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 6185a433a8199..d1885641f389d 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -161,7 +161,8 @@ static LogicalResult checkImplementationStatus(Operation &op) { auto checkCancelDirective = [&todo](auto op, LogicalResult &result) { omp::ClauseCancellationConstructType cancelledDirective = op.getCancelDirective(); -if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel) +if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel && +cancelledDirective != omp::ClauseCancellationConstructType::Sections) result = todo("cancel directive"); }; auto checkDepend = [&todo](auto op, LogicalResult &result) { @@ -1690,10 +1691,11 @@ convertOmpSections(Operation &opInst, llvm::IRBuilderBase &builder, auto finiCB = [&](InsertPointTy codeGe
[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/137194 Taskloop support will follow in a later patch. >From bb374c9f98cb13e55c9ce7d129f567428e58c24e Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Tue, 15 Apr 2025 15:05:50 + Subject: [PATCH] [mlir][OpenMP] convert wsloop cancellation to LLVMIR Taskloop support will follow in a later patch. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 40 - mlir/test/Target/LLVMIR/openmp-cancel.mlir| 87 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 16 3 files changed, 125 insertions(+), 18 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index d1885641f389d..7d8a7ccb6e4ac 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -161,8 +161,7 @@ static LogicalResult checkImplementationStatus(Operation &op) { auto checkCancelDirective = [&todo](auto op, LogicalResult &result) { omp::ClauseCancellationConstructType cancelledDirective = op.getCancelDirective(); -if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel && -cancelledDirective != omp::ClauseCancellationConstructType::Sections) +if (cancelledDirective == omp::ClauseCancellationConstructType::Taskgroup) result = todo("cancel directive"); }; auto checkDepend = [&todo](auto op, LogicalResult &result) { @@ -2360,6 +2359,30 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop : llvm::omp::WorksharingLoopType::ForStaticLoop; + SmallVector cancelTerminators; + // This callback is invoked only if there is cancellation inside of the wsloop + // body. 
+ auto finiCB = [&](llvm::OpenMPIRBuilder::InsertPointTy ip) -> llvm::Error { +llvm::IRBuilderBase &llvmBuilder = ompBuilder->Builder; +llvm::IRBuilderBase::InsertPointGuard guard(llvmBuilder); + +// ip is currently in the block branched to if cancellation occured. +// We need to create a branch to terminate that block. +llvmBuilder.restoreIP(ip); + +// We must still clean up the wsloop after cancelling it, so we need to +// branch to the block that finalizes the wsloop. +// That block has not been created yet so use this block as a dummy for now +// and fix this after creating the wsloop. +cancelTerminators.push_back(llvmBuilder.CreateBr(ip.getBlock())); +return llvm::Error::success(); + }; + // We have to add the cleanup to the OpenMPIRBuilder before the body gets + // created in case the body contains omp.cancel (which will then expect to be + // able to find this cleanup callback). + ompBuilder->pushFinalizationCB({finiCB, llvm::omp::Directive::OMPD_for, + constructIsCancellable(wsloopOp)}); + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); llvm::Expected regionBlock = convertOmpOpRegions( wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation); @@ -2381,6 +2404,19 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, if (failed(handleError(wsloopIP, opInst))) return failure(); + ompBuilder->popFinalizationCB(); + if (!cancelTerminators.empty()) { +// If we cancelled the loop, we should branch to the finalization block of +// the wsloop (which is always immediately before the loop continuation +// block). Now the finalization has been created, we can fix the branch. +llvm::BasicBlock *wsloopFini = wsloopIP->getBlock()->getSinglePredecessor(); +for (llvm::BranchInst *cancelBranch : cancelTerminators) { + assert(cancelBranch->getNumSuccessors() == 1 && + "cancel branch should have one target"); + cancelBranch->setSuccessor(0, wsloopFini); +} + } + // Process the reductions if required. 
if (failed(createReductionsAndCleanup( wsloopOp, builder, moduleTranslation, allocaIP, reductionDecls, diff --git a/mlir/test/Target/LLVMIR/openmp-cancel.mlir b/mlir/test/Target/LLVMIR/openmp-cancel.mlir index fca16b936fc85..3c195a98d1000 100644 --- a/mlir/test/Target/LLVMIR/openmp-cancel.mlir +++ b/mlir/test/Target/LLVMIR/openmp-cancel.mlir @@ -156,3 +156,90 @@ llvm.func @cancel_sections_if(%cond : i1) { // CHECK: ret void // CHECK: .cncl:; preds = %[[VAL_27]] // CHECK: br label %[[VAL_19]] + +llvm.func @cancel_wsloop_if(%lb : i32, %ub : i32, %step : i32, %cond : i1) { + omp.wsloop { +omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) { + omp.cancel cancellation_construct_type(loop) if(%cond) + omp.yield +} + } + llvm.return +} +// CHECK-LABEL: define void @cancel_wsloop_if +// CHECK: %[[VAL_0:.*]] = alloca i32, alig
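The change to checkCancelDirective in this patch flips the check from an allow-list (parallel, sections) to a deny-list (taskgroup only). A minimal sketch of the two predicates side by side, assuming the construct type has exactly the four enumerators shown; the `CancelConstruct` enum and both function names are made up for illustration and are not the MLIR API.

```cpp
#include <cassert>

// Assumed enumerators, modeled on omp::ClauseCancellationConstructType.
enum class CancelConstruct { Parallel, Loop, Sections, Taskgroup };

// Check before this patch: anything other than parallel or sections
// is reported as not yet implemented.
inline bool isTodoBefore(CancelConstruct C) {
  return C != CancelConstruct::Parallel && C != CancelConstruct::Sections;
}

// Check after this patch: only taskgroup is still unimplemented, so
// loop cancellation is now accepted.
inline bool isTodoAfter(CancelConstruct C) {
  return C == CancelConstruct::Taskgroup;
}
```

Under that four-enumerator assumption, the only input on which the two predicates disagree is `Loop`, which matches the stated purpose of the patch.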
[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)
llvmbot wrote: @llvm/pr-subscribers-mlir Author: Tom Eccles (tblah) Changes Taskloop support will follow in a later patch. --- Full diff: https://github.com/llvm/llvm-project/pull/137194.diff 3 Files Affected: - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+38-2) - (modified) mlir/test/Target/LLVMIR/openmp-cancel.mlir (+87) - (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-16) ``diff diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index d1885641f389d..7d8a7ccb6e4ac 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -161,8 +161,7 @@ static LogicalResult checkImplementationStatus(Operation &op) { auto checkCancelDirective = [&todo](auto op, LogicalResult &result) { omp::ClauseCancellationConstructType cancelledDirective = op.getCancelDirective(); -if (cancelledDirective != omp::ClauseCancellationConstructType::Parallel && -cancelledDirective != omp::ClauseCancellationConstructType::Sections) +if (cancelledDirective == omp::ClauseCancellationConstructType::Taskgroup) result = todo("cancel directive"); }; auto checkDepend = [&todo](auto op, LogicalResult &result) { @@ -2360,6 +2359,30 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop : llvm::omp::WorksharingLoopType::ForStaticLoop; + SmallVector cancelTerminators; + // This callback is invoked only if there is cancellation inside of the wsloop + // body. + auto finiCB = [&](llvm::OpenMPIRBuilder::InsertPointTy ip) -> llvm::Error { +llvm::IRBuilderBase &llvmBuilder = ompBuilder->Builder; +llvm::IRBuilderBase::InsertPointGuard guard(llvmBuilder); + +// ip is currently in the block branched to if cancellation occured. +// We need to create a branch to terminate that block. 
+llvmBuilder.restoreIP(ip); + +// We must still clean up the wsloop after cancelling it, so we need to +// branch to the block that finalizes the wsloop. +// That block has not been created yet so use this block as a dummy for now +// and fix this after creating the wsloop. +cancelTerminators.push_back(llvmBuilder.CreateBr(ip.getBlock())); +return llvm::Error::success(); + }; + // We have to add the cleanup to the OpenMPIRBuilder before the body gets + // created in case the body contains omp.cancel (which will then expect to be + // able to find this cleanup callback). + ompBuilder->pushFinalizationCB({finiCB, llvm::omp::Directive::OMPD_for, + constructIsCancellable(wsloopOp)}); + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); llvm::Expected regionBlock = convertOmpOpRegions( wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation); @@ -2381,6 +2404,19 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, if (failed(handleError(wsloopIP, opInst))) return failure(); + ompBuilder->popFinalizationCB(); + if (!cancelTerminators.empty()) { +// If we cancelled the loop, we should branch to the finalization block of +// the wsloop (which is always immediately before the loop continuation +// block). Now the finalization has been created, we can fix the branch. +llvm::BasicBlock *wsloopFini = wsloopIP->getBlock()->getSinglePredecessor(); +for (llvm::BranchInst *cancelBranch : cancelTerminators) { + assert(cancelBranch->getNumSuccessors() == 1 && + "cancel branch should have one target"); + cancelBranch->setSuccessor(0, wsloopFini); +} + } + // Process the reductions if required. 
if (failed(createReductionsAndCleanup( wsloopOp, builder, moduleTranslation, allocaIP, reductionDecls, diff --git a/mlir/test/Target/LLVMIR/openmp-cancel.mlir b/mlir/test/Target/LLVMIR/openmp-cancel.mlir index fca16b936fc85..3c195a98d1000 100644 --- a/mlir/test/Target/LLVMIR/openmp-cancel.mlir +++ b/mlir/test/Target/LLVMIR/openmp-cancel.mlir @@ -156,3 +156,90 @@ llvm.func @cancel_sections_if(%cond : i1) { // CHECK: ret void // CHECK: .cncl:; preds = %[[VAL_27]] // CHECK: br label %[[VAL_19]] + +llvm.func @cancel_wsloop_if(%lb : i32, %ub : i32, %step : i32, %cond : i1) { + omp.wsloop { +omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) { + omp.cancel cancellation_construct_type(loop) if(%cond) + omp.yield +} + } + llvm.return +} +// CHECK-LABEL: define void @cancel_wsloop_if +// CHECK: %[[VAL_0:.*]] = alloca i32, align 4 +// CHECK: %[[VAL_1:.*]] = alloca i32, align 4 +// CHECK: %[[VAL_2:.*]] = alloca i32, align 4 +// CHECK: %[[VAL_3:.*]] = alloca i32, align 4 +//
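The comment in the patch explains that the finalization callback has to be pushed before the loop body is translated, because an omp.cancel inside the body looks for it. The ordering can be sketched with a toy stack; `ToyBuilder`, `FinalizationInfo` and `lowerCancel` here are hypothetical simplifications and do not reproduce the OpenMPIRBuilder interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one entry of the builder's finalization stack.
struct FinalizationInfo {
  std::function<void(const std::string &)> callback; // run when a cancel is lowered
  std::string directive;                             // e.g. "for"
  bool isCancellable;
};

// Toy builder: only models the push / lookup / pop ordering.
struct ToyBuilder {
  std::vector<FinalizationInfo> stack;

  void pushFinalizationCB(FinalizationInfo Info) {
    stack.push_back(std::move(Info));
  }
  void popFinalizationCB() { stack.pop_back(); }

  // Lowering a cancel needs an enclosing cancellable construct on the
  // stack; otherwise there is no cleanup callback to invoke.
  bool lowerCancel(const std::string &CancelBlock) {
    if (stack.empty() || !stack.back().isCancellable)
      return false;
    stack.back().callback(CancelBlock);
    return true;
  }
};
```

In this model a cancel lowered before the push simply fails, which is the situation the patch avoids by pushing the callback ahead of convertOmpOpRegions and popping it once the loop has been created.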
[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/137179

Backport 57530c23a53b5e003d389437637f61c5b9814e22

Requested by: @nikic

From 7972daff23be375db8218ac8ea04e8b9e18fb2b3 Mon Sep 17 00:00:00 2001
From: Nikita Popov
Date: Thu, 24 Apr 2025 15:15:47 +0200
Subject: [PATCH] [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158)

When converting a malloc stored to a global into a global, we will introduce an i1 flag to track whether the global has been initialized. In case of atomic loads/stores, this will result in verifier failures, because atomic ops on i1 are illegal. Even if we changed this to i8, I don't think it is a good idea to change atomic types in that way. Instead, bail out of the transform if we encounter any atomic loads/stores of the global.

Fixes https://github.com/llvm/llvm-project/issues/137152.

(cherry picked from commit 57530c23a53b5e003d389437637f61c5b9814e22)

--- llvm/lib/Transforms/IPO/GlobalOpt.cpp | 4 +++ .../GlobalOpt/malloc-promote-atomic.ll| 28 +++ 2 files changed, 32 insertions(+) create mode 100644 llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp index 9586fc97a39f7..236a531317678 100644 --- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp +++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp @@ -719,10 +719,14 @@ static bool allUsesOfLoadedValueWillTrapIfNull(const GlobalVariable *GV) { const Value *P = Worklist.pop_back_val(); for (const auto *U : P->users()) { if (auto *LI = dyn_cast(U)) { +if (!LI->isSimple()) + return false; SmallPtrSet PHIs; if (!AllUsesOfValueWillTrapIfNull(LI, PHIs)) return false; } else if (auto *SI = dyn_cast(U)) { +if (!SI->isSimple()) + return false; // Ignore stores to the global.
if (SI->getPointerOperand() != P) return false; diff --git a/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll new file mode 100644 index 0..0ecdf095efdd8 --- /dev/null +++ b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll @@ -0,0 +1,28 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 +; RUN: opt -passes=globalopt -S < %s | FileCheck %s + +@g = internal global ptr null, align 8 + +define void @init() { +; CHECK-LABEL: define void @init() local_unnamed_addr { +; CHECK-NEXT:[[ALLOC:%.*]] = call ptr @malloc(i64 48) +; CHECK-NEXT:store atomic ptr [[ALLOC]], ptr @g seq_cst, align 8 +; CHECK-NEXT:ret void +; + %alloc = call ptr @malloc(i64 48) + store atomic ptr %alloc, ptr @g seq_cst, align 8 + ret void +} + +define i1 @check() { +; CHECK-LABEL: define i1 @check() local_unnamed_addr { +; CHECK-NEXT:[[VAL:%.*]] = load atomic ptr, ptr @g seq_cst, align 8 +; CHECK-NEXT:[[CMP:%.*]] = icmp eq ptr [[VAL]], null +; CHECK-NEXT:ret i1 [[CMP]] +; + %val = load atomic ptr, ptr @g seq_cst, align 8 + %cmp = icmp eq ptr %val, null + ret i1 %cmp +} + +declare ptr @malloc(i64) allockind("alloc,uninitialized") allocsize(0) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
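The fix boils down to an early bail-out: as soon as any load or store of the global is not "simple", the transform is abandoned. A toy model of that check, assuming "simple" means neither atomic nor volatile; `Access` and `allAccessesSimple` are illustrative names and not LLVM classes.

```cpp
#include <cassert>
#include <vector>

// Toy model of a memory access; not llvm::LoadInst / llvm::StoreInst.
struct Access {
  bool isAtomic = false;
  bool isVolatile = false;
  // Modeled on the idea behind isSimple(): neither atomic nor volatile.
  bool isSimple() const { return !isAtomic && !isVolatile; }
};

// Returns false (meaning: do not transform) as soon as one access is
// atomic or volatile; returns true only if every access is simple.
inline bool allAccessesSimple(const std::vector<Access> &Accesses) {
  for (const Access &A : Accesses)
    if (!A.isSimple())
      return false;
  return true;
}
```

Bailing out conservatively keeps the original atomic accesses intact, as in the CHECK lines of the new test, rather than rewriting them to operate on a different type.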
[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)
llvmbot wrote: @fhahn What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/137179
[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/137179
[llvm-branch-commits] LiveRangeShrink: Early exit when encountering a code motion barrier. (PR #136806)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/136806
[llvm-branch-commits] [llvm] AMDGPU: Add noundef to mbcnt intrinsic returns (PR #136304)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/136304
[llvm-branch-commits] [llvm] AMDGPU: Add noundef to mbcnt intrinsic returns (PR #136304)
https://github.com/arsenm commented: ping https://github.com/llvm/llvm-project/pull/136304
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/137193

This is quite ugly but it is the best I could think of. The old FiniCBWrapper was far too brittle: it depended on the exact block structure inside of the section, and could be confused by any control flow in the section (e.g. an if clause on cancel). The wording in the comment and the variable names also didn't seem to match where it was actually branching to.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches to a block containing __kmpc_for_static_fini. This was hard to achieve here because sometimes the FiniCBWrapper has to run before the worksharing loop finalization has been created. To get around this ordering issue I created a dummy branch to a dummy block, which is then fixed later once all of the information is available.

From aa2445b4b8cfd3253464ffb466bf4f84fb5a488f Mon Sep 17 00:00:00 2001
From: Tom Eccles
Date: Thu, 10 Apr 2025 11:43:18 +
Subject: [PATCH] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR

This is quite ugly but it is the best I could think of. The old FiniCBWrapper was far too brittle: it depended on the exact block structure inside of the section, and could be confused by any control flow in the section (e.g. an if clause on cancel). The wording in the comment and the variable names also didn't seem to match where it was actually branching to.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches to a block containing __kmpc_for_static_fini. This was hard to achieve here because sometimes the FiniCBWrapper has to run before the worksharing loop finalization has been created. To get around this ordering issue I created a dummy branch to a dummy block, which is then fixed later once all of the information is available.
--- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 27 --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 6 +- mlir/test/Target/LLVMIR/openmp-cancel.mlir| 76 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 16 4 files changed, 97 insertions(+), 28 deletions(-) diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index be05f01c94603..3f19088e6c73d 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -2172,6 +2172,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( if (!updateToLocation(Loc)) return Loc.IP; + // FiniCBWrapper needs to create a branch to the loop finalization block, but + // this has not been created yet at some times when this callback runs. + SmallVector CancellationBranches; auto FiniCBWrapper = [&](InsertPointTy IP) { if (IP.getBlock()->end() != IP.getPoint()) return FiniCB(IP); @@ -2179,16 +2182,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( // will fail because that function requires the Finalization Basic Block to // have a terminator, which is already removed by EmitOMPRegionBody. // IP is currently at cancelation block. -// We need to backtrack to the condition block to fetch -// the exit block and create a branch from cancelation -// to exit block. 
-IRBuilder<>::InsertPointGuard IPG(Builder); -Builder.restoreIP(IP); -auto *CaseBB = IP.getBlock()->getSinglePredecessor(); -auto *CondBB = CaseBB->getSinglePredecessor()->getSinglePredecessor(); -auto *ExitBB = CondBB->getTerminator()->getSuccessor(1); -Instruction *I = Builder.CreateBr(ExitBB); -IP = InsertPointTy(I->getParent(), I->getIterator()); +BranchInst *DummyBranch = Builder.CreateBr(IP.getBlock()); +IP = InsertPointTy(DummyBranch->getParent(), DummyBranch->getIterator()); +CancellationBranches.push_back(DummyBranch); return FiniCB(IP); }; @@ -2251,6 +2247,9 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( return WsloopIP.takeError(); InsertPointTy AfterIP = *WsloopIP; + BasicBlock *LoopFini = AfterIP.getBlock()->getSinglePredecessor(); + assert(LoopFini && "Bad structure of static workshare loop finalization"); + // Apply the finalization callback in LoopAfterBB auto FiniInfo = FinalizationStack.pop_back_val(); assert(FiniInfo.DK == OMPD_sections && @@ -2264,6 +2263,14 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( AfterIP = {FiniBB, FiniBB->begin()}; } + // Now we can fix the dummy branch to point to the right place + if (!CancellationBranches.empty()) { +for (BranchInst *DummyBranch : CancellationBranches) { + assert(DummyBranch->getNumSuccessors() == 1); + DummyBranch->setSuccessor(0, LoopFini); +} + } + return AfterIP; } diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 6185a433a8199..d1885641f389d 100644 --- a/mlir/li
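The dummy-branch approach described in this patch is a deferred fix-up: terminate the cancellation block with a placeholder branch, remember it, and retarget it once the finalization block exists. A small sketch of that pattern on a toy control-flow graph; `Block` and `PendingBranches` are hypothetical types for illustration and are not LLVM classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block ending in one unconditional branch.
struct Block {
  std::string name;
  Block *successor = nullptr;
};

// Collects branches whose real target has not been created yet.
struct PendingBranches {
  std::vector<Block *> sources;

  // Terminate `From` with a placeholder branch to itself and record it,
  // so that the block is well-formed until the real target is known.
  void addPlaceholder(Block &From) {
    From.successor = &From;
    sources.push_back(&From);
  }

  // Retarget every recorded branch once the final block exists.
  void resolve(Block &Target) {
    for (Block *Src : sources)
      Src->successor = &Target;
    sources.clear();
  }
};
```

Compared with walking predecessors to find the exit block, as the removed code did, this does not depend on the shape of the control flow inside the section, at the cost of a second pass over the recorded branches.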
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
llvmbot wrote: @llvm/pr-subscribers-backend-systemz Author: Kai Nacke (redstar) Changes

Sections which are not allowed to carry data are marked as virtual. The only complication when writing out the text is that it must be written in chunks of 32k-1 bytes, which is done by having a wrapper stream write those records. The data of BSS sections is not written, since the contents are known to be zero. Instead, the fill byte value is used.

--- Full diff: https://github.com/llvm/llvm-project/pull/137235.diff

8 Files Affected:
- (modified) llvm/include/llvm/MC/MCContext.h (+4-2)
- (modified) llvm/include/llvm/MC/MCSectionGOFF.h (+32-15)
- (modified) llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp (+3-2)
- (modified) llvm/lib/MC/GOFFObjectWriter.cpp (+78)
- (modified) llvm/lib/MC/MCContext.cpp (+12-7)
- (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+4-4)
- (modified) llvm/test/CodeGen/SystemZ/zos-section-1.ll (+22-5)
- (modified) llvm/test/CodeGen/SystemZ/zos-section-2.ll (+25-5)

``diff diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h index 8eb904966f4de..2cdba64be116d 100644 --- a/llvm/include/llvm/MC/MCContext.h +++ b/llvm/include/llvm/MC/MCContext.h @@ -366,7 +366,8 @@ class MCContext { template MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, -TAttr SDAttributes, MCSection *Parent); +TAttr SDAttributes, MCSection *Parent, +bool IsVirtual); /// Map of currently defined macros.
StringMap MacroMap; @@ -607,7 +608,8 @@ class MCContext { MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, GOFF::SDAttr SDAttributes); MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, -GOFF::EDAttr EDAttributes, MCSection *Parent); +GOFF::EDAttr EDAttributes, MCSection *Parent, +bool IsVirtual); MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, GOFF::PRAttr PRAttributes, MCSection *Parent); diff --git a/llvm/include/llvm/MC/MCSectionGOFF.h b/llvm/include/llvm/MC/MCSectionGOFF.h index b8b8cf112a34d..b2ca74c3ba78a 100644 --- a/llvm/include/llvm/MC/MCSectionGOFF.h +++ b/llvm/include/llvm/MC/MCSectionGOFF.h @@ -39,6 +39,9 @@ class MCSectionGOFF final : public MCSection { // The type of this section. GOFF::ESDSymbolType SymbolType; + // This section is a BSS section. + unsigned IsBSS : 1; + // Indicates that the PR symbol needs to set the length of the section to a // non-zero value. This is only a problem with the ADA PR - the binder will // generate an error in this case. 
@@ -50,26 +53,26 @@ class MCSectionGOFF final : public MCSection { friend class MCContext; friend class MCSymbolGOFF; - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::SDAttr SDAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +GOFF::SDAttr SDAttributes, MCSectionGOFF *Parent) + : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr), Parent(Parent), SDAttributes(SDAttributes), -SymbolType(GOFF::ESD_ST_SectionDefinition), RequiresNonZeroLength(0), -Emitted(0) {} +SymbolType(GOFF::ESD_ST_SectionDefinition), IsBSS(K.isBSS()), +RequiresNonZeroLength(0), Emitted(0) {} - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::EDAttr EDAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +GOFF::EDAttr EDAttributes, MCSectionGOFF *Parent) + : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr), Parent(Parent), EDAttributes(EDAttributes), -SymbolType(GOFF::ESD_ST_ElementDefinition), RequiresNonZeroLength(0), -Emitted(0) {} +SymbolType(GOFF::ESD_ST_ElementDefinition), IsBSS(K.isBSS()), +RequiresNonZeroLength(0), Emitted(0) {} - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::PRAttr PRAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +GOFF::PRAttr PRAttributes, MCSectionGOFF *Parent) + : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr), Parent(Parent), PRAttributes(PRAttributes), -SymbolType(GOFF::ESD_ST_PartReference), RequiresNonZeroLength(0), -Emitted(0) {} +SymbolType(GOFF::ESD_ST_PartReference), IsBSS(K.isBSS()), +RequiresNonZeroLength(0), Emitted(0) {} public: void printSwitchToSection(const MCAsmInfo &MAI, const Triple &T,
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
https://github.com/redstar created https://github.com/llvm/llvm-project/pull/137235 Sections which are not allowed to carry data are marked as virtual. Only complication when writing out the text is that it must be written in chunks of 32k-1 bytes, which is done by having a wrapper stream writing those records. Data of BSS sections is not written, since the contents is known to be zero. Instead, the fill byte value is used. >From 0e5c36a691fcbaa6f63c46f4cf86fa16857e137c Mon Sep 17 00:00:00 2001 From: Kai Nacke Date: Wed, 9 Apr 2025 15:08:52 -0400 Subject: [PATCH] [GOFF] Add writing of text records Sections which are not allowed to carry data are marked as virtual. Only complication when writing out the text is that it must be written in chunks of 32k-1 bytes, which is done by having a wrapper stream writing those records. Data of BSS sections is not written, since the contents is known to be zero. Instead, the fill byte value is used. --- llvm/include/llvm/MC/MCContext.h | 6 +- llvm/include/llvm/MC/MCSectionGOFF.h | 47 +++ .../CodeGen/TargetLoweringObjectFileImpl.cpp | 5 +- llvm/lib/MC/GOFFObjectWriter.cpp | 78 +++ llvm/lib/MC/MCContext.cpp | 19 +++-- llvm/lib/MC/MCObjectFileInfo.cpp | 8 +- llvm/test/CodeGen/SystemZ/zos-section-1.ll| 27 +-- llvm/test/CodeGen/SystemZ/zos-section-2.ll| 30 +-- 8 files changed, 180 insertions(+), 40 deletions(-) diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h index 8eb904966f4de..2cdba64be116d 100644 --- a/llvm/include/llvm/MC/MCContext.h +++ b/llvm/include/llvm/MC/MCContext.h @@ -366,7 +366,8 @@ class MCContext { template MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, -TAttr SDAttributes, MCSection *Parent); +TAttr SDAttributes, MCSection *Parent, +bool IsVirtual); /// Map of currently defined macros. 
StringMap MacroMap; @@ -607,7 +608,8 @@ class MCContext { MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, GOFF::SDAttr SDAttributes); MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, -GOFF::EDAttr EDAttributes, MCSection *Parent); +GOFF::EDAttr EDAttributes, MCSection *Parent, +bool IsVirtual); MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name, GOFF::PRAttr PRAttributes, MCSection *Parent); diff --git a/llvm/include/llvm/MC/MCSectionGOFF.h b/llvm/include/llvm/MC/MCSectionGOFF.h index b8b8cf112a34d..b2ca74c3ba78a 100644 --- a/llvm/include/llvm/MC/MCSectionGOFF.h +++ b/llvm/include/llvm/MC/MCSectionGOFF.h @@ -39,6 +39,9 @@ class MCSectionGOFF final : public MCSection { // The type of this section. GOFF::ESDSymbolType SymbolType; + // This section is a BSS section. + unsigned IsBSS : 1; + // Indicates that the PR symbol needs to set the length of the section to a // non-zero value. This is only a problem with the ADA PR - the binder will // generate an error in this case. 
@@ -50,26 +53,26 @@ class MCSectionGOFF final : public MCSection { friend class MCContext; friend class MCSymbolGOFF; - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::SDAttr SDAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +GOFF::SDAttr SDAttributes, MCSectionGOFF *Parent) + : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr), Parent(Parent), SDAttributes(SDAttributes), -SymbolType(GOFF::ESD_ST_SectionDefinition), RequiresNonZeroLength(0), -Emitted(0) {} +SymbolType(GOFF::ESD_ST_SectionDefinition), IsBSS(K.isBSS()), +RequiresNonZeroLength(0), Emitted(0) {} - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::EDAttr EDAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +GOFF::EDAttr EDAttributes, MCSectionGOFF *Parent) + : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr), Parent(Parent), EDAttributes(EDAttributes), -SymbolType(GOFF::ESD_ST_ElementDefinition), RequiresNonZeroLength(0), -Emitted(0) {} +SymbolType(GOFF::ESD_ST_ElementDefinition), IsBSS(K.isBSS()), +RequiresNonZeroLength(0), Emitted(0) {} - MCSectionGOFF(StringRef Name, SectionKind K, GOFF::PRAttr PRAttributes, -MCSectionGOFF *Parent) - : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr), + MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual, +
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff ba1223ae5a08197d30d3fb285a725a5278549e74 0e5c36a691fcbaa6f63c46f4cf86fa16857e137c --extensions cpp,h -- llvm/include/llvm/MC/MCContext.h llvm/include/llvm/MC/MCSectionGOFF.h llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp llvm/lib/MC/GOFFObjectWriter.cpp llvm/lib/MC/MCContext.cpp llvm/lib/MC/MCObjectFileInfo.cpp
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/MC/GOFFObjectWriter.cpp b/llvm/lib/MC/GOFFObjectWriter.cpp
index 119cab6e6e..f2b5ac0a4d 100644
--- a/llvm/lib/MC/GOFFObjectWriter.cpp
+++ b/llvm/lib/MC/GOFFObjectWriter.cpp
@@ -506,7 +506,7 @@ uint64_t GOFFWriter::writeObject() {
   defineSymbols();

   for (const MCSection &Section : Asm)
-    writeText(static_cast(&Section));
+    writeText(static_cast(&Section));

   writeEnd();
```

https://github.com/llvm/llvm-project/pull/137235
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
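The description of PR #137235 says text has to be written in chunks of 32k-1 bytes, handled by a wrapper stream. The chunking idea alone can be sketched as below. This is standalone C++ under stated assumptions: `MaxChunk` and `splitIntoRecords` are hypothetical names, and the GOFF record headers and the actual `GOFFObjectWriter` classes are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical limit, taken from the PR description: 32k-1 bytes per record.
constexpr std::size_t MaxChunk = 32 * 1024 - 1;

// Splits Data into consecutive pieces of at most MaxChunk bytes each.
// A real writer would emit a record header in front of every piece; that
// part is intentionally left out of this sketch.
std::vector<std::string> splitIntoRecords(const std::string &Data) {
  std::vector<std::string> Records;
  for (std::size_t Off = 0; Off < Data.size(); Off += MaxChunk) {
    std::size_t Len = std::min(MaxChunk, Data.size() - Off);
    Records.push_back(Data.substr(Off, Len));
  }
  return Records;
}
```

Every piece except possibly the last is exactly `MaxChunk` bytes long, and empty input produces no records at all.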
[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)
@@ -1642,6 +1642,94 @@ void AsmPrinter::emitStackUsage(const MachineFunction &MF) {
     *StackUsageStream << "static\n";
 }

+/// Extracts a generalized numeric type identifier of a Function's type from
+/// type metadata. Returns null if metadata cannot be found.
+static ConstantInt *extractNumericCGTypeId(const Function &F) {
+  SmallVector Types;
+  F.getMetadata(LLVMContext::MD_type, Types);
+  for (const auto &Type : Types) {
+    if (Type->hasGeneralizedMDString()) {
+      MDString *MDGeneralizedTypeId = cast(Type->getOperand(1));
+      uint64_t TypeIdVal = llvm::MD5Hash(MDGeneralizedTypeId->getString());
+      IntegerType *Int64Ty = Type::getInt64Ty(F.getContext());
+      return ConstantInt::get(Int64Ty, TypeIdVal);
+    }
+  }
+
+  return nullptr;
+}
+
+/// Emits .callgraph section.
+void AsmPrinter::emitCallGraphSection(const MachineFunction &MF,
+                                      FunctionInfo &FuncInfo) {
+  if (!MF.getTarget().Options.EmitCallGraphSection)
+    return;
+
+  // Switch to the call graph section for the function
+  MCSection *FuncCGSection =
+      getObjFileLowering().getCallGraphSection(*getCurrentSection());
+  assert(FuncCGSection && "null callgraph section");
+  OutStreamer->pushSection();
+  OutStreamer->switchSection(FuncCGSection);
+
+  // Emit format version number.
+  OutStreamer->emitInt64(0);

ilovepi wrote:

Let's use a constant defined somewhere for the version number, so it's easy to update. Probably looking at PGO profile versioning is a reasonable place to take inspiration.

https://github.com/llvm/llvm-project/pull/87576
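The review comment asks for a named constant in place of the literal `0` passed to `emitInt64`. One possible shape is sketched below in standalone C++. All names here (`CallGraphSectionFormatVersion`, `emitFormatVersion`, and the byte-vector stand-in for the streamer) are hypothetical; the thread does not settle where such a constant should live in LLVM, and the little-endian byte order is an assumption of this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical version enum; the patch under review emits the literal 0.
enum class CallGraphSectionFormatVersion : std::uint64_t {
  V_0 = 0,
  Current = V_0, // the single place to bump when the layout changes
};

// Stand-in for a streamer's 64-bit emit: appends 8 bytes, least significant
// byte first (assumed byte order for this sketch).
void emitInt64(std::vector<std::uint8_t> &Out, std::uint64_t Value) {
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<std::uint8_t>(Value >> (8 * I)));
}

// Emits the format version through the named constant instead of a literal.
void emitFormatVersion(std::vector<std::uint8_t> &Out) {
  emitInt64(Out,
            static_cast<std::uint64_t>(CallGraphSectionFormatVersion::Current));
}
```

With this layout a reader of the section can compare the first eight bytes against the same enum rather than a magic number.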
[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)
@@ -68,6 +68,9 @@ class MCObjectFileInfo {
   /// Language Specific Data Area information is emitted to.
   MCSection *LSDASection = nullptr;

+  /// Section containing metadata on call graph.

ilovepi wrote:

```suggestion
  /// Section containing call graph metadata.
```

https://github.com/llvm/llvm-project/pull/87576
[llvm-branch-commits] [clang] 2db36d8 - Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)"
Author: Stephen Tozer Date: 2025-04-25T00:33:24+01:00 New Revision: 2db36d8693610acbe87bed2f10e49ca938429bde URL: https://github.com/llvm/llvm-project/commit/2db36d8693610acbe87bed2f10e49ca938429bde DIFF: https://github.com/llvm/llvm-project/commit/2db36d8693610acbe87bed2f10e49ca938429bde.diff LOG: Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)" This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353. Added: Modified: clang/lib/CodeGen/BackendUtil.cpp llvm/docs/HowToUpdateDebugInfo.rst llvm/include/llvm/IR/DebugLoc.h llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/lib/IR/DebugInfo.cpp llvm/lib/IR/DebugLoc.cpp llvm/lib/Transforms/Utils/Debugify.cpp Removed: diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index cafc703420183..f7eb853beb23c 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -961,22 +961,6 @@ void EmitAssemblyHelper::RunOptimizationPipeline( Debugify.setOrigDIVerifyBugsReportFilePath( CodeGenOpts.DIBugsReportFilePath); Debugify.registerCallbacks(PIC, MAM); - -#if ENABLE_DEBUGLOC_COVERAGE_TRACKING -// If we're using debug location coverage tracking, mark all the -// instructions coming out of the frontend without a DebugLoc as being -// compiler-generated, to prevent both those instructions and new -// instructions that inherit their location from being treated as -// incorrectly empty locations. -for (Function &F : *TheModule) { - if (!F.getSubprogram()) -continue; - for (BasicBlock &BB : F) -for (Instruction &I : BB) - if (!I.getDebugLoc()) -I.setDebugLoc(DebugLoc::getCompilerGenerated()); -} -#endif } // Attempt to load pass plugins and register their callbacks with PB. 
for (auto &PluginFN : CodeGenOpts.PassPlugins) { diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst index a87efe7e6e43f..3088f59c1066a 100644 --- a/llvm/docs/HowToUpdateDebugInfo.rst +++ b/llvm/docs/HowToUpdateDebugInfo.rst @@ -169,47 +169,6 @@ See the discussion in the section about :ref:`merging locations` for examples of when the rule for dropping locations applies. -.. _NewInstLocations: - -Setting locations for new instructions --- - -Whenever a new instruction is created and there is no suitable location for that -instruction, that instruction should be annotated accordingly. There are a set -of special ``DebugLoc`` values that can be set on an instruction to annotate the -reason that it does not have a valid location. These are as follows: - -* ``DebugLoc::getCompilerGenerated()``: This indicates that the instruction is a - compiler-generated instruction, i.e. it is not associated with any user source - code. - -* ``DebugLoc::getDropped()``: This indicates that the instruction has - intentionally had its source location removed, according to the rules for - :ref:`dropping locations`; this is set automatically by - ``Instruction::dropLocation()``. - -* ``DebugLoc::getUnknown()``: This indicates that the instruction does not have - a known or currently knowable source location, e.g. that it is infeasible to - determine the correct source location, or that the source location is - ambiguous in a way that LLVM cannot currently represent. - -* ``DebugLoc::getTemporary()``: This is used for instructions that we don't - expect to be emitted (e.g. ``UnreachableInst``), and so should not need a - valid location; if we ever try to emit a temporary location into an object/asm - file, this indicates that something has gone wrong. - -Where applicable, these should be used instead of leaving an instruction without -an assigned location or explicitly setting the location as ``DebugLoc()``. 
-Ordinarily these special locations are identical to an absent location, but LLVM -built with coverage-tracking -(``-DLLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING="COVERAGE"``) will keep track of -these special locations in order to detect unintentionally-missing locations; -for this reason, the most important rule is to *not* apply any of these if it -isn't clear which, if any, is appropriate - an absent location can be detected -and fixed, while an incorrectly annotated instruction is much harder to detect. -On the other hand, if any of these clearly apply, then they should be used to -prevent false positives from being flagged up. - Rules for updating debug values === diff --git a/llvm/include/llvm/IR/DebugLoc.h b/llvm/include/llvm/IR/DebugLoc.h index 9f1dafa8b71d9..c22d3e9b10d27 100644 --- a/llvm/include/llvm/IR/DebugLoc.h +++ b/llvm/include/llvm/IR/DebugLoc.h @@ -14,7 +14,6 @@ #ifndef LLVM_IR_DEBUGLOC_H #define LLVM_IR_DEBUGLOC_H -#include "llvm/Config/config.h" #include "llvm/IR/Trackin
[llvm-branch-commits] [libc] 8ce5c04 - update jmp_buf.h to always gate this by __linux__
Author: Schrodinger ZHU Yifan
Date: 2025-04-24T19:32:58-04:00
New Revision: 8ce5c04559d3c66700fa8e62bcc47a6b02c29535

URL: https://github.com/llvm/llvm-project/commit/8ce5c04559d3c66700fa8e62bcc47a6b02c29535
DIFF: https://github.com/llvm/llvm-project/commit/8ce5c04559d3c66700fa8e62bcc47a6b02c29535.diff

LOG: update jmp_buf.h to always gate this by __linux__

Added:

Modified:
    libc/include/llvm-libc-types/jmp_buf.h

Removed:

diff --git a/libc/include/llvm-libc-types/jmp_buf.h b/libc/include/llvm-libc-types/jmp_buf.h
index a6638b222138a..90bd60d741293 100644
--- a/libc/include/llvm-libc-types/jmp_buf.h
+++ b/libc/include/llvm-libc-types/jmp_buf.h
@@ -9,7 +9,13 @@
 #ifndef LLVM_LIBC_TYPES_JMP_BUF_H
 #define LLVM_LIBC_TYPES_JMP_BUF_H

-#if defined(__i386__) || defined(__x86_64__)
+// TODO: implement sigjmp_buf related functions for other architectures
+// Issue: https://github.com/llvm/llvm-project/issues/136358
+#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
+#define __LIBC_HAS_SIGJMP_BUF
+#endif
+
+#if defined(__LIBC_HAS_SIGJMP_BUF)
 #include "sigset_t.h"
 #endif

@@ -54,9 +60,7 @@ typedef struct {
 #else
 #error "__jmp_buf not available for your target architecture."
 #endif
-  // TODO: implement sigjmp_buf related functions for other architectures
-  // Issue: https://github.com/llvm/llvm-project/issues/136358
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__LIBC_HAS_SIGJMP_BUF)
   // return address
   void *sig_retaddr;
   // extra register buffer to avoid indefinite stack growth in sigsetjmp
@@ -68,7 +72,10 @@ typedef struct {

 typedef __jmp_buf jmp_buf[1];

-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__LIBC_HAS_SIGJMP_BUF)
 typedef __jmp_buf sigjmp_buf[1];
 #endif
+
+#undef __LIBC_HAS_SIGJMP_BUF
+
 #endif // LLVM_LIBC_TYPES_JMP_BUF_H
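The commit replaces three copies of the same architecture check with one private feature macro that is defined once, tested at each use site, and undefined at the end of the header. The pattern can be sketched in isolation as follows. `DEMO_HAS_SIGJMP_BUF`, `DemoJmpBuf` and `hasSigJmpBuf` are hypothetical names chosen for this illustration and do not exist in libc.

```cpp
#include <cassert>

// Define the feature macro in exactly one place, combining the OS and
// architecture conditions (hypothetical macro name).
#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#define DEMO_HAS_SIGJMP_BUF
#endif

// Every use site tests the single macro instead of repeating the condition.
struct DemoJmpBuf {
  void *regs[8];
#if defined(DEMO_HAS_SIGJMP_BUF)
  void *sig_retaddr; // only present on the gated targets
#endif
};

constexpr bool hasSigJmpBuf() {
#if defined(DEMO_HAS_SIGJMP_BUF)
  return true;
#else
  return false;
#endif
}

// Undefine at the end so the helper macro does not leak to includers.
#undef DEMO_HAS_SIGJMP_BUF
```

Changing the set of supported targets then means touching one `#if` line rather than three.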
[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)
@@ -0,0 +1,50 @@
+;; Tests that we store the type identifiers in .callgraph section of the binary.
+
+; RUN: llc --call-graph-section -filetype=obj -o - < %s | \
+; RUN: llvm-readelf -x .callgraph - | FileCheck %s

ilovepi wrote:

I don't see any tests under `tests/MC` though. I agree w/ @arsenm that we need to have dedicated tests for the assembler/disassembler, and not just for codegen. Those commonly find bugs in the implementation that are otherwise easy to miss.

https://github.com/llvm/llvm-project/pull/87576
[llvm-branch-commits] [clang] [RISCV][Driver] Add riscv emulation mode to linker job of BareMetal toolchain (PR #134442)
@@ -534,8 +534,18 @@ void baremetal::Linker::ConstructJob(Compilation &C, const JobAction &JA,

   CmdArgs.push_back("-Bstatic");

-  if (TC.getTriple().isRISCV() && Args.hasArg(options::OPT_mno_relax))
-    CmdArgs.push_back("--no-relax");
+  if (Triple.isRISCV()) {
+    CmdArgs.push_back("-X");
+    if (Args.hasArg(options::OPT_mno_relax))
+      CmdArgs.push_back("--no-relax");
+    if (const char *LDMOption = getLDMOption(TC.getTriple(), Args)) {
+      CmdArgs.push_back("-m");
+      CmdArgs.push_back(LDMOption);
+    } else {
+      D.Diag(diag::err_target_unknown_triple) << Triple.str();
+      return;
+    }

petrhosek wrote:

Can you also swap the order of the `-m` option to be the same as in the `Gnu` driver?

```suggestion
    if (const char *LDMOption = getLDMOption(TC.getTriple(), Args)) {
      CmdArgs.push_back("-m");
      CmdArgs.push_back(LDMOption);
    } else {
      D.Diag(diag::err_target_unknown_triple) << Triple.str();
      return;
    }
    CmdArgs.push_back("-X");
    if (Args.hasArg(options::OPT_mno_relax))
      CmdArgs.push_back("--no-relax");
```

In a follow-up change, I'd like to move the `-m` out of this condition since it'd also be beneficial for other targets.

https://github.com/llvm/llvm-project/pull/134442
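The suggested ordering (emulation mode first, then `-X`, then the optional `--no-relax`) can be modeled without the Clang driver classes. The sketch below is an illustration only: `addRISCVLinkerArgs` is a hypothetical helper, the `std::optional` stands in for the nullable result of `getLDMOption`, and the `false` return stands in for the diagnostic-and-return path.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the RISC-V branch of the linker job.
// Returns false when no emulation mode is known for the target, which is
// where the real code reports err_target_unknown_triple.
bool addRISCVLinkerArgs(std::vector<std::string> &CmdArgs,
                        const std::optional<std::string> &LDMOption,
                        bool NoRelax) {
  if (!LDMOption)
    return false;
  // Emulation mode goes first, matching the order requested in review.
  CmdArgs.push_back("-m");
  CmdArgs.push_back(*LDMOption);
  CmdArgs.push_back("-X");
  if (NoRelax)
    CmdArgs.push_back("--no-relax");
  return true;
}
```

Bailing out before anything is pushed also means a failed lookup leaves the argument list untouched, whereas the original ordering had already appended `-X` by that point.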
[llvm-branch-commits] [clang] [RISCV][Driver] Add riscv emulation mode to linker job of BareMetal toolchain (PR #134442)
https://github.com/petrhosek approved this pull request.

https://github.com/llvm/llvm-project/pull/134442
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
https://github.com/sdesmalen-arm deleted https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

Why are you making this change?

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

sdesmalen-arm wrote:

What about either replacing `getPartialReductionExtendKind(Instruction *I);` with this one, rather than adding a new interface, or otherwise changing the implementation of `getPartialReductionExtendKind(Instruction *I)` to use `getPartialReductionExtendKind(Instruction::CastOps)`?

https://github.com/llvm/llvm-project/pull/136173
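The second option in the comment, keeping both overloads but having the `Instruction *` one forward to the opcode one, can be sketched with miniature stand-in types. Everything below (`CastOps`, `ExtendKind`, the two-field `Instruction`, and `getExtendKind`) is hypothetical and much simpler than the LLVM classes; it only illustrates the delegation.

```cpp
#include <cassert>

// Hypothetical miniature types; the real ones live in LLVM.
enum class CastOps { ZExt, SExt, Trunc };
enum class ExtendKind { None, ZeroExtend, SignExtend };

struct Instruction {
  bool IsCast;    // whether Opcode is meaningful
  CastOps Opcode; // cast opcode when IsCast is true
};

// Opcode-based overload: the only place that knows the mapping.
ExtendKind getExtendKind(CastOps Op) {
  switch (Op) {
  case CastOps::ZExt:
    return ExtendKind::ZeroExtend;
  case CastOps::SExt:
    return ExtendKind::SignExtend;
  default:
    return ExtendKind::None;
  }
}

// Instruction-based overload: validates its input, then delegates.
ExtendKind getExtendKind(const Instruction *I) {
  if (!I || !I->IsCast)
    return ExtendKind::None;
  return getExtendKind(I->Opcode);
}
```

With this arrangement the mapping from opcode to extend kind exists once, so the two entry points cannot drift apart.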
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

Would it be possible to make the change of `VPPartialReductionRecipe : public VPSingleDefRecipe` -> `VPPartialReductionRecipe : public VPReductionRecipe` as an NFC change? (For cases around VPMulAccumulateReductionRecipes you can initially add some asserts that the recipe isn't a partial reduction, because that won't be supported until this PR lands)

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)

MacDue wrote:

It's due to: https://github.com/llvm/llvm-project/pull/136173#discussion_r2053920360

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2438,14 +2438,14 @@ VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
     return Ctx.TTI.getPartialReductionCost(
         Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()),
         Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF,
-        TTI::getPartialReductionExtendKind(getExtOpcode()),
-        TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul);
+        TTI::getPartialReductionExtendKind(getExt0Opcode()),
+        TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul);
   }

   Type *RedTy = Ctx.Types.inferScalarType(this);
   auto *SrcVecTy =
       cast(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF));
-  return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy,
+  return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy,

sdesmalen-arm wrote:

The TTI hook also needs updating to reflect the separate extends.

https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2496,6 +2501,9 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   Type *ResultTy;

+  /// If the reduction this is based on is a partial reduction.

sdesmalen-arm wrote:

This comment makes no sense.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132383 >From 61c28d2d564c63b986c6adfae26f17d868b53cd1 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Mon, 14 Apr 2025 16:34:00 +0200 Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc Uniform S1: Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant to be combined away. Uniform S1 ZExt and SExt are lowered using select. Divergent S1: Trunc of VGPR to VCC is lowered as compare. Extends of VCC are lowered using select. For remaining types: S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc are again left as is to be combined away. Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is for instruction select to deal with them. This is because there are patterns that check for S16 type. --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 7 ++ .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 47 +++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 3 + .../GlobalISel/regbankselect-and-s1.mir | 105 + .../GlobalISel/regbankselect-anyext.mir | 59 +- .../AMDGPU/GlobalISel/regbankselect-sext.mir | 100 ++-- .../AMDGPU/GlobalISel/regbankselect-trunc.mir | 22 +++- .../AMDGPU/GlobalISel/regbankselect-zext.mir | 89 +-- 10 files changed, 357 insertions(+), 183 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ad6a0772fe8b6..9544c9f43eeaf 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner { return; } +if (DstTy == S64 && TruncSrcTy == S32) { + B.buildMergeLikeInstr(MI.getOperand(0).getReg(), +{TruncSrc, B.buildUndef({SgprRB, S32})}); + cleanUpAfterCombine(MI, Trunc); + return; +} + if (DstTy == S32 && TruncSrcTy == 
S16) { B.buildAnyExt(Dst, TruncSrc); cleanUpAfterCombine(MI, Trunc); diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index 37cdf0c926b09..8e71c172cd20d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy, MI.eraseFromParent(); } +void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) { + Register Dst = MI.getOperand(0).getReg(); + LLT Ty = MRI.getType(Dst); + Register Src = MI.getOperand(1).getReg(); + unsigned Opc = MI.getOpcode(); + if (Ty == S32 || Ty == S16) { +auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1); +auto False = B.buildConstant({VgprRB, Ty}, 0); +B.buildSelect(Dst, Src, True, False); + } + if (Ty == S64) { +auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1); +auto False = B.buildConstant({VgprRB, S32}, 0); +auto Lo = B.buildSelect({VgprRB, S32}, Src, True, False); +MachineInstrBuilder Hi; +switch (Opc) { +case G_SEXT: + Hi = Lo; + break; +case G_ZEXT: + Hi = False; + break; +case G_ANYEXT: + Hi = B.buildUndef({VgprRB_S32}); + break; +default: + llvm_unreachable("Opcode not supported"); +} + +B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)}); + } + MI.eraseFromParent(); +} + static bool isSignedBFE(MachineInstr &MI) { if (isa(MI)) { if (MI.getOperand(1).getIntrinsicID() == Intrinsic::amdgcn_sbfe) @@ -259,26 +293,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, switch (Mapping.LoweringMethod) { case DoNotLower: return; - case VccExtToSel: { -LLT Ty = MRI.getType(MI.getOperand(0).getReg()); -Register Src = MI.getOperand(1).getReg(); -unsigned Opc = MI.getOpcode(); -if (Ty == S32 || Ty == S16) { - auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? 
-1 : 1); - auto False = B.buildConstant({VgprRB, Ty}, 0); - B.buildSelect(MI.getOperand(0).getReg(), Src, True, False); -} -if (Ty == S64) { - auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1); - auto False = B.buildConstant({VgprRB, S32}, 0); - auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False); - B.buildMergeValues( - MI.getOperand(0).getReg(), - {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)}); -} -MI.eraseFromParent(); -return; - } + case VccExtToSel: +return lowerVccExtToSel(MI); case UniExtToSel: { LLT Ty = MRI.getType(MI.getOperand(0).getReg()); auto True = B.buildConstant({SgprRB, Ty}, @@ -295,13 +311,
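The S64 path of `lowerVccExtToSel` in the diff above builds a 32-bit low half with a select and then chooses the high half by opcode. Its arithmetic can be checked with a small scalar model. The sketch below is plain C++ with hypothetical names (`ExtOpc`, `extendBoolTo64`); it covers only `G_SEXT` and `G_ZEXT`, since the `G_ANYEXT` high half is undefined and has no single expected value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode tag standing in for G_SEXT / G_ZEXT.
enum class ExtOpc { SExt, ZExt };

// Scalar model of the S64 lowering: select a 32-bit low half from the
// 1-bit condition, pick the high half by opcode, then merge the halves.
std::uint64_t extendBoolTo64(bool Cond, ExtOpc Opc) {
  std::uint32_t True = (Opc == ExtOpc::SExt) ? 0xFFFFFFFFu : 1u; // -1 or 1
  std::uint32_t False = 0;
  std::uint32_t Lo = Cond ? True : False;                // the select
  std::uint32_t Hi = (Opc == ExtOpc::SExt) ? Lo : False; // SExt: Hi = Lo
  return (static_cast<std::uint64_t>(Hi) << 32) | Lo;    // merge values
}
```

Reusing `Lo` as the high half for sign extension works because the low half is already all ones or all zeros.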
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132383

>From 61c28d2d564c63b986c6adfae26f17d868b53cd1 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Mon, 14 Apr 2025 16:34:00 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and
 trunc

Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of VGPR to VCC is lowered as compare.
Extends of VCC are lowered using select.

For remaining types:
S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc
are again left as is to be combined away.
Notably uniform S16 for SExt and Zext is not lowered to S32 and left as is
for instruction select to deal with them. This is because there are patterns
that check for S16 type.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 7 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 47 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h | 3 +
 .../GlobalISel/regbankselect-and-s1.mir | 105 +
 .../GlobalISel/regbankselect-anyext.mir | 59 +-
 .../AMDGPU/GlobalISel/regbankselect-sext.mir | 100 ++--
 .../AMDGPU/GlobalISel/regbankselect-trunc.mir | 22 +++-
 .../AMDGPU/GlobalISel/regbankselect-zext.mir | 89 +--
 10 files changed, 357 insertions(+), 183 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index ad6a0772fe8b6..9544c9f43eeaf 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -214,6 +214,13 @@ class AMDGPURegBankLegalizeCombiner {
       return;
     }
 
+    if (DstTy == S64 && TruncSrcTy == S32) {
+      B.buildMergeLikeInstr(MI.getOperand(0).getReg(),
+                            {TruncSrc, B.buildUndef({SgprRB, S32})});
+      cleanUpAfterCombine(MI, Trunc);
+      return;
+    }
+
     if (DstTy == S32 && TruncSrcTy == S16) {
       B.buildAnyExt(Dst, TruncSrc);
       cleanUpAfterCombine(MI, Trunc);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 37cdf0c926b09..8e71c172cd20d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -131,6 +131,40 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst);
+  Register Src = MI.getOperand(1).getReg();
+  unsigned Opc = MI.getOpcode();
+  if (Ty == S32 || Ty == S16) {
+    auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
+    auto False = B.buildConstant({VgprRB, Ty}, 0);
+    B.buildSelect(Dst, Src, True, False);
+  }
+  if (Ty == S64) {
+    auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
+    auto False = B.buildConstant({VgprRB, S32}, 0);
+    auto Lo = B.buildSelect({VgprRB, S32}, Src, True, False);
+    MachineInstrBuilder Hi;
+    switch (Opc) {
+    case G_SEXT:
+      Hi = Lo;
+      break;
+    case G_ZEXT:
+      Hi = False;
+      break;
+    case G_ANYEXT:
+      Hi = B.buildUndef({VgprRB_S32});
+      break;
+    default:
+      llvm_unreachable("Opcode not supported");
+    }
+
+    B.buildMergeValues(Dst, {Lo.getReg(0), Hi.getReg(0)});
+  }
+  MI.eraseFromParent();
+}
+
 static bool isSignedBFE(MachineInstr &MI) {
   if (isa(MI)) {
     if (MI.getOperand(1).getIntrinsicID() == Intrinsic::amdgcn_sbfe)
@@ -259,26 +293,8 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
   switch (Mapping.LoweringMethod) {
   case DoNotLower:
     return;
-  case VccExtToSel: {
-    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
-    Register Src = MI.getOperand(1).getReg();
-    unsigned Opc = MI.getOpcode();
-    if (Ty == S32 || Ty == S16) {
-      auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
-      auto False = B.buildConstant({VgprRB, Ty}, 0);
-      B.buildSelect(MI.getOperand(0).getReg(), Src, True, False);
-    }
-    if (Ty == S64) {
-      auto True = B.buildConstant({VgprRB, S32}, Opc == G_SEXT ? -1 : 1);
-      auto False = B.buildConstant({VgprRB, S32}, 0);
-      auto Sel = B.buildSelect({VgprRB, S32}, Src, True, False);
-      B.buildMergeValues(
-          MI.getOperand(0).getReg(),
-          {Sel.getReg(0), Opc == G_SEXT ? Sel.getReg(0) : False.getReg(0)});
-    }
-    MI.eraseFromParent();
-    return;
-  }
+  case VccExtToSel:
+    return lowerVccExtToSel(MI);
   case UniExtToSel: {
     LLT Ty = MRI.getType(MI.getOperand(0).getReg());
     auto True = B.buildConstant({SgprRB, Ty},
@@ -295,13 +311,
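The VCC extend lowering in the patch above can be modelled on plain integers, which may make the select-and-merge scheme easier to follow. This is only a sketch: `ExtKind`, `extendBool32` and `extendBool64` are hypothetical names that do not exist in LLVM, and the `AnyExt` high half is modelled as 0 where the patch builds an undef value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the G_SEXT / G_ZEXT / G_ANYEXT opcodes.
enum class ExtKind { SExt, ZExt, AnyExt };

// 32-bit (or narrower) result: select(Cond, True, False), where True is
// all-ones for a sign extend and 1 otherwise, as in the patch.
constexpr uint32_t extendBool32(bool Cond, ExtKind Kind) {
  const uint32_t True = (Kind == ExtKind::SExt) ? 0xFFFFFFFFu : 1u;
  const uint32_t False = 0u;
  return Cond ? True : False;
}

// 64-bit result: select the low half, pick the high half by extend kind,
// then merge the two 32-bit halves.
constexpr uint64_t extendBool64(bool Cond, ExtKind Kind) {
  const uint32_t Lo = extendBool32(Cond, Kind);
  uint32_t Hi = 0u;
  switch (Kind) {
  case ExtKind::SExt:
    Hi = Lo; // the sign bits repeat the low half
    break;
  case ExtKind::ZExt:
    Hi = 0u; // the patch reuses the False constant here
    break;
  case ExtKind::AnyExt:
    Hi = 0u; // assumption: undef in the patch, modelled as 0 here
    break;
  }
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

Reusing `Lo` as the high half for the sign-extend case works because the select already produces either 0 or all-ones, so no extra shift is needed.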
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg (PR #132385)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132385

>From 585448f8cf5f77c26c63c5b1dc126bec85a5ff53 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Mon, 14 Apr 2025 16:35:19 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts
 and sext-inreg

Uniform S16 shifts have to be extended to S32 using appropriate Extend
before lowering to S32 instruction.
Uniform packed V2S16 are lowered to SGPR S32 instructions,
other option is to use VALU packed V2S16 and ReadAnyLane.
For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are
instructions available.
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 2 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 43 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h | 11 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll | 10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 187 +-
 .../AMDGPU/GlobalISel/regbankselect-ashr.mir | 6 +-
 .../AMDGPU/GlobalISel/regbankselect-lshr.mir | 17 +-
 .../GlobalISel/regbankselect-sext-inreg.mir | 24 +--
 .../AMDGPU/GlobalISel/regbankselect-shl.mir | 6 +-
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll | 34 ++--
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll| 10 +-
 13 files changed, 311 insertions(+), 151 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index 9544c9f43eeaf..15584f16a0638 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -310,7 +310,7 @@ bool AMDGPURegBankLegalize::runOnMachineFunction(MachineFunction &MF) {
     // Opcodes that support pretty much all combinations of reg banks and LLTs
     // (except S1). There is no point in writing rules for them.
     if (Opc == AMDGPU::G_BUILD_VECTOR || Opc == AMDGPU::G_UNMERGE_VALUES ||
-        Opc == AMDGPU::G_MERGE_VALUES) {
+        Opc == AMDGPU::G_MERGE_VALUES || Opc == G_BITCAST) {
       RBLHelper.applyMappingTrivial(*MI);
       continue;
     }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 670ebc0474264..fa040684fc567 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -14,11 +14,13 @@
 #include "AMDGPURegBankLegalizeHelper.h"
 #include "AMDGPUGlobalISelUtils.h"
 #include "AMDGPUInstrInfo.h"
+#include "AMDGPURegBankLegalizeRules.h"
 #include "AMDGPURegisterBankInfo.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
+#include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
@@ -166,6 +168,59 @@ void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+std::pair<Register, Register> RegBankLegalizeHelper::unpackZExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Mask = B.buildConstant(SgprRB_S32, 0x);
+  auto Lo = B.buildAnd(SgprRB_S32, PackedS32, Mask);
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair<Register, Register> RegBankLegalizeHelper::unpackSExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = B.buildSExtInReg(SgprRB_S32, PackedS32, 16);
+  auto Hi = B.buildAShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair<Register, Register> RegBankLegalizeHelper::unpackAExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = PackedS32;
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+void RegBankLegalizeHelper::lowerUnpack(MachineInstr &MI) {
+  Register Lo, Hi;
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_SHL: {
+    auto [Val0, Val1] = unpackAExt(MI.getOperand(1).getReg());
+    auto [Amt0, Amt1] = unpackAExt(MI.getOperand(2).getReg());
+    Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+    Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+    break;
+  }
+  case AMDGPU::G_LSHR: {
+    auto [Val0, Val1] = unpackZExt(MI.getOperand(1).getReg());
+    auto [Amt0, Amt1] = unpackZExt(MI.getOperand(2).getReg());
+    Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+    Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+    break;
+  }
+  case AMDGPU::G_ASHR: {
+    auto [Val0, Val1] = unpackSExt(MI.getOperand(1).getReg());
+    auto [Amt0, Amt1] = unpackSExt(MI.get
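The unpack helpers in the patch above split a packed V2S16 value, viewed as one 32-bit scalar, into two 32-bit halves before shifting each half separately. The following plain-integer sketch covers only the logical-shift-right case. Several things here are assumptions rather than facts from the patch: the mask constant is cut off in the archived diff and is assumed to select the low 16 bits, the repacking step is not visible in the truncated diff, and the function name `lshrV2S16` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using Halves = std::pair<uint32_t, uint32_t>; // {Lo, Hi}

// Zero-extending unpack: mask for the low half, logical shift for the high.
// Assumption: the mask is 0xFFFF (the constant is truncated in the diff).
inline Halves unpackZExt(uint32_t Packed) {
  return {Packed & 0xFFFFu, Packed >> 16};
}

// Sign-extending unpack, written with unsigned arithmetic so that no
// implementation-defined narrowing conversion is involved.
inline Halves unpackSExt(uint32_t Packed) {
  const uint32_t Lo = ((Packed & 0xFFFFu) ^ 0x8000u) - 0x8000u;
  const uint32_t Hi = ((Packed >> 16) ^ 0x8000u) - 0x8000u;
  return {Lo, Hi};
}

// Hypothetical model of a uniform V2S16 logical shift right: unpack both
// operands, shift each 32-bit half, then repack into one 32-bit value.
inline uint32_t lshrV2S16(uint32_t Val, uint32_t Amt) {
  auto [Val0, Val1] = unpackZExt(Val);
  auto [Amt0, Amt1] = unpackZExt(Amt);
  assert(Amt0 < 16 && Amt1 < 16 && "shift amount out of range for s16");
  const uint32_t Lo = Val0 >> Amt0;
  const uint32_t Hi = Val1 >> Amt1;
  return (Hi << 16) | (Lo & 0xFFFFu);
}
```

The patch picks the unpack flavour per opcode: any-extend for `G_SHL`, zero-extend for `G_LSHR` and sign-extend for `G_ASHR`, so that the bits shifted in from above each 16-bit lane have the right value.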
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR (PR #132382)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/132382

>From 5cc4f822aafd00eb4c88a76dadd07b716904ad97 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Mon, 14 Apr 2025 16:32:49 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR

Uniform S1 is lowered to S32.
Divergent S1 is selected as VCC(S1) instruction select will select
SALU instruction based on wavesize (S32 or S64).
S16 are selected as is. There are register classes for vgpr S16.
Since some isel patterns check for sgpr S16 we don't lower to S32.
For 32 and 64 bit types we use B32/B64 rules that cover scalar vector
and pointers types.
SALU B32 and B64 and VALU B32 instructions are available.
Divergent B64 is lowered to B32.
---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 31 ---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 10 ++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h | 2 +
 .../AMDGPU/GlobalISel/regbankselect-and.mir | 33 ---
 .../AMDGPU/GlobalISel/regbankselect-or.mir| 85 +--
 .../AMDGPU/GlobalISel/regbankselect-xor.mir | 84 +-
 7 files changed, 133 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index dffabe3932cc3..37cdf0c926b09 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -237,6 +237,21 @@ void RegBankLegalizeHelper::lowerS_BFE(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT Ty = MRI.getType(Dst) == V4S16 ? V2S16 : S32;
+  auto Op1 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(1).getReg());
+  auto Op2 = B.buildUnmerge({VgprRB, Ty}, MI.getOperand(2).getReg());
+  unsigned Opc = MI.getOpcode();
+  auto Flags = MI.getFlags();
+  auto Lo =
+      B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(0), Op2.getReg(0)}, Flags);
+  auto Hi =
+      B.buildInstr(Opc, {{VgprRB, Ty}}, {Op1.getReg(1), Op2.getReg(1)}, Flags);
+  B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
+  MI.eraseFromParent();
+}
+
 void RegBankLegalizeHelper::lower(MachineInstr &MI,
                                   const RegBankLLTMapping &Mapping,
                                   SmallSet &WaterfallSgprs) {
@@ -325,20 +340,12 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
     MI.eraseFromParent();
     return;
   }
-  case SplitTo32: {
-    auto Op1 = B.buildUnmerge(VgprRB_S32, MI.getOperand(1).getReg());
-    auto Op2 = B.buildUnmerge(VgprRB_S32, MI.getOperand(2).getReg());
-    unsigned Opc = MI.getOpcode();
-    auto Lo = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(0), Op2.getReg(0)});
-    auto Hi = B.buildInstr(Opc, {VgprRB_S32}, {Op1.getReg(1), Op2.getReg(1)});
-    B.buildMergeLikeInstr(MI.getOperand(0).getReg(), {Lo, Hi});
-    MI.eraseFromParent();
-    break;
-  }
   case V_BFE:
     return lowerV_BFE(MI);
   case S_BFE:
     return lowerS_BFE(MI);
+  case SplitTo32:
+    return lowerSplitTo32(MI);
   case SplitLoad: {
     LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
     unsigned Size = DstTy.getSizeInBits();
@@ -398,6 +405,7 @@ LLT RegBankLegalizeHelper::getTyFromID(RegBankLLTMappingApplyID ID) {
   case UniInVcc:
     return LLT::scalar(1);
   case Sgpr16:
+  case Vgpr16:
     return LLT::scalar(16);
   case Sgpr32:
   case Sgpr32Trunc:
@@ -517,6 +525,7 @@ RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) {
   case Sgpr32AExtBoolInReg:
   case Sgpr32SExt:
     return SgprRB;
+  case Vgpr16:
   case Vgpr32:
   case Vgpr64:
   case VgprP0:
@@ -560,6 +569,7 @@ void RegBankLegalizeHelper::applyMappingDst(
     case SgprP4:
     case SgprP5:
     case SgprV4S32:
+    case Vgpr16:
     case Vgpr32:
     case Vgpr64:
     case VgprP0:
@@ -691,6 +701,7 @@ void RegBankLegalizeHelper::applyMappingSrc(
       break;
     }
     // vgpr scalars, pointers and vectors
+    case Vgpr16:
     case Vgpr32:
     case Vgpr64:
     case VgprP0:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
index 2d4da4cc90ea7..bbfa7b3986fd2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
@@ -112,6 +112,7 @@ class RegBankLegalizeHelper {
 
   void lowerV_BFE(MachineInstr &MI);
   void lowerS_BFE(MachineInstr &MI);
+  void lowerSplitTo32(MachineInstr &MI);
 };
 
 } // end namespace AMDGPU
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index d13748c0ef390..2d987e0647eba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -106,6 +106,8 @@ bool matchUnifor
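The commit message above says that divergent B64 is lowered to B32, and `lowerSplitTo32` does this by unmerging both operands, applying the same opcode to each half and merging the results. A minimal sketch of why that is sound for bitwise operations, using plain integers, is shown here. The names `splitTo32`, `and64`, `or64` and `xor64` are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstdint>

// Apply a 32-bit binary operation to the low and high halves of two 64-bit
// values independently, then merge the halves back together. This mirrors
// the unmerge / per-half op / merge sequence in the patch.
template <typename Op32>
constexpr uint64_t splitTo32(uint64_t A, uint64_t B, Op32 Op) {
  const uint32_t ALo = static_cast<uint32_t>(A);
  const uint32_t AHi = static_cast<uint32_t>(A >> 32);
  const uint32_t BLo = static_cast<uint32_t>(B);
  const uint32_t BHi = static_cast<uint32_t>(B >> 32);
  const uint32_t Lo = Op(ALo, BLo);
  const uint32_t Hi = Op(AHi, BHi);
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}

// Bitwise AND, OR and XOR have no cross-bit dependencies, so the split
// form gives the same result as the native 64-bit operation.
inline uint64_t and64(uint64_t A, uint64_t B) {
  return splitTo32(A, B, [](uint32_t X, uint32_t Y) { return X & Y; });
}
inline uint64_t or64(uint64_t A, uint64_t B) {
  return splitTo32(A, B, [](uint32_t X, uint32_t Y) { return X | Y; });
}
inline uint64_t xor64(uint64_t A, uint64_t B) {
  return splitTo32(A, B, [](uint32_t X, uint32_t Y) { return X ^ Y; });
}
```

The same splitting would not be valid for operations that carry between bits, such as addition, which is why the rule is tied to the bitwise opcodes.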
[llvm-branch-commits] [llvm] release/20.x: [GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158) (PR #137179)
llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)

Changes

Backport 57530c23a53b5e003d389437637f61c5b9814e22

Requested by: @nikic

---
Full diff: https://github.com/llvm/llvm-project/pull/137179.diff

2 Files Affected:

- (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+4)
- (added) llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll (+28)

```diff
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 9586fc97a39f7..236a531317678 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -719,10 +719,14 @@ static bool allUsesOfLoadedValueWillTrapIfNull(const GlobalVariable *GV) {
     const Value *P = Worklist.pop_back_val();
     for (const auto *U : P->users()) {
       if (auto *LI = dyn_cast<LoadInst>(U)) {
+        if (!LI->isSimple())
+          return false;
         SmallPtrSet<const PHINode *, 8> PHIs;
         if (!AllUsesOfValueWillTrapIfNull(LI, PHIs))
           return false;
       } else if (auto *SI = dyn_cast<StoreInst>(U)) {
+        if (!SI->isSimple())
+          return false;
         // Ignore stores to the global.
         if (SI->getPointerOperand() != P)
           return false;
diff --git a/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
new file mode 100644
index 0..0ecdf095efdd8
--- /dev/null
+++ b/llvm/test/Transforms/GlobalOpt/malloc-promote-atomic.ll
@@ -0,0 +1,28 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=globalopt -S < %s | FileCheck %s
+
+@g = internal global ptr null, align 8
+
+define void @init() {
+; CHECK-LABEL: define void @init() local_unnamed_addr {
+; CHECK-NEXT:    [[ALLOC:%.*]] = call ptr @malloc(i64 48)
+; CHECK-NEXT:    store atomic ptr [[ALLOC]], ptr @g seq_cst, align 8
+; CHECK-NEXT:    ret void
+;
+  %alloc = call ptr @malloc(i64 48)
+  store atomic ptr %alloc, ptr @g seq_cst, align 8
+  ret void
+}
+
+define i1 @check() {
+; CHECK-LABEL: define i1 @check() local_unnamed_addr {
+; CHECK-NEXT:    [[VAL:%.*]] = load atomic ptr, ptr @g seq_cst, align 8
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr [[VAL]], null
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %val = load atomic ptr, ptr @g seq_cst, align 8
+  %cmp = icmp eq ptr %val, null
+  ret i1 %cmp
+}
+
+declare ptr @malloc(i64) allockind("alloc,uninitialized") allocsize(0)
```

https://github.com/llvm/llvm-project/pull/137179
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 2304226 - fix build again
Author: Schrodinger ZHU Yifan
Date: 2025-04-24T13:15:42-04:00
New Revision: 230422621aeb97c4839ba133c75c33496aa5a75a

URL: https://github.com/llvm/llvm-project/commit/230422621aeb97c4839ba133c75c33496aa5a75a
DIFF: https://github.com/llvm/llvm-project/commit/230422621aeb97c4839ba133c75c33496aa5a75a.diff

LOG: fix build again

Added:

Modified:
    libc/src/setjmp/CMakeLists.txt
    libc/src/setjmp/x86_64/CMakeLists.txt

Removed:

diff --git a/libc/src/setjmp/CMakeLists.txt b/libc/src/setjmp/CMakeLists.txt
index 2591319f15240..239254fa57dc6 100644
--- a/libc/src/setjmp/CMakeLists.txt
+++ b/libc/src/setjmp/CMakeLists.txt
@@ -26,19 +26,21 @@ add_entrypoint_object(
     .${LIBC_TARGET_ARCHITECTURE}.longjmp
 )
 
-add_entrypoint_object(
-  siglongjmp
-  SRCS
-    siglongjmp.cpp
-  HDRS
-    siglongjmp.h
-  DEPENDS
-    .longjmp
-)
+if (TARGET libc.src.setjmp.sigsetjmp_epilogue)
+  add_entrypoint_object(
+    siglongjmp
+    SRCS
+      siglongjmp.cpp
+    HDRS
+      siglongjmp.h
+    DEPENDS
+      .longjmp
+  )
 
-add_entrypoint_object(
-  sigsetjmp
-  ALIAS
-  DEPENDS
-    .${LIBC_TARGET_ARCHITECTURE}.sigsetjmp
-)
+  add_entrypoint_object(
+    sigsetjmp
+    ALIAS
+    DEPENDS
+      .${LIBC_TARGET_ARCHITECTURE}.sigsetjmp
+  )
+endif()
diff --git a/libc/src/setjmp/x86_64/CMakeLists.txt b/libc/src/setjmp/x86_64/CMakeLists.txt
index 0090e81655662..03ed5fb647084 100644
--- a/libc/src/setjmp/x86_64/CMakeLists.txt
+++ b/libc/src/setjmp/x86_64/CMakeLists.txt
@@ -8,20 +8,21 @@ add_entrypoint_object(
     libc.hdr.offsetof_macros
     libc.hdr.types.jmp_buf
 )
-
-add_entrypoint_object(
-  sigsetjmp
-  SRCS
-    sigsetjmp.cpp
-  HDRS
-    ../sigsetjmp.h
-  DEPENDS
-    libc.hdr.types.jmp_buf
-    libc.hdr.types.sigset_t
-    libc.hdr.offsetof_macros
-    libc.src.setjmp.sigsetjmp_epilogue
-    libc.src.setjmp.setjmp
-)
+if (TARGET libc.src.setjmp.sigsetjmp_epilogue)
+  add_entrypoint_object(
+    sigsetjmp
+    SRCS
+      sigsetjmp.cpp
+    HDRS
+      ../sigsetjmp.h
+    DEPENDS
+      libc.hdr.types.jmp_buf
+      libc.hdr.types.sigset_t
+      libc.hdr.offsetof_macros
+      libc.src.setjmp.sigsetjmp_epilogue
+      libc.src.setjmp.setjmp
+  )
+endif()
 
 add_entrypoint_object(
   longjmp
[llvm-branch-commits] [llvm] [Attributor] Use `getAssumedAddrSpace` to get address space for `AllocaInst` (PR #136865)
jdoerfert wrote:

FWIW, the nvptx backend unfortunately works by "fixing stuff up" late. It shouldn't, but it does. I'd prefer not to fix stuff up at all, and maybe the best way is to have proper assertions in the creation of allocas/globals/... and/or the verifier.

https://github.com/llvm/llvm-project/pull/136865
[llvm-branch-commits] [llvm] [DirectX] Adding support for Root Descriptor in Obj2yaml/Yaml2Obj (PR #136732)
https://github.com/joaosaffran updated https://github.com/llvm/llvm-project/pull/136732

>From 578faea764d630b8782ba53b5153fdbeda2c45f8 Mon Sep 17 00:00:00 2001
From: joaosaffran
Date: Thu, 24 Apr 2025 19:33:05 +
Subject: [PATCH 1/8] addressing pr comments

---
 .../llvm/MC/DXContainerRootSignature.h| 2 +-
 llvm/lib/Target/DirectX/DXILRootSignature.cpp | 105 +++---
 .../RootSignature-MultipleEntryFunctions.ll | 4 +-
 .../ContainerData/RootSignature-Parameters.ll | 24 ++--
 4 files changed, 57 insertions(+), 78 deletions(-)

diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 6d3329a2c6ce9..fee799249b255 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -25,7 +25,7 @@ struct RootSignatureDesc {
 
   uint32_t Version = 2U;
   uint32_t Flags = 0U;
-  uint32_t RootParameterOffset = 24U;
+  uint32_t RootParameterOffset = 0U;
   uint32_t StaticSamplersOffset = 0u;
   uint32_t NumStaticSamplers = 0u;
   SmallVector Parameters;
diff --git a/llvm/lib/Target/DirectX/DXILRootSignature.cpp b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
index 5e615461df4f3..fe30793aa9853 100644
--- a/llvm/lib/Target/DirectX/DXILRootSignature.cpp
+++ b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
@@ -40,21 +40,19 @@ static bool reportError(LLVMContext *Ctx, Twine Message,
   return true;
 }
 
-static bool reportValueError(LLVMContext *Ctx, Twine ParamName, uint32_t Value,
-                             DiagnosticSeverity Severity = DS_Error) {
+static bool reportValueError(LLVMContext *Ctx, Twine ParamName,
+                             uint32_t Value) {
   Ctx->diagnose(DiagnosticInfoGeneric(
-      "Invalid value for " + ParamName + ": " + Twine(Value), Severity));
+      "Invalid value for " + ParamName + ": " + Twine(Value), DS_Error));
   return true;
 }
 
-static bool extractMdIntValue(uint32_t &Value, MDNode *Node,
-                              unsigned int OpId) {
-  auto *CI = mdconst::dyn_extract<ConstantInt>(Node->getOperand(OpId).get());
-  if (CI == nullptr)
-    return true;
-
-  Value = CI->getZExtValue();
-  return false;
+static std::optional<uint32_t> extractMdIntValue(MDNode *Node,
+                                                 unsigned int OpId) {
+  if (auto *CI =
+          mdconst::dyn_extract<ConstantInt>(Node->getOperand(OpId).get()))
+    return CI->getZExtValue();
+  return std::nullopt;
 }
 
 static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
@@ -63,7 +61,9 @@ static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
   if (RootFlagNode->getNumOperands() != 2)
     return reportError(Ctx, "Invalid format for RootFlag Element");
 
-  if (extractMdIntValue(RSD.Flags, RootFlagNode, 1))
+  if (std::optional<uint32_t> Val = extractMdIntValue(RootFlagNode, 1))
+    RSD.Flags = *Val;
+  else
     return reportError(Ctx, "Invalid value for RootFlag");
 
   return false;
@@ -79,22 +79,24 @@ static bool parseRootConstants(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
   NewParameter.Header.ParameterType =
       llvm::to_underlying(dxbc::RootParameterType::Constants32Bit);
 
-  uint32_t SV;
-  if (extractMdIntValue(SV, RootConstantNode, 1))
+  if (std::optional<uint32_t> Val = extractMdIntValue(RootConstantNode, 1))
+    NewParameter.Header.ShaderVisibility = *Val;
+  else
     return reportError(Ctx, "Invalid value for ShaderVisibility");
 
-  NewParameter.Header.ShaderVisibility = SV;
-
-  if (extractMdIntValue(NewParameter.Constants.ShaderRegister, RootConstantNode,
-                        2))
+  if (std::optional<uint32_t> Val = extractMdIntValue(RootConstantNode, 2))
+    NewParameter.Constants.ShaderRegister = *Val;
+  else
     return reportError(Ctx, "Invalid value for ShaderRegister");
 
-  if (extractMdIntValue(NewParameter.Constants.RegisterSpace, RootConstantNode,
-                        3))
+  if (std::optional<uint32_t> Val = extractMdIntValue(RootConstantNode, 3))
+    NewParameter.Constants.RegisterSpace = *Val;
+  else
     return reportError(Ctx, "Invalid value for RegisterSpace");
 
-  if (extractMdIntValue(NewParameter.Constants.Num32BitValues, RootConstantNode,
-                        4))
+  if (std::optional<uint32_t> Val = extractMdIntValue(RootConstantNode, 4))
+    NewParameter.Constants.Num32BitValues = *Val;
+  else
     return reportError(Ctx, "Invalid value for Num32BitValues");
 
   RSD.Parameters.push_back(NewParameter);
@@ -148,32 +150,6 @@ static bool parse(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
-static bool verifyShaderVisibility(uint32_t Flags) {
-  switch (Flags) {
-
-  case llvm::to_underlying(dxbc::ShaderVisibility::All):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Vertex):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Hull):
-  case llvm::to_underlying(dxbc::ShaderVisibility::Domain):
-  case llvm::to_underlying(dxbc::ShaderVisibilit
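The patch above replaces an out-parameter plus boolean error flag with a function returning `std::optional`, so callers test and unwrap the value in one `if`. The same shape is sketched here as standalone code. Everything in it is a hypothetical stand-in: `Operand` models a metadata operand as a variant, and `extractIntValue`, `RootFlags` and `parseRootFlags` only imitate the structure of the patched functions rather than reproduce LLVM's API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a metadata operand: an integer constant or a
// string. The real code works on MDNode operands instead.
using Operand = std::variant<uint64_t, std::string>;

// Returns the operand as an integer, or std::nullopt when the operand is
// missing or is not an integer constant.
inline std::optional<uint32_t> extractIntValue(const std::vector<Operand> &Ops,
                                               unsigned OpId) {
  if (OpId >= Ops.size())
    return std::nullopt;
  if (const auto *CI = std::get_if<uint64_t>(&Ops[OpId]))
    return static_cast<uint32_t>(*CI);
  return std::nullopt;
}

struct RootFlags {
  uint32_t Flags = 0;
};

// Returns true on error, following the convention used in the patch.
inline bool parseRootFlags(RootFlags &Out, const std::vector<Operand> &Node) {
  if (Node.size() != 2)
    return true; // wrong number of operands
  if (std::optional<uint32_t> Val = extractIntValue(Node, 1))
    Out.Flags = *Val;
  else
    return true; // operand 1 is not an integer
  return false;
}
```

Compared with the earlier signature, the destination field is written only after a value is known to exist, and no temporary such as the removed `SV` variable is needed when the field type differs from the extracted type.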
[llvm-branch-commits] [BOLT][NFCI] Switch heatmap to using parsed basic/branch events (PR #136531)
https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/136531
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm/MC/DXContainerRootSignature.h llvm/include/llvm/ObjectYAML/DXContainerYAML.h llvm/lib/MC/DXContainerRootSignature.cpp llvm/lib/ObjectYAML/DXContainerEmitter.cpp llvm/lib/Target/DirectX/DXILRootSignature.cpp
```

View the diff from clang-format here.

```diff
diff --git a/llvm/include/llvm/ObjectYAML/DXContainerYAML.h b/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
index e86a869da..c54c995ac 100644
--- a/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
+++ b/llvm/include/llvm/ObjectYAML/DXContainerYAML.h
@@ -95,7 +95,7 @@ struct RootParameterYamlDesc {
   uint32_t Type;
   uint32_t Visibility;
   uint32_t Offset;
-  RootParameterYamlDesc(){};
+  RootParameterYamlDesc() {};
   RootParameterYamlDesc(uint32_t T) : Type(T) {
     switch (T) {
```

https://github.com/llvm/llvm-project/pull/137284
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)
https://github.com/joaosaffran created https://github.com/llvm/llvm-project/pull/137284

None

>From 7219ed4328aff2929f021c5efbd6901bc4bd2e20 Mon Sep 17 00:00:00 2001
From: joaosaffran
Date: Fri, 25 Apr 2025 05:09:08 +
Subject: [PATCH] refactoring mcdxbc struct to store root parameters out of
 order

---
 .../llvm/MC/DXContainerRootSignature.h| 138 +-
 .../include/llvm/ObjectYAML/DXContainerYAML.h | 2 +-
 llvm/lib/MC/DXContainerRootSignature.cpp | 78 +-
 llvm/lib/ObjectYAML/DXContainerEmitter.cpp| 36 ++---
 llvm/lib/Target/DirectX/DXILRootSignature.cpp | 45 +++---
 5 files changed, 208 insertions(+), 91 deletions(-)

diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h b/llvm/include/llvm/MC/DXContainerRootSignature.h
index 1f421d726bf38..e1f4abbcebf8f 100644
--- a/llvm/include/llvm/MC/DXContainerRootSignature.h
+++ b/llvm/include/llvm/MC/DXContainerRootSignature.h
@@ -6,22 +6,146 @@
 //
 //===--===//
 
+#include "llvm/ADT/STLForwardCompat.h"
 #include "llvm/BinaryFormat/DXContainer.h"
+#include "llvm/Support/ErrorHandling.h"
+#include
 #include
-#include
+#include
 
 namespace llvm {
 
 class raw_ostream;
 namespace mcdxbc {
 
+struct RootParameterHeader : public dxbc::RootParameterHeader {
+
+  size_t Location;
+
+  RootParameterHeader() = default;
+
+  RootParameterHeader(dxbc::RootParameterHeader H, size_t L)
+      : dxbc::RootParameterHeader(H), Location(L) {}
+};
+
+using RootDescriptor = std::variant;
+using ParametersView =
+    std::variant;
 struct RootParameter {
-  dxbc::RootParameterHeader Header;
-  union {
-    dxbc::RootConstants Constants;
-    dxbc::RST0::v0::RootDescriptor Descriptor_V10;
-    dxbc::RST0::v1::RootDescriptor Descriptor_V11;
+  SmallVector Headers;
+
+  SmallVector Constants;
+  SmallVector Descriptors;
+
+  void addHeader(dxbc::RootParameterHeader H, size_t L) {
+    Headers.push_back(RootParameterHeader(H, L));
+  }
+
+  void addParameter(dxbc::RootParameterHeader H, dxbc::RootConstants C) {
+    addHeader(H, Constants.size());
+    Constants.push_back(C);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+                    dxbc::RST0::v0::RootDescriptor D) {
+    addHeader(H, Descriptors.size());
+    Descriptors.push_back(D);
+  }
+
+  void addParameter(dxbc::RootParameterHeader H,
+                    dxbc::RST0::v1::RootDescriptor D) {
+    addHeader(H, Descriptors.size());
+    Descriptors.push_back(D);
+  }
+
+  ParametersView get(const RootParameterHeader &H) const {
+    switch (H.ParameterType) {
+    case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+      return Constants[H.Location];
+    case llvm::to_underlying(dxbc::RootParameterType::CBV):
+    case llvm::to_underlying(dxbc::RootParameterType::SRV):
+    case llvm::to_underlying(dxbc::RootParameterType::UAV):
+      RootDescriptor VersionedParam = Descriptors[H.Location];
+      if (std::holds_alternative(
+              VersionedParam))
+        return std::get(VersionedParam);
+      return std::get(VersionedParam);
+    }
+
+    llvm_unreachable("Unimplemented parameter type");
+  }
+
+  struct iterator {
+    const RootParameter &Parameters;
+    SmallVector::const_iterator Current;
+
+    // Changed parameter type to match member variable (removed const)
+    iterator(const RootParameter &P,
+             SmallVector::const_iterator C)
+        : Parameters(P), Current(C) {}
+    iterator(const iterator &) = default;
+
+    ParametersView operator*() {
+      ParametersView Val;
+      switch (Current->ParameterType) {
+      case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit):
+        Val = Parameters.Constants[Current->Location];
+        break;
+
+      case llvm::to_underlying(dxbc::RootParameterType::CBV):
+      case llvm::to_underlying(dxbc::RootParameterType::SRV):
+      case llvm::to_underlying(dxbc::RootParameterType::UAV):
+        RootDescriptor VersionedParam =
+            Parameters.Descriptors[Current->Location];
+        if (std::holds_alternative(
+                VersionedParam))
+          Val = std::get(VersionedParam);
+        else
+          Val = std::get(VersionedParam);
+        break;
+      }
+      return Val;
+    }
+
+    iterator operator++() {
+      Current++;
+      return *this;
+    }
+
+    iterator operator++(int) {
+      iterator Tmp = *this;
+      ++*this;
+      return Tmp;
+    }
+
+    iterator operator--() {
+      Current--;
+
return *this; +} + +iterator operator--(int) { + iterator Tmp = *this; + --*this; + return Tmp; +} + +bool operator==(const iterator I) { return I.Current == Current; } +bool operator!=(const iterator I) { return !(*this == I); } }; + + iterator begin() const { return iterator(*this, Headers.begin()); } + + iterator end() const { return iterator(*this, Headers.end()); } + + size_t size() const { return Headers
[llvm-branch-commits] [llvm] [NFC] Refactoring MCDXBC to support out of order store of root parameters (PR #137284)
https://github.com/joaosaffran updated https://github.com/llvm/llvm-project/pull/137284 >From 7219ed4328aff2929f021c5efbd6901bc4bd2e20 Mon Sep 17 00:00:00 2001 From: joaosaffran Date: Fri, 25 Apr 2025 05:09:08 + Subject: [PATCH 1/2] refactoring mcdxbc struct to store root parameters out of order --- .../llvm/MC/DXContainerRootSignature.h| 138 +- .../include/llvm/ObjectYAML/DXContainerYAML.h | 2 +- llvm/lib/MC/DXContainerRootSignature.cpp | 78 +- llvm/lib/ObjectYAML/DXContainerEmitter.cpp| 36 ++--- llvm/lib/Target/DirectX/DXILRootSignature.cpp | 45 +++--- 5 files changed, 208 insertions(+), 91 deletions(-) diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h b/llvm/include/llvm/MC/DXContainerRootSignature.h index 1f421d726bf38..e1f4abbcebf8f 100644 --- a/llvm/include/llvm/MC/DXContainerRootSignature.h +++ b/llvm/include/llvm/MC/DXContainerRootSignature.h @@ -6,22 +6,146 @@ // //===--===// +#include "llvm/ADT/STLForwardCompat.h" #include "llvm/BinaryFormat/DXContainer.h" +#include "llvm/Support/ErrorHandling.h" +#include #include -#include +#include namespace llvm { class raw_ostream; namespace mcdxbc { +struct RootParameterHeader : public dxbc::RootParameterHeader { + + size_t Location; + + RootParameterHeader() = default; + + RootParameterHeader(dxbc::RootParameterHeader H, size_t L) + : dxbc::RootParameterHeader(H), Location(L) {} +}; + +using RootDescriptor = std::variant; +using ParametersView = +std::variant; struct RootParameter { - dxbc::RootParameterHeader Header; - union { -dxbc::RootConstants Constants; -dxbc::RST0::v0::RootDescriptor Descriptor_V10; -dxbc::RST0::v1::RootDescriptor Descriptor_V11; + SmallVector Headers; + + SmallVector Constants; + SmallVector Descriptors; + + void addHeader(dxbc::RootParameterHeader H, size_t L) { +Headers.push_back(RootParameterHeader(H, L)); + } + + void addParameter(dxbc::RootParameterHeader H, dxbc::RootConstants C) { +addHeader(H, Constants.size()); +Constants.push_back(C); + } + + void 
addParameter(dxbc::RootParameterHeader H, +dxbc::RST0::v0::RootDescriptor D) { +addHeader(H, Descriptors.size()); +Descriptors.push_back(D); + } + + void addParameter(dxbc::RootParameterHeader H, +dxbc::RST0::v1::RootDescriptor D) { +addHeader(H, Descriptors.size()); +Descriptors.push_back(D); + } + + ParametersView get(const RootParameterHeader &H) const { +switch (H.ParameterType) { +case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): + return Constants[H.Location]; +case llvm::to_underlying(dxbc::RootParameterType::CBV): +case llvm::to_underlying(dxbc::RootParameterType::SRV): +case llvm::to_underlying(dxbc::RootParameterType::UAV): + RootDescriptor VersionedParam = Descriptors[H.Location]; + if (std::holds_alternative( + VersionedParam)) +return std::get(VersionedParam); + return std::get(VersionedParam); +} + +llvm_unreachable("Unimplemented parameter type"); + } + + struct iterator { +const RootParameter &Parameters; +SmallVector::const_iterator Current; + +// Changed parameter type to match member variable (removed const) +iterator(const RootParameter &P, + SmallVector::const_iterator C) +: Parameters(P), Current(C) {} +iterator(const iterator &) = default; + +ParametersView operator*() { + ParametersView Val; + switch (Current->ParameterType) { + case llvm::to_underlying(dxbc::RootParameterType::Constants32Bit): +Val = Parameters.Constants[Current->Location]; +break; + + case llvm::to_underlying(dxbc::RootParameterType::CBV): + case llvm::to_underlying(dxbc::RootParameterType::SRV): + case llvm::to_underlying(dxbc::RootParameterType::UAV): +RootDescriptor VersionedParam = +Parameters.Descriptors[Current->Location]; +if (std::holds_alternative( +VersionedParam)) + Val = std::get(VersionedParam); +else + Val = std::get(VersionedParam); +break; + } + return Val; +} + +iterator operator++() { + Current++; + return *this; +} + +iterator operator++(int) { + iterator Tmp = *this; + ++*this; + return Tmp; +} + +iterator operator--() { + Current--; + 
return *this; +} + +iterator operator--(int) { + iterator Tmp = *this; + --*this; + return Tmp; +} + +bool operator==(const iterator I) { return I.Current == Current; } +bool operator!=(const iterator I) { return !(*this == I); } }; + + iterator begin() const { return iterator(*this, Headers.begin()); } + + iterator end() const { return iterator(*this, Headers.end()); } + + size_t size() const { return Headers.s
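The storage scheme in the patch above keeps one header per root parameter in insertion order, while the payloads live in separate per-type vectors; each header's `Location` is the index of its payload in the matching vector. The following is a minimal standalone sketch of that idea, not the patch itself. It assumes simplified stand-in structs (`Constants`, `DescriptorV0`, `DescriptorV1`, `ParamKind`, `ParameterTable` are all hypothetical names) and uses `std::vector` in place of `SmallVector`.

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the dxbc:: payload structs (fields are illustrative).
struct Constants { uint32_t Num32BitValues; };
struct DescriptorV0 { uint32_t ShaderRegister; };
struct DescriptorV1 { uint32_t ShaderRegister; uint32_t Flags; };

enum class ParamKind { Constants32Bit, Descriptor };

// One header per parameter, stored in insertion order.
// Location indexes into the vector that holds this parameter's payload.
struct Header {
  ParamKind Kind;
  size_t Location;
};

using Descriptor = std::variant<DescriptorV0, DescriptorV1>;
using ParamView = std::variant<Constants, DescriptorV0, DescriptorV1>;

struct ParameterTable {
  std::vector<Header> Headers;
  std::vector<Constants> ConstantsStore;
  std::vector<Descriptor> DescriptorStore;

  void add(Constants C) {
    Headers.push_back({ParamKind::Constants32Bit, ConstantsStore.size()});
    ConstantsStore.push_back(C);
  }
  void add(DescriptorV0 D) {
    Headers.push_back({ParamKind::Descriptor, DescriptorStore.size()});
    DescriptorStore.push_back(D);
  }
  void add(DescriptorV1 D) {
    Headers.push_back({ParamKind::Descriptor, DescriptorStore.size()});
    DescriptorStore.push_back(D);
  }

  // Resolve a header to its payload, whichever vector it lives in.
  ParamView get(const Header &H) const {
    if (H.Kind == ParamKind::Constants32Bit)
      return ConstantsStore[H.Location];
    return std::visit([](const auto &D) -> ParamView { return D; },
                      DescriptorStore[H.Location]);
  }

  size_t size() const { return Headers.size(); }
};
```

Iterating over `Headers` and calling `get` on each one yields the parameters in the order they were added, regardless of how the payloads are grouped in memory.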
[llvm-branch-commits] [llvm] dd4b43f - Revert "[DebugInfo][DWARF] Emit DW_AT_abstract_origin for concrete/inlined DW…"
Author: Vladislav Dzhidzhoev Date: 2025-04-24T19:54:52+02:00 New Revision: dd4b43f1ffd419bbec1554ebb529038066d4eae5 URL: https://github.com/llvm/llvm-project/commit/dd4b43f1ffd419bbec1554ebb529038066d4eae5 DIFF: https://github.com/llvm/llvm-project/commit/dd4b43f1ffd419bbec1554ebb529038066d4eae5.diff LOG: Revert "[DebugInfo][DWARF] Emit DW_AT_abstract_origin for concrete/inlined DW…" This reverts commit 1143a04f349c4081a1a2d2503046f6ca422aa338. Added: Modified: llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp llvm/test/DebugInfo/Generic/inline-scopes.ll llvm/test/DebugInfo/X86/lexical-block-file-inline.ll llvm/test/DebugInfo/X86/missing-abstract-variable.ll Removed: llvm/test/DebugInfo/Generic/lexical-block-abstract-origin.ll diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp index a20c374e08935..3939dae81841f 100644 --- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp @@ -782,8 +782,6 @@ DIE *DwarfCompileUnit::constructLexicalScopeDIE(LexicalScope *Scope) { assert(!LexicalBlockDIEs.count(DS) && "Concrete out-of-line DIE for this scope exists!"); LexicalBlockDIEs[DS] = ScopeDIE; - } else { -InlinedLocalScopeDIEs[DS].push_back(ScopeDIE); } attachRangesOrLowHighPC(*ScopeDIE, Scope->getRanges()); @@ -1493,19 +1491,6 @@ void DwarfCompileUnit::finishEntityDefinition(const DbgEntity *Entity) { getDwarfDebug().addAccelName(*this, CUNode->getNameTableKind(), Name, *Die); } -void DwarfCompileUnit::attachLexicalScopesAbstractOrigins() { - auto AttachAO = [&](const DILocalScope *LS, DIE *ScopeDIE) { -if (auto *AbsLSDie = getAbstractScopeDIEs().lookup(LS)) - addDIEEntry(*ScopeDIE, dwarf::DW_AT_abstract_origin, *AbsLSDie); - }; - - for (auto [LScope, ScopeDIE] : LexicalBlockDIEs) -AttachAO(LScope, ScopeDIE); - for (auto &[LScope, ScopeDIEs] : InlinedLocalScopeDIEs) -for 
(auto *ScopeDIE : ScopeDIEs) - AttachAO(LScope, ScopeDIE); -} - DbgEntity *DwarfCompileUnit::getExistingAbstractEntity(const DINode *Node) { auto &AbstractEntities = getAbstractEntities(); auto I = AbstractEntities.find(Node); diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h index 09be22ce35e36..104039db03c7c 100644 --- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h +++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.h @@ -82,10 +82,6 @@ class DwarfCompileUnit final : public DwarfUnit { // List of abstract local scopes (either DISubprogram or DILexicalBlock). DenseMap AbstractLocalScopeDIEs; - // List of inlined lexical block scopes that belong to subprograms within this - // CU. - DenseMap> InlinedLocalScopeDIEs; - DenseMap> AbstractEntities; /// DWO ID for correlating skeleton and split units. @@ -303,7 +299,6 @@ class DwarfCompileUnit final : public DwarfUnit { void finishSubprogramDefinition(const DISubprogram *SP); void finishEntityDefinition(const DbgEntity *Entity); - void attachLexicalScopesAbstractOrigins(); /// Find abstract variable associated with Var. using InlinedEntity = DbgValueHistoryMap::InlinedEntity; diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp index 6c932651750ee..39f1299a24e81 100644 --- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp @@ -1262,7 +1262,6 @@ void DwarfDebug::finalizeModuleInfo() { auto &TheCU = *P.second; if (TheCU.getCUNode()->isDebugDirectivesOnly()) continue; -TheCU.attachLexicalScopesAbstractOrigins(); // Emit DW_AT_containing_type attribute to connect types with their // vtable holding type. 
TheCU.constructContainingTypeDIEs(); diff --git a/llvm/test/DebugInfo/Generic/inline-scopes.ll b/llvm/test/DebugInfo/Generic/inline-scopes.ll index 45ecdd0594f64..8e7543eb16e69 100644 --- a/llvm/test/DebugInfo/Generic/inline-scopes.ll +++ b/llvm/test/DebugInfo/Generic/inline-scopes.ll @@ -20,29 +20,16 @@ ; } ; Ensure that lexical_blocks within inlined_subroutines are preserved/emitted. -; CHECK: DW_TAG_subprogram -; CHECK-NEXT: DW_AT_linkage_name ("_Z2f1v") -; CHECK: [[ADDR1:0x[0-9a-f]+]]: DW_TAG_lexical_block -; CHECK: DW_TAG_subprogram -; CHECK-NEXT: DW_AT_linkage_name ("_Z2f2v") -; CHECK: [[ADDR2:0x[0-9a-f]+]]: DW_TAG_lexical_block ; CHECK: DW_TAG_inlined_subroutine ; CHECK-NOT: DW_TAG ; CHECK-NOT: NULL -; CHECK: DW_TAG_lexical_block -; CHECK-NOT: {{DW_TAG|NULL}} -; CHECK: DW_AT_abstract_origin ([[ADDR1]] +; CHECK: DW_TAG_lexical_block ; CHECK-NOT: DW_TAG ; CHECK-NOT: NULL ; CHECK: DW_TAG_variable ; Ensure that file changes don't
[llvm-branch-commits] [mlir] [mlir][OpenMP] convert wsloop cancellation to LLVMIR (PR #137194)
tblah wrote:

PR Stack:
- Cancel parallel https://github.com/llvm/llvm-project/pull/137192
- Cancel sections https://github.com/llvm/llvm-project/pull/137193
- Cancel wsloop https://github.com/llvm/llvm-project/pull/137194
- Cancellation point (TODO)
- Cancel(lation point) taskgroup (TODO)

https://github.com/llvm/llvm-project/pull/137194
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)
@@ -339,6 +369,183 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { } } + std::optional> + getAuthCheckedReg(BinaryBasicBlock &BB) const override { +// Match several possible hard-coded sequences of instructions which can be +// emitted by LLVM backend to check that the authenticated pointer is +// correct (see AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue). +// +// This function only matches sequences involving branch instructions. +// All these sequences have the form: +// +// (0) ... regular code that authenticates a pointer in Xn ... +// (1) analyze Xn +// (2) branch to .Lon_success if the pointer is correct +// (3) BRK #imm (fall-through basic block) +// +// In the above pseudocode, (1) + (2) is one of the following sequences: +// +// - eor Xtmp, Xn, Xn, lsl #1 +// tbz Xtmp, #62, .Lon_success +// +// - mov Xtmp, Xn +// xpac(i|d) Xn (or xpaclri if Xn is LR) +// cmp Xtmp, Xn +// b.eq .Lon_success +// +// Note that any branch destination operand is accepted as .Lon_success - +// it is the responsibility of the caller of getAuthCheckedReg to inspect +// the list of successors of this basic block as appropriate. + +// Any of the above code sequences assume the fall-through basic block +// is a dead-end BRK instruction (any immediate operand is accepted). +const BinaryBasicBlock *BreakBB = BB.getFallthrough(); +if (!BreakBB || BreakBB->empty() || +BreakBB->front().getOpcode() != AArch64::BRK) + return std::nullopt; + +// Iterate over the instructions of BB in reverse order, matching opcodes +// and operands. +MCPhysReg TestedReg = 0; +MCPhysReg ScratchReg = 0; +auto It = BB.end(); +auto StepAndGetOpcode = [&It, &BB]() -> int { + if (It == BB.begin()) +return -1; + --It; + return It->getOpcode(); +}; + +switch (StepAndGetOpcode()) { +default: + // Not matched the branch instruction. 
+ return std::nullopt; +case AArch64::Bcc: + // Bcc EQ, .Lon_success + if (It->getOperand(0).getImm() != AArch64CC::EQ) +return std::nullopt; + // Not checking .Lon_success (see above). + + // SUBSXrs XZR, TestedReg, ScratchReg, 0 (used by "CMP reg, reg" alias) + if (StepAndGetOpcode() != AArch64::SUBSXrs || + It->getOperand(0).getReg() != AArch64::XZR || + It->getOperand(3).getImm() != 0) +return std::nullopt; + TestedReg = It->getOperand(1).getReg(); + ScratchReg = It->getOperand(2).getReg(); + + // Either XPAC(I|D) ScratchReg, ScratchReg + // or XPACLRI + switch (StepAndGetOpcode()) { + default: +return std::nullopt; + case AArch64::XPACLRI: +// No operands to check, but using XPACLRI forces TestedReg to be X30. +if (TestedReg != AArch64::LR) + return std::nullopt; +break; + case AArch64::XPACI: + case AArch64::XPACD: +if (It->getOperand(0).getReg() != ScratchReg || +It->getOperand(1).getReg() != ScratchReg) + return std::nullopt; +break; + } + + // ORRXrs ScratchReg, XZR, TestedReg, 0 (used by "MOV reg, reg" alias) + if (StepAndGetOpcode() != AArch64::ORRXrs) +return std::nullopt; + if (It->getOperand(0).getReg() != ScratchReg || + It->getOperand(1).getReg() != AArch64::XZR || + It->getOperand(2).getReg() != TestedReg || + It->getOperand(3).getImm() != 0) +return std::nullopt; + + return std::make_pair(TestedReg, &*It); + +case AArch64::TBZX: + // TBZX ScratchReg, 62, .Lon_success + ScratchReg = It->getOperand(0).getReg(); + if (It->getOperand(1).getImm() != 62) +return std::nullopt; + // Not checking .Lon_success (see above). 
+ + // EORXrs ScratchReg, TestedReg, TestedReg, 1 + if (StepAndGetOpcode() != AArch64::EORXrs) +return std::nullopt; + TestedReg = It->getOperand(1).getReg(); + if (It->getOperand(0).getReg() != ScratchReg || + It->getOperand(2).getReg() != TestedReg || + It->getOperand(3).getImm() != 1) +return std::nullopt; + + return std::make_pair(TestedReg, &*It); +} + } + + MCPhysReg getAuthCheckedReg(const MCInst &Inst, + bool MayOverwrite) const override { +// Cannot trivially reuse AArch64InstrInfo::getMemOperandWithOffsetWidth() +// method as it accepts an instance of MachineInstr, not MCInst. +const MCInstrDesc &Desc = Info->get(Inst.getOpcode()); + +// If signing oracles are considered, the particular value left in the base +// register after this instruction is important. This function checks that +// if the base register was overwritten, it is due to address write-back. +// +// Note that this function is not needed for authentication oracles, as the +
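As a worked illustration of the first check sequence quoted in the comment above: `eor Xtmp, Xn, Xn, lsl #1` puts the XOR of bits 62 and 61 of `Xn` into bit 62 of `Xtmp`, so the following `tbz Xtmp, #62` branches to `.Lon_success` exactly when those two bits are equal. The sketch below only reproduces that arithmetic in plain C++; the function name `tbzBit62Clear` is made up for this example.

```cpp
#include <cstdint>

// Mirrors "eor Xtmp, Xn, Xn, lsl #1" followed by "tbz Xtmp, #62, .Lon_success".
// Returns true when the branch to .Lon_success would be taken.
bool tbzBit62Clear(uint64_t Xn) {
  uint64_t Xtmp = Xn ^ (Xn << 1);   // bit 62 of Xtmp = bit 62 of Xn XOR bit 61 of Xn
  return ((Xtmp >> 62) & 1) == 0;   // tbz branches when the tested bit is zero
}
```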
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko created https://github.com/llvm/llvm-project/pull/137224 Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. >From 5ccb98144a2625908c6abf3aab9fb6d0c226d80b Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++- .../AArch64/gs-pacret-autiasp.s | 31 +- .../AArch64/gs-pauth-debug-output.s | 30 +- .../AArch64/gs-pauth-tail-calls.s | 597 ++ 4 files changed, 730 insertions(+), 40 deletions(-) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 0ce9f51c44af4..d67f10a311396 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -655,8 +655,9 @@ class DataflowSrcSafetyAnalysis // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). 
class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BinaryFunction &BF; MCPlusBuilder::AllocatorIdTy AllocId; @@ -667,6 +668,30 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } + /// Compute a reasonably pessimistic estimation of the register state when + /// the previous instruction is not known for sure. Take the set of registers + /// which are trusted at function entry and remove all registers that can be + /// clobbered inside this function. + SrcState computePessimisticState(BinaryFunction &BF) { +BitVector ClobberedRegs(NumRegs); +for (auto &I : BF.instrs()) { + MCInst &Inst = I.second; + BC.MIB->getClobberedRegs(Inst, ClobberedRegs); + + // If this is a call instruction, no register is safe anymore, unless + // it is a tail call. Ignore tail calls for the purpose of estimating the + // worst-case scenario, assuming no instructions are executed in the + // caller after this point anyway. + if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst)) +ClobberedRegs.set(); +} + +SrcState S = createEntryState(); +S.SafeToDerefRegs.reset(ClobberedRegs); +S.TrustedRegs.reset(ClobberedRegs); +return S; + } + public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -677,6 +702,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -691,7 +717,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Check if we need to remove an old annotation (this is the case if @@ -1226,6 +1252,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_report(RetKind, Inst, *RetReg); } +/// While 
BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(stati
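The idea behind `computePessimisticState` in the diff above can be sketched independently of BOLT: start from the registers trusted at function entry and drop every register that any instruction of the function may clobber, treating a non-tail call as clobbering everything. This is a simplified sketch under stated assumptions: `Inst`, `RegSet`, `NumRegs` and `computePessimisticTrusted` are hypothetical stand-ins, and `std::bitset` replaces LLVM's `BitVector`.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr size_t NumRegs = 32;  // hypothetical register count for the sketch
using RegSet = std::bitset<NumRegs>;

// Hypothetical simplified instruction record.
struct Inst {
  RegSet Clobbers;          // registers this instruction may overwrite
  bool IsCall = false;
  bool IsTailCall = false;
};

// Registers that remain trusted no matter which instruction ran last.
RegSet computePessimisticTrusted(const std::vector<Inst> &Insts,
                                 RegSet TrustedAtEntry) {
  RegSet Clobbered;
  for (const Inst &I : Insts) {
    Clobbered |= I.Clobbers;
    // A regular call may clobber anything; a tail call is ignored because
    // no further caller instructions are assumed to run after it.
    if (I.IsCall && !I.IsTailCall)
      Clobbered.set();
  }
  return TrustedAtEntry & ~Clobbered;
}
```

Compared with resetting to a fully unsafe state at every label, this keeps registers such as the link register trusted in functions that never write to them.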
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136151

>From e22ae5ebfe066cec1335f2b4080b013560c5e844 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko
Date: Tue, 15 Apr 2025 21:47:18 +0300
Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions

Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp        | 16 ++
 .../AArch64/gs-pauth-debug-output.s           | 32 +++
 2 files changed, 48 insertions(+)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 849272cac73d2..f7ac0b67d00da 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -431,6 +431,9 @@ class SrcSafetyAnalysis {
   }
 
   SrcState computeNext(const MCInst &Point, const SrcState &Cur) {
+    if (BC.MIB->isCFI(Point))
+      return Cur;
+
     SrcStatePrinter P(BC);
     LLVM_DEBUG({
       dbgs() << " SrcSafetyAnalysis::ComputeNext(";
@@ -670,6 +673,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
     SrcState S = createEntryState();
     for (auto &I : BF.instrs()) {
       MCInst &Inst = I.second;
+      if (BC.MIB->isCFI(Inst))
+        continue;
 
       // If there is a label before this instruction, it is possible that it
       // can be jumped-to, thus conservatively resetting S. As an exception,
@@ -947,6 +952,9 @@ class DstSafetyAnalysis {
   }
 
   DstState computeNext(const MCInst &Point, const DstState &Cur) {
+    if (BC.MIB->isCFI(Point))
+      return Cur;
+
     DstStatePrinter P(BC);
     LLVM_DEBUG({
       dbgs() << " DstSafetyAnalysis::ComputeNext(";
@@ -1123,6 +1131,8 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis {
     DstState S = createUnsafeState();
     for (auto &I : llvm::reverse(BF.instrs())) {
       MCInst &Inst = I.second;
+      if (BC.MIB->isCFI(Inst))
+        continue;
 
       // If Inst can change the control flow, we cannot be sure that the next
       // instruction (to be executed in analyzed program) is the one processed
@@ -1319,6 +1329,9 @@ void FunctionAnalysis::findUnsafeUses(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+    if (BC.MIB->isCFI(Inst))
+      return;
+
     const SrcState &S = Analysis->getStateBefore(Inst);
 
     // If non-empty state was never propagated from the entry basic block
@@ -1382,6 +1395,9 @@ void FunctionAnalysis::findUnsafeDefs(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+    if (BC.MIB->isCFI(Inst))
+      return;
+
     const DstState &S = Analysis->getStateAfter(Inst);
 
     if (auto Report = shouldReportAuthOracle(BC, Inst, S))
diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
index fd55880921d06..07b61bea77e94 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
@@ -329,6 +329,38 @@ auth_oracle:
 // PAUTH-EMPTY:
 // PAUTH-NEXT: Attaching leakage info to: : autia x0, x1 # DataflowDstSafetyAnalysis: dst-state
+// Gadget scanner should not crash on CFI instructions, including when debug-printing them.
+// Note that the particular debug output is not checked, but BOLT should be
+// compiled with assertions enabled to support -debug-only argument.
+
+        .globl  cfi_inst_df
+        .type   cfi_inst_df,@function
+cfi_inst_df:
+        .cfi_startproc
+        sub     sp, sp, #16
+        .cfi_def_cfa_offset 16
+        add     sp, sp, #16
+        .cfi_def_cfa_offset 0
+        ret
+        .size   cfi_inst_df, .-cfi_inst_df
+        .cfi_endproc
+
+        .globl  cfi_inst_nocfg
+        .type   cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+        .cfi_startproc
+        sub     sp, sp, #16
+        .cfi_def_cfa_offset 16
+
+        adr     x0, 1f
+        br      x0
+1:
+        add     sp, sp, #16
+        .cfi_def_cfa_offset 0
+        ret
+        .size   cfi_inst_nocfg, .-cfi_inst_nocfg
+        .cfi_endproc
+
 // CHECK-LABEL:Analyzing function main, AllocatorId = 1
         .globl  main
         .type   main,@function
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/134146
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)
@@ -339,6 +369,198 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { } } + std::optional> + getAuthCheckedReg(BinaryBasicBlock &BB) const override { +// Match several possible hard-coded sequences of instructions which can be +// emitted by LLVM backend to check that the authenticated pointer is +// correct (see AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue). +// +// This function only matches sequences involving branch instructions. +// All these sequences have the form: +// +// (0) ... regular code that authenticates a pointer in Xn ... +// (1) analyze Xn +// (2) branch to .Lon_success if the pointer is correct +// (3) BRK #imm (fall-through basic block) +// +// In the above pseudocode, (1) + (2) is one of the following sequences: +// +// - eor Xtmp, Xn, Xn, lsl #1 +// tbz Xtmp, #62, .Lon_success +// +// - mov Xtmp, Xn +// xpac(i|d) Xn (or xpaclri if Xn is LR) +// cmp Xtmp, Xn +// b.eq .Lon_success +// +// Note that any branch destination operand is accepted as .Lon_success - +// it is the responsibility of the caller of getAuthCheckedReg to inspect +// the list of successors of this basic block as appropriate. + +// Any of the above code sequences assume the fall-through basic block +// is a dead-end BRK instruction (any immediate operand is accepted). +const BinaryBasicBlock *BreakBB = BB.getFallthrough(); +if (!BreakBB || BreakBB->empty() || +BreakBB->front().getOpcode() != AArch64::BRK) + return std::nullopt; + +// Iterate over the instructions of BB in reverse order, matching opcodes +// and operands. +MCPhysReg TestedReg = 0; +MCPhysReg ScratchReg = 0; +auto It = BB.end(); +auto StepAndGetOpcode = [&It, &BB]() -> int { + if (It == BB.begin()) +return -1; + --It; + return It->getOpcode(); +}; + +switch (StepAndGetOpcode()) { +default: + // Not matched the branch instruction. 
+ return std::nullopt; +case AArch64::Bcc: + // Bcc EQ, .Lon_success + if (It->getOperand(0).getImm() != AArch64CC::EQ) +return std::nullopt; + // Not checking .Lon_success (see above). + + // SUBSXrs XZR, TestedReg, ScratchReg, 0 (used by "CMP reg, reg" alias) + if (StepAndGetOpcode() != AArch64::SUBSXrs || + It->getOperand(0).getReg() != AArch64::XZR || + It->getOperand(3).getImm() != 0) +return std::nullopt; + TestedReg = It->getOperand(1).getReg(); + ScratchReg = It->getOperand(2).getReg(); + + // Either XPAC(I|D) ScratchReg, ScratchReg + // or XPACLRI + switch (StepAndGetOpcode()) { + default: +return std::nullopt; + case AArch64::XPACLRI: +// No operands to check, but using XPACLRI forces TestedReg to be X30. +if (TestedReg != AArch64::LR) + return std::nullopt; +break; + case AArch64::XPACI: + case AArch64::XPACD: +if (It->getOperand(0).getReg() != ScratchReg || +It->getOperand(1).getReg() != ScratchReg) + return std::nullopt; +break; + } + + // ORRXrs ScratchReg, XZR, TestedReg, 0 (used by "MOV reg, reg" alias) + if (StepAndGetOpcode() != AArch64::ORRXrs) +return std::nullopt; + if (It->getOperand(0).getReg() != ScratchReg || + It->getOperand(1).getReg() != AArch64::XZR || + It->getOperand(2).getReg() != TestedReg || + It->getOperand(3).getImm() != 0) +return std::nullopt; + + return std::make_pair(TestedReg, &*It); + +case AArch64::TBZX: + // TBZX ScratchReg, 62, .Lon_success + ScratchReg = It->getOperand(0).getReg(); + if (It->getOperand(1).getImm() != 62) +return std::nullopt; + // Not checking .Lon_success (see above). 
+ + // EORXrs ScratchReg, TestedReg, TestedReg, 1 + if (StepAndGetOpcode() != AArch64::EORXrs) +return std::nullopt; + TestedReg = It->getOperand(1).getReg(); + if (It->getOperand(0).getReg() != ScratchReg || + It->getOperand(2).getReg() != TestedReg || + It->getOperand(3).getImm() != 1) +return std::nullopt; + + return std::make_pair(TestedReg, &*It); +} + } + + MCPhysReg getAuthCheckedReg(const MCInst &Inst, + bool MayOverwrite) const override { +// Cannot trivially reuse AArch64InstrInfo::getMemOperandWithOffsetWidth() +// method as it accepts an instance of MachineInstr, not MCInst. +const MCInstrDesc &Desc = Info->get(Inst.getOpcode()); + +// If signing oracles are considered, the particular value left in the base +// register after this instruction is important. This function checks that +// if the base register was overwritten, it is due to address write-back: +// +// ; good: +// autdza x1 ; x1 is authenticated (may fail
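The backward-matching technique used by `getAuthCheckedReg` above (step an iterator from the end of the block, checking one opcode and its operands per step) can be sketched on a toy instruction model. Everything here is an assumption made for illustration: `Op`, `Inst` and `matchEorTbz` are hypothetical, the operand layout is simplified, and only the `eor` + `tbz` sequence is matched.

```cpp
#include <optional>
#include <vector>

// Hypothetical toy opcode set and instruction record.
enum class Op { Eor, Tbz, Other };

struct Inst {
  Op Opcode;
  int Reg0 = -1;  // destination (Eor) or tested register (Tbz)
  int Reg1 = -1;  // source register (Eor only)
  int Imm = 0;    // shift amount (Eor) or bit number (Tbz)
};

// Returns the checked register if the block ends with
//   Eor Scratch, Reg, Reg, lsl #1
//   Tbz Scratch, #62, <any target>
std::optional<int> matchEorTbz(const std::vector<Inst> &Block) {
  auto It = Block.end();
  // Step one instruction back; std::nullopt means the block start was reached.
  auto StepAndGetOpcode = [&]() -> std::optional<Op> {
    if (It == Block.begin())
      return std::nullopt;
    --It;
    return It->Opcode;
  };

  if (StepAndGetOpcode() != Op::Tbz || It->Imm != 62)
    return std::nullopt;
  int Scratch = It->Reg0;

  if (StepAndGetOpcode() != Op::Eor || It->Reg0 != Scratch || It->Imm != 1)
    return std::nullopt;
  return It->Reg1;
}
```

Because the opcode comparison comes first in each condition, `It` is only dereferenced after a successful step, which is the same guard the sentinel return value of `StepAndGetOpcode` provides in the patch.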
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#137224** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-view-in-graphite)
* **#136183**
* **#136151**
* **#135663**
* **#136147**
* **#135662**
* **#135661**
* **#134146**
* **#133461**
* **#135073**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn more about stacking (https://stacking.dev/?utm_source=stack-comment).

https://github.com/llvm/llvm-project/pull/137224
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)
https://github.com/atrosinenko commented: @kbeyls Thank you for the comments! I have updated the comments accordingly. https://github.com/llvm/llvm-project/pull/134146
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect signing oracles (PR #134146)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/134146
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136147

>From 4a7d095c971e9ae8a3957847eebc56cae8ed6cad Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko
Date: Thu, 17 Apr 2025 15:40:05 +0300
Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface

Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially when
searching for authentication oracles: usually, such instructions cannot
be authentication oracles and only some of them actually write an
authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional<MCPhysReg>` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.

Document a few existing methods, add information about preconditions.
---
 bolt/include/bolt/Core/MCPlusBuilder.h        | 61 ++-
 bolt/lib/Passes/PAuthGadgetScanner.cpp        | 64 +---
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 76 ---
 .../AArch64/gs-pauth-debug-output.s           |  3 -
 .../AArch64/gs-pauth-signing-oracles.s        | 20 +
 5 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..83ad70ea97076 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -562,30 +562,50 @@ class MCPlusBuilder {
     return {};
   }

-  virtual ErrorOr<MCPhysReg> getAuthenticatedReg(const MCInst &Inst) const {
-    llvm_unreachable("not implemented");
-    return getNoRegister();
-  }
-
-  virtual bool isAuthenticationOfReg(const MCInst &Inst,
-                                     MCPhysReg AuthenticatedReg) const {
+  /// Returns the register where an authenticated pointer is written to by Inst,
+  /// or std::nullopt if not authenticating any register.
+  ///
+  /// Sets IsChecked if the instruction always checks authenticated pointer,
+  /// i.e. it either returns a successfully authenticated pointer or terminates
+  /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which crashes
+  /// on authentication failure even if FEAT_FPAC is not implemented).
+  virtual std::optional<MCPhysReg>
+  getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const {
     llvm_unreachable("not implemented");
-    return false;
+    return std::nullopt;
   }

-  virtual MCPhysReg getSignedReg(const MCInst &Inst) const {
+  /// Returns the register signed by Inst, or std::nullopt if not signing any
+  /// register.
+  ///
+  /// The returned register is assumed to be both input and output operand,
+  /// as it is done on AArch64.
+  virtual std::optional<MCPhysReg> getSignedReg(const MCInst &Inst) const {
     llvm_unreachable("not implemented");
-    return getNoRegister();
+    return std::nullopt;
   }

-  virtual ErrorOr<MCPhysReg> getRegUsedAsRetDest(const MCInst &Inst) const {
+  /// Returns the register used as a return address. Returns std::nullopt if
+  /// not applicable, such as reading the return address from a system register
+  /// or from the stack.
+  ///
+  /// Sets IsAuthenticatedInternally if the instruction accepts a signed
+  /// pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called when isReturn(Inst) is true.
+  virtual std::optional<MCPhysReg>
+  getRegUsedAsRetDest(const MCInst &Inst,
+                      bool &IsAuthenticatedInternally) const {
     llvm_unreachable("not implemented");
-    return getNoRegister();
+    return std::nullopt;
   }

   /// Returns the register used as the destination of an indirect branch or call
   /// instruction. Sets IsAuthenticatedInternally if the instruction accepts
   /// a signed pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst)
+  /// returns true.
   virtual MCPhysReg
   getRegUsedAsIndirectBranchDest(const MCInst &Inst,
                                  bool &IsAuthenticatedInternally) const {
@@ -602,14 +622,14 @@ class MCPlusBuilder {
   ///    controlled, under the Pointer Authentication threat model.
   ///
   /// If the instruction does not write to any register satisfying the above
-  /// two conditions, NoRegister is returned.
+  /// two conditions, std::nullopt is returned.
   ///
   /// The Pointer Authentication threat model assumes an attacker is able to
   /// modify any writable memory, but not executable code (due to W^X).
-  virtual MCPhysReg
+  virtual std::optional<MCPhysReg>
   getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const {
     llvm_unreachable("not implemented");
-    return getNoRegister();
+    return std::nullopt;
   }

   /// Analyzes if this instruction can safely perform address arithmetics
@@ -622,10 +642,13 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executa
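The switch from a sentinel register to an optional return value, as described in the commit message above, can be illustrated with a small standalone sketch. Everything in it is hypothetical: `PhysReg`, `NoRegister`, `ToyInst` and the three functions are simplified stand-ins invented for illustration, not the real `MCPhysReg`, `MCInst` or `MCPlusBuilder` API.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; these are NOT the real BOLT/LLVM definitions.
using PhysReg = std::uint16_t;
constexpr PhysReg NoRegister = 0;

struct ToyInst {
  bool Signs;      // true if this toy instruction signs a register
  PhysReg Operand; // the register it operates on
};

// Sentinel style: "not applicable" is encoded as a special register value,
// so every caller has to remember to compare against NoRegister.
PhysReg getSignedRegSentinel(const ToyInst &Inst) {
  return Inst.Signs ? Inst.Operand : NoRegister;
}

// Optional style: "not applicable" is part of the return type, so a caller
// cannot read a register number without first checking that one exists.
std::optional<PhysReg> getSignedRegOptional(const ToyInst &Inst) {
  if (!Inst.Signs)
    return std::nullopt;
  return Inst.Operand;
}

// Example caller: does Inst sign exactly the register Reg?
bool signsRegister(const ToyInst &Inst, PhysReg Reg) {
  std::optional<PhysReg> Signed = getSignedRegOptional(Inst);
  return Signed && *Signed == Reg;
}
```

With these toy definitions, `signsRegister(ToyInst{true, 30}, 30)` is true, while a non-signing instruction yields `std::nullopt` rather than a register number that could be mistaken for a real one.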
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135663

>From cdaa1549c3a4804bc5a3788f1fc0c2bfc910168e Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp        | 542 +
 .../AArch64/gs-pauth-authentication-oracles.s | 723 ++
 .../AArch64/gs-pauth-debug-output.s           |  78 ++
 4 files changed, 1355 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 9a4e9598c9f98..7c51ba42c3294 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -260,6 +260,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };

+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// It is common for a particular type of gadget detector to be tied to some
@@ -301,6 +310,9 @@ class FunctionAnalysis {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(ArrayRef> Reports);

+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(const ArrayRef> Reports);
+
   /// Process the reports which do not have to be augmented, and remove them
   /// from Reports.
   void handleSimpleReports(SmallVector> &Reports);
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 0bd9a1ed5941c..fff52de8b9c73 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -712,6 +712,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
       RegsToTrackInstsFor);
 }

+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an instruction
+/// Inst is safe if at least one of the following is true:
+/// * Inst checks if Xn contains the result of a successful authentication and
+///   terminates the program on failure. Note that Inst can either naturally
+///   dereference Xn (load, branch, return, etc. instructions) or be the first
+///   instruction of an explicit checking sequence.
+/// * Inst performs safe address arithmetic AND both source and result
+///   registers, as well as any temporary registers, must be safe after
+///   execution of Inst (temporaries are not used on AArch64 and thus not
+///   currently supported/allowed).
+///   See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details.
+/// * Inst fully overwrites Xn with an unrelated value.
+struct DstState {
+  /// The set of registers whose values cannot be inspected by an attacker in
+  /// a way usable as an authentication oracle. The results of authentication
+  /// instructions should be written to such registers.
+  BitVector CannotEscapeUnchecked;
+
+  std::vector> FirstInstLeakingReg;
+
+  /// Construct an empty state.
+  DstState() {}
+
+  DstState(unsigned NumRegs, unsigned NumRegsToTrack)
+      : CannotE
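The backward iteration described in the `DstState` comment above can be sketched on a single straight-line block. This is a deliberately simplified toy model and not the code from the patch: `ToyInst`, `Kind`, `stateBefore` and `findUncheckedAuths` are hypothetical names, register aliasing and address arithmetic are ignored, and the state at the end of the block is assumed to be pessimistic (no register known to be safe).

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Toy model, not BOLT code: every name below is hypothetical.
constexpr std::size_t NumRegs = 32;
using RegSet = std::bitset<NumRegs>; // bit set => "cannot escape unchecked"

enum class Kind {
  Auth,      // writes an authenticated pointer to Reg
  Check,     // dereferences or explicitly checks Reg, crashing on failure
  Overwrite, // fully overwrites Reg with an unrelated value
  Leak,      // exposes Reg to the attacker, e.g. stores it to memory
  Other      // does not affect the property for Reg
};

struct ToyInst {
  Kind K;
  unsigned Reg;
};

// Transfer function used while walking backward: given the state that holds
// after Inst, compute the state that holds before it.
RegSet stateBefore(const ToyInst &Inst, RegSet After) {
  switch (Inst.K) {
  case Kind::Check:
  case Kind::Overwrite:
    After.set(Inst.Reg); // the value in Reg is checked or destroyed here
    break;
  case Kind::Leak:
    After.reset(Inst.Reg); // the value in Reg becomes observable here
    break;
  case Kind::Auth:
  case Kind::Other:
    break; // simplification: leave the state unchanged
  }
  return After;
}

// Returns indices (last to first) of authentication instructions whose
// result may be inspected before it is checked or overwritten.
std::vector<std::size_t> findUncheckedAuths(const std::vector<ToyInst> &Block) {
  std::vector<std::size_t> Unsafe;
  RegSet After; // assumption: nothing is known to be safe at the block end
  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &Inst = Block[I];
    if (Inst.K == Kind::Auth && !After.test(Inst.Reg))
      Unsafe.push_back(I);
    After = stateBefore(Inst, After);
  }
  return Unsafe;
}
```

In this model, an `Auth` of x0 immediately followed by a `Check` of x0 is not reported, whereas placing a `Leak` of x0 between the two makes the `Auth` a candidate authentication oracle.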
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135662

>From 499d3297fb86db41061e7371d419a0c05e98302c Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp        | 200 ++
 .../AArch64/gs-pauth-debug-output.s           |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 4c1bef3d2265f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
                               const BinaryContext &BC) const = 0;

-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
                       StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-               MCPhysReg AffectedRegister)
-      : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;

-  ArrayRef getAffectedRegisters() const override {
-    return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+      : Report(Location), Kind(Kind) {}

-  void setOverwritingInstrs(ArrayRef Instrs) override {
-    OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };

 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
                       const BinaryContext &BC) const override;
 };

+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+      : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+              const std::optional RequestedDetails)
+      : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+                 std::shared_ptr Details)
+      : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocatorId,
+
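The two-run reporting scheme from the commit message above (a fast first run returns brief reports plus an optional request for details, and a slower second run attaches those details) can be sketched independently of BOLT. The types below only mirror the shape of `BriefReport` and `DetailedReport`; their template arguments are elided in the archived patch, so `Issue`, `Extra`, `Brief`, `Detailed`, `collectClobbers` and `augment` are hypothetical names chosen for this illustration.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the report and extra-info classes.
struct Issue {
  std::string Message;
};

struct Extra {
  std::vector<std::string> ClobberingInstrs;
};

// Result of the fast first run: the issue itself plus an optional request
// (here: a register number) describing which details to collect later.
template <typename ReqT> struct Brief {
  std::shared_ptr<Issue> TheIssue;
  std::optional<ReqT> RequestedDetails;
};

// Result after the second run; Details stays null if nothing was requested.
struct Detailed {
  std::shared_ptr<Issue> TheIssue;
  std::shared_ptr<Extra> Details;
};

// Placeholder for the slower, detailed analysis run (invented for this
// sketch): pretends that a single instruction clobbered register Reg.
std::shared_ptr<Extra> collectClobbers(unsigned Reg) {
  auto Info = std::make_shared<Extra>();
  Info->ClobberingInstrs.push_back("mov x" + std::to_string(Reg) + ", x0");
  return Info;
}

// Second run: augment only those reports that asked for extra details.
std::vector<Detailed> augment(const std::vector<Brief<unsigned>> &Reports) {
  std::vector<Detailed> Result;
  for (const Brief<unsigned> &R : Reports) {
    std::shared_ptr<Extra> Details;
    if (R.RequestedDetails)
      Details = collectClobbers(*R.RequestedDetails);
    Result.push_back({R.TheIssue, Details});
  }
  return Result;
}
```

One apparent benefit of this split is that a report type no longer needs virtual hooks such as `getAffectedRegisters` or `setOverwritingInstrs`: a free-form warning simply carries `std::nullopt` as its request and passes through the second run unchanged.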
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183

>From b03ffd01b9d634f604f39ab5225452e97b45479b Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH] [BOLT] Gadget scanner: improve handling of unreachable basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp        | 46 ++-
 .../AArch64/gs-pacret-autiasp.s               |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index f7ac0b67d00da..0ce9f51c44af4 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
     return S;
   }

+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+    return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
     BitVector Clobbered(NumRegs);
     // Assume a call can clobber all registers, including callee-saved
@@ -581,6 +587,13 @@ class DataflowSrcSafetyAnalysis
     if (BB.isEntryPoint())
       return createEntryState();

+    // If a basic block without any predecessors is found in an optimized code,
+    // this likely means that some CFG edges were not detected. Pessimistically
+    // assume all registers to be unsafe before this basic block and warn about
+    // this fact in FunctionAnalysis::findUnsafeUses().
+    if (BB.pred_empty())
+      return createUnsafeState();
+
     return SrcState();
   }

@@ -654,12 +667,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
       BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }

-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-    return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
                               MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1328,19 +1335,30 @@ void FunctionAnalysis::findUnsafeUses(
     BF.dump();
   });

+  if (BF.hasCFG()) {
+    // Warn on basic blocks being unreachable according to BOLT, as this
+    // likely means CFG is imprecise.
+    for (BinaryBasicBlock &BB : BF) {
+      if (!BB.pred_empty() || BB.isEntryPoint())
+        continue;
+      // Arbitrarily attach the report to the first instruction of BB.
+      MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+      if (!InstToReport)
+        continue; // BB has no real instructions
+
+      Reports.push_back(
+          make_generic_report(MCInstReference::get(InstToReport, BF),
+                              "Warning: no predecessor basic blocks detected "
+                              "(possibly incomplete CFG)"));
+    }
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
     if (BC.MIB->isCFI(Inst))
       return;

     const SrcState &S = Analysis->getStateBefore(Inst);
-
-    // If non-empty state was never propagated from the entry basic block
-    // to Inst, assume it to be unreachable and report a warning.
-    if (S.empty()) {
-      Reports.push_back(
-          make_generic_report(Inst, "Warning: unreachable instruction found"));
-      return;
-    }
+    assert(!S.empty() && "Instruction has no associated state");

     if (auto Report = shouldReportReturnGadget(BC, Inst, S))
       Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
         .globl f_unreachable_instruction
         .type f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:    The instruction is {{[0-9a-f]+}}: add x0, x1, x2
 // CHECK-NOT: instructions that wr
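The choice of starting state described above (entry state for entry points, an all-unsafe state for blocks without predecessors, an empty state otherwise) can be modelled in a few lines. This is a minimal sketch under stated assumptions rather than the BOLT implementation: `ToyBlock`, `SafeRegs`, `startingState` and `warnOnOrphanBlocks` are hypothetical names, and `std::nullopt` stands in for the "empty" `SrcState` that the dataflow analysis would fill in later.

```cpp
#include <bitset>
#include <optional>
#include <string>
#include <vector>

// Toy model of a basic block; not the real BinaryBasicBlock.
struct ToyBlock {
  std::string Name;
  bool IsEntry;
  unsigned NumPreds;
};

using SafeRegs = std::bitset<32>; // bit set => register is known to be safe

// Picks the state assumed at the start of BB before dataflow propagation.
std::optional<SafeRegs> startingState(const ToyBlock &BB,
                                      const SafeRegs &EntryState) {
  if (BB.IsEntry)
    return EntryState;
  // No predecessors: some CFG edges were probably missed, so assume the
  // worst case (no register is safe) instead of skipping the block.
  if (BB.NumPreds == 0)
    return SafeRegs();
  // Otherwise start empty; the state is later merged from predecessors.
  return std::nullopt;
}

// Collects the names of blocks that deserve a "no predecessors" warning,
// emitting at most one entry per block.
std::vector<std::string> warnOnOrphanBlocks(const std::vector<ToyBlock> &Blocks) {
  std::vector<std::string> Warned;
  for (const ToyBlock &BB : Blocks)
    if (BB.NumPreds == 0 && !BB.IsEntry)
      Warned.push_back(BB.Name);
  return Warned;
}
```

The distinction that matters here is between "all unsafe" (a real, maximally pessimistic state) and "empty" (no state yet), which is presumably why the patch can turn the former `S.empty()` warning into an assertion.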
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183 >From b03ffd01b9d634f604f39ab5225452e97b45479b Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 20:51:16 +0300 Subject: [PATCH] [BOLT] Gadget scanner: improve handling of unreachable basic blocks Instead of refusing to analyze an instruction completely, when it is unreachable according to the CFG reconstructed by BOLT, pessimistically assume all registers to be unsafe at the start of basic blocks without any predecessors. Nevertheless, unreachable basic blocks found in optimized code likely means imprecise CFG reconstruction, thus report a warning once per basic block without predecessors. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++- .../AArch64/gs-pacret-autiasp.s | 7 ++- .../binary-analysis/AArch64/gs-pauth-calls.s | 57 +++ 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index f7ac0b67d00da..0ce9f51c44af4 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -344,6 +344,12 @@ class SrcSafetyAnalysis { return S; } + /// Creates a state with all registers marked unsafe (not to be confused + /// with empty state). + SrcState createUnsafeState() const { +return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + } + BitVector getClobberedRegs(const MCInst &Point) const { BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved @@ -581,6 +587,13 @@ class DataflowSrcSafetyAnalysis if (BB.isEntryPoint()) return createEntryState(); +// If a basic block without any predecessors is found in an optimized code, +// this likely means that some CFG edges were not detected. Pessimistically +// assume all registers to be unsafe before this basic block and warn about +// this fact in FunctionAnalysis::findUnsafeUses(). 
+if (BB.pred_empty()) + return createUnsafeState(); + return SrcState(); } @@ -654,12 +667,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -1328,19 +1335,30 @@ void FunctionAnalysis::findUnsafeUses( BF.dump(); }); + if (BF.hasCFG()) { +// Warn on basic blocks being unreachable according to BOLT, as this +// likely means CFG is imprecise. +for (BinaryBasicBlock &BB : BF) { + if (!BB.pred_empty() || BB.isEntryPoint()) +continue; + // Arbitrarily attach the report to the first instruction of BB. + MCInst *InstToReport = BB.getFirstNonPseudoInstr(); + if (!InstToReport) +continue; // BB has no real instructions + + Reports.push_back( + make_generic_report(MCInstReference::get(InstToReport, BF), + "Warning: no predecessor basic blocks detected " + "(possibly incomplete CFG)")); +} + } + iterateOverInstrs(BF, [&](MCInstReference Inst) { if (BC.MIB->isCFI(Inst)) return; const SrcState &S = Analysis->getStateBefore(Inst); - -// If non-empty state was never propagated from the entry basic block -// to Inst, assume it to be unreachable and report a warning. 
-if (S.empty()) { - Reports.push_back( - make_generic_report(Inst, "Warning: unreachable instruction found")); - return; -} +assert(!S.empty() && "Instruction has no associated state"); if (auto Report = shouldReportReturnGadget(BC, Inst, S)) Reports.push_back(*Report); diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index 284f0bea607a5..6559ba336e8de 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -215,12 +215,17 @@ f_callclobbered_calleesaved: .globl f_unreachable_instruction .type f_unreachable_instruction,@function f_unreachable_instruction: -// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address +// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address // CHECK-NEXT:The instruction is {{[0-9a-f]+}}: add x0, x1, x2 // CHECK-NOT: instructions that wr
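The starting-state rule described in the commit message of PR #136183 can be modelled in isolation. What follows is a minimal, self-contained sketch and not BOLT code: `ToyState`, `ToyBlock`, `initialStateFor` and `shouldWarnNoPredecessors` are hypothetical names, and the register state is reduced to one flag per register. It only assumes what the patch text states: entry points get the entry state, blocks without predecessors get an all-unsafe (but non-empty) state, and a warning is attached to non-entry blocks without predecessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the scanner's per-instruction state.
// "Empty" means no state has been propagated yet, which is different from
// "every register is unsafe".
struct ToyState {
  bool Empty;
  std::vector<bool> SafeRegs; // true = register is considered safe

  static ToyState makeEmpty() { return ToyState{true, {}}; }

  static ToyState makeUnsafe(std::size_t NumRegs) {
    return ToyState{false, std::vector<bool>(NumRegs, false)};
  }

  // Assumption for illustration: exactly one register is trusted on entry.
  static ToyState makeEntry(std::size_t NumRegs, std::size_t TrustedReg) {
    ToyState S = makeUnsafe(NumRegs);
    S.SafeRegs[TrustedReg] = true;
    return S;
  }
};

// Hypothetical, simplified stand-in for a basic block in the CFG.
struct ToyBlock {
  bool IsEntryPoint;
  std::size_t NumPreds;
};

// Picks the state assumed at the start of a block before propagation.
ToyState initialStateFor(const ToyBlock &BB, std::size_t NumRegs,
                         std::size_t TrustedReg) {
  if (BB.IsEntryPoint)
    return ToyState::makeEntry(NumRegs, TrustedReg);
  // No predecessors: the CFG is probably incomplete, so assume the worst.
  if (BB.NumPreds == 0)
    return ToyState::makeUnsafe(NumRegs);
  // Otherwise start empty and let the data-flow merge fill the state in.
  return ToyState::makeEmpty();
}

// Mirrors the condition under which the patch emits its warning.
bool shouldWarnNoPredecessors(const ToyBlock &BB) {
  return BB.NumPreds == 0 && !BB.IsEntryPoint;
}
```

In this model every instruction ends up with a non-empty state, which is consistent with the patch replacing the old "unreachable instruction" report with an assertion.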
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135661 >From fa2f2aeb7624d57b25915aeb23d62a9df3a11044 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 14 Apr 2025 14:35:56 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC) * use more flexible `const ArrayRef` and `StringRef` types instead of `const std::vector &` and `const std::string &`, correspondingly, for function arguments * return plain `const SrcState &` instead of `ErrorOr` from `SrcSafetyAnalysis::getStateBefore`, as absent state is not handled gracefully by any caller --- bolt/include/bolt/Passes/PAuthGadgetScanner.h | 8 +--- bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 --- 2 files changed, 19 insertions(+), 28 deletions(-) diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h index 6765e2aff414f..3e39b64e59e0f 100644 --- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h +++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h @@ -12,7 +12,6 @@ #include "bolt/Core/BinaryContext.h" #include "bolt/Core/BinaryFunction.h" #include "bolt/Passes/BinaryPasses.h" -#include "llvm/ADT/SmallSet.h" #include "llvm/Support/raw_ostream.h" #include @@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &); namespace PAuthGadgetScanner { -class SrcSafetyAnalysis; -struct SrcState; - /// Description of a gadget kind that can be detected. Intended to be /// statically allocated to be attached to reports by reference. class GadgetKind { @@ -210,7 +206,7 @@ class GadgetKind { public: GadgetKind(const char *Description) : Description(Description) {} - const StringRef getDescription() const { return Description; } + StringRef getDescription() const { return Description; } }; /// Base report located at some instruction, without any additional information. @@ -261,7 +257,7 @@ struct GadgetReport : public Report { /// Report with a free-form message attached. 
struct GenericReport : public Report { std::string Text; - GenericReport(MCInstReference Location, const std::string &Text) + GenericReport(MCInstReference Location, StringRef Text) : Report(Location), Text(Text) {} virtual void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4f601558dec4e..339673b600765 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -91,14 +91,14 @@ class TrackedRegisters { const std::vector Registers; std::vector RegToIndexMapping; - static size_t getMappingSize(const std::vector &RegsToTrack) { + static size_t getMappingSize(const ArrayRef RegsToTrack) { if (RegsToTrack.empty()) return 0; return 1 + *llvm::max_element(RegsToTrack); } public: - TrackedRegisters(const std::vector &RegsToTrack) + TrackedRegisters(const ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { for (unsigned I = 0; I < RegsToTrack.size(); ++I) @@ -234,7 +234,7 @@ struct SrcState { static void printLastInsts( raw_ostream &OS, -const std::vector> &LastInstWritingReg) { +const ArrayRef> LastInstWritingReg) { OS << "Insts: "; for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) { auto &Set = LastInstWritingReg[I]; @@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const { class SrcSafetyAnalysis { public: SrcSafetyAnalysis(BinaryFunction &BF, -const std::vector &RegsToTrackInstsFor) +const ArrayRef RegsToTrackInstsFor) : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()), RegsToTrackInstsFor(RegsToTrackInstsFor) {} @@ -303,11 +303,10 @@ class SrcSafetyAnalysis { static std::shared_ptr create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, - const std::vector &RegsToTrackInstsFor); + const ArrayRef RegsToTrackInstsFor); virtual void run() = 0; - virtual ErrorOr - getStateBefore(const MCInst &Inst) 
const = 0; + virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0; protected: BinaryContext &BC; @@ -347,7 +346,7 @@ class SrcSafetyAnalysis { } BitVector getClobberedRegs(const MCInst &Point) const { -BitVector Clobbered(NumRegs, false); +BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved // registers. There's a good chance that callee-saved registers will be // saved on the stack at some point during execution of the callee. @@ -408,8 +407,7 @@ class SrcSafetyAnalysis { // FirstCheckerInst should belong to the same basic block, meaning // it was deterministically processed a few steps before this ins
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136151 >From e22ae5ebfe066cec1335f2b4080b013560c5e844 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 15 Apr 2025 21:47:18 +0300 Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions Some instruction-printing code used under LLVM_DEBUG does not handle CFI instructions well. While CFI instructions seem to be harmless for the correctness of the analysis results, they do not convey any useful information to the analysis either, so skip them early. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++ .../AArch64/gs-pauth-debug-output.s | 32 +++ 2 files changed, 48 insertions(+) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 849272cac73d2..f7ac0b67d00da 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -431,6 +431,9 @@ class SrcSafetyAnalysis { } SrcState computeNext(const MCInst &Point, const SrcState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + SrcStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " SrcSafetyAnalysis::ComputeNext("; @@ -670,6 +673,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If there is a label before this instruction, it is possible that it // can be jumped-to, thus conservatively resetting S. 
As an exception, @@ -947,6 +952,9 @@ class DstSafetyAnalysis { } DstState computeNext(const MCInst &Point, const DstState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + DstStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " DstSafetyAnalysis::ComputeNext("; @@ -1123,6 +1131,8 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis { DstState S = createUnsafeState(); for (auto &I : llvm::reverse(BF.instrs())) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If Inst can change the control flow, we cannot be sure that the next // instruction (to be executed in analyzed program) is the one processed @@ -1319,6 +1329,9 @@ void FunctionAnalysis::findUnsafeUses( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const SrcState &S = Analysis->getStateBefore(Inst); // If non-empty state was never propagated from the entry basic block @@ -1382,6 +1395,9 @@ void FunctionAnalysis::findUnsafeDefs( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const DstState &S = Analysis->getStateAfter(Inst); if (auto Report = shouldReportAuthOracle(BC, Inst, S)) diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s index fd55880921d06..07b61bea77e94 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s +++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s @@ -329,6 +329,38 @@ auth_oracle: // PAUTH-EMPTY: // PAUTH-NEXT: Attaching leakage info to: : autia x0, x1 # DataflowDstSafetyAnalysis: dst-state +// Gadget scanner should not crash on CFI instructions, including when debug-printing them. +// Note that the particular debug output is not checked, but BOLT should be +// compiled with assertions enabled to support -debug-only argument. 
+ +.globl cfi_inst_df +.type cfi_inst_df,@function +cfi_inst_df: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_df, .-cfi_inst_df +.cfi_endproc + +.globl cfi_inst_nocfg +.type cfi_inst_nocfg,@function +cfi_inst_nocfg: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 + +adr x0, 1f +br x0 +1: +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_nocfg, .-cfi_inst_nocfg +.cfi_endproc + // CHECK-LABEL:Analyzing function main, AllocatorId = 1 .globl main .type main,@function ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
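The "skip CFI instructions early" idea from PR #136151 can be illustrated outside of BOLT. This is a hedged sketch under stated assumptions: `ToyKind`, `ToyInst`, `ToyState` and `runStraightLine` are hypothetical, and the toy `computeNext` only borrows its name from the patch. In the toy model a CFI pseudo-instruction carries a CFA offset rather than a register number, so treating it like a regular instruction would be wrong; returning the current state unchanged avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model: CFI entries are pseudo-instructions.
enum class ToyKind { Normal, CFI };

struct ToyInst {
  ToyKind Kind;
  int Operand; // register index for Normal, CFA offset for CFI
};

struct ToyState {
  std::vector<bool> SafeRegs; // true = register is considered safe
};

// Toy transfer function: a normal instruction makes the register it writes
// unsafe; a CFI pseudo-instruction leaves the state untouched.
ToyState computeNext(const ToyInst &Inst, const ToyState &Cur) {
  if (Inst.Kind == ToyKind::CFI)
    return Cur; // nothing useful for the analysis, skip early
  ToyState Next = Cur;
  Next.SafeRegs[static_cast<std::size_t>(Inst.Operand)] = false;
  return Next;
}

// Applies the transfer function over a straight-line instruction sequence.
ToyState runStraightLine(const std::vector<ToyInst> &Insts, ToyState S) {
  for (const ToyInst &Inst : Insts)
    S = computeNext(Inst, S);
  return S;
}
```

Without the early return, the CFI operand (an offset such as 16) would be used as a register index in this toy model, which is the kind of mishandling the patch avoids by filtering CFI instructions before any other processing.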
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Anatoly Trosinenko (atrosinenko) Changes Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, detection of this one involves some amount of guessing which branch instructions should be checked as tail calls. --- Patch is 41.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137224.diff 4 Files Affected: - (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+109-3) - (modified) bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s (+22-9) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s (+2-28) - (added) bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s (+597) ``diff diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 0ce9f51c44af4..d67f10a311396 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -655,8 +655,9 @@ class DataflowSrcSafetyAnalysis // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). 
class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BinaryFunction &BF; MCPlusBuilder::AllocatorIdTy AllocId; @@ -667,6 +668,30 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } + /// Compute a reasonably pessimistic estimation of the register state when + /// the previous instruction is not known for sure. Take the set of registers + /// which are trusted at function entry and remove all registers that can be + /// clobbered inside this function. + SrcState computePessimisticState(BinaryFunction &BF) { +BitVector ClobberedRegs(NumRegs); +for (auto &I : BF.instrs()) { + MCInst &Inst = I.second; + BC.MIB->getClobberedRegs(Inst, ClobberedRegs); + + // If this is a call instruction, no register is safe anymore, unless + // it is a tail call. Ignore tail calls for the purpose of estimating the + // worst-case scenario, assuming no instructions are executed in the + // caller after this point anyway. + if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst)) +ClobberedRegs.set(); +} + +SrcState S = createEntryState(); +S.SafeToDerefRegs.reset(ClobberedRegs); +S.TrustedRegs.reset(ClobberedRegs); +return S; + } + public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -677,6 +702,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -691,7 +717,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Check if we need to remove an old annotation (this is the case if @@ -1226,6 +1252,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_report(RetKind, Inst, *RetReg); } +/// While 
BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). + if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return true; + + // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the + // below is a simplified condition
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/137224
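The pessimistic-state computation quoted in the PR #137224 patch (`computePessimisticState`) can be sketched without BOLT types. The code below is an illustration only: `ToyInst` and `computePessimisticTrusted` are hypothetical, registers are plain indices, and the "trusted at entry" set is passed in by the caller. It follows the rule stated in the patch comment: start from the registers trusted at function entry, remove everything that any instruction in the function may clobber, and treat a call as clobbering everything unless it is a tail call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model used only for this sketch.
struct ToyInst {
  std::vector<std::size_t> ClobberedRegs; // registers written by the instruction
  bool IsCall;
  bool IsTailCall;
};

// Returns the registers that can still be trusted at an arbitrary label,
// assuming any instruction of the function may have executed before it.
std::vector<bool>
computePessimisticTrusted(const std::vector<ToyInst> &Function,
                          const std::vector<bool> &TrustedAtEntry) {
  std::vector<bool> Clobbered(TrustedAtEntry.size(), false);
  for (const ToyInst &Inst : Function) {
    for (std::size_t Reg : Inst.ClobberedRegs)
      Clobbered[Reg] = true;
    // A regular call may clobber anything; a tail call is ignored because
    // no caller instructions are assumed to run after it.
    if (Inst.IsCall && !Inst.IsTailCall)
      Clobbered.assign(Clobbered.size(), true);
  }

  std::vector<bool> Result(TrustedAtEntry.size(), false);
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = TrustedAtEntry[I] && !Clobbered[I];
  return Result;
}
```

Compared with resetting to an all-unsafe state at every label, this estimate keeps a register such as the link register trusted in functions that never write it and never make a non-tail call.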
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135662 >From 499d3297fb86db41061e7371d419a0c05e98302c Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 14 Apr 2025 15:08:54 +0300 Subject: [PATCH 1/3] [BOLT] Gadget scanner: refactor issue reporting Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from the base `Report` class. Instead, make `Report` always represent the brief version of the report. When an issue is detected on the first run of the analysis, return an optional request for extra details to attach to the report on the second run. --- bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++--- bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++ .../AArch64/gs-pauth-debug-output.s | 8 +- 3 files changed, 187 insertions(+), 123 deletions(-) diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h index 4c1bef3d2265f..3b6c1f6af94a0 100644 --- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h +++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h @@ -219,11 +219,6 @@ struct Report { virtual void generateReport(raw_ostream &OS, const BinaryContext &BC) const = 0; - // The two methods below are called by Analysis::computeDetailedInfo when - // iterating over the reports. - virtual ArrayRef getAffectedRegisters() const { return {}; } - virtual void setOverwritingInstrs(ArrayRef Instrs) {} - void printBasicInfo(raw_ostream &OS, const BinaryContext &BC, StringRef IssueKind) const; }; @@ -231,27 +226,11 @@ struct Report { struct GadgetReport : public Report { // The particular kind of gadget that is detected. const GadgetKind &Kind; - // The set of registers related to this gadget report (possibly empty). - SmallVector AffectedRegisters; - // The instructions that clobber the affected registers. 
- // There is no one-to-one correspondence with AffectedRegisters: for example, - // the same register can be overwritten by different instructions in different - // preceding basic blocks. - SmallVector OverwritingInstrs; - - GadgetReport(const GadgetKind &Kind, MCInstReference Location, - MCPhysReg AffectedRegister) - : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {} - - void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; - ArrayRef getAffectedRegisters() const override { -return AffectedRegisters; - } + GadgetReport(const GadgetKind &Kind, MCInstReference Location) + : Report(Location), Kind(Kind) {} - void setOverwritingInstrs(ArrayRef Instrs) override { -OverwritingInstrs.assign(Instrs.begin(), Instrs.end()); - } + void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; }; /// Report with a free-form message attached. @@ -263,8 +242,75 @@ struct GenericReport : public Report { const BinaryContext &BC) const override; }; +/// An information about an issue collected on the slower, detailed, +/// run of an analysis. +class ExtraInfo { +public: + virtual void print(raw_ostream &OS, const MCInstReference Location) const = 0; + + virtual ~ExtraInfo() {} +}; + +class ClobberingInfo : public ExtraInfo { + SmallVector ClobberingInstrs; + +public: + ClobberingInfo(const ArrayRef Instrs) + : ClobberingInstrs(Instrs) {} + + void print(raw_ostream &OS, const MCInstReference Location) const override; +}; + +/// A brief version of a report that can be further augmented with the details. +/// +/// It is common for a particular type of gadget detector to be tied to some +/// specific kind of analysis. If an issue is returned by that detector, it may +/// be further augmented with the detailed info in an analysis-specific way, +/// or just be left as-is (f.e. if a free-form warning was reported). 
+template struct BriefReport { + BriefReport(std::shared_ptr Issue, + const std::optional RequestedDetails) + : Issue(Issue), RequestedDetails(RequestedDetails) {} + + std::shared_ptr Issue; + std::optional RequestedDetails; +}; + +/// A detailed version of a report. +struct DetailedReport { + DetailedReport(std::shared_ptr Issue, + std::shared_ptr Details) + : Issue(Issue), Details(Details) {} + + std::shared_ptr Issue; + std::shared_ptr Details; +}; + struct FunctionAnalysisResult { - std::vector> Diagnostics; + std::vector Diagnostics; +}; + +/// A helper class storing per-function context to be instantiated by Analysis. +class FunctionAnalysis { + BinaryContext &BC; + BinaryFunction &BF; + MCPlusBuilder::AllocatorIdTy AllocatorId; + FunctionAnalysisResult Result; + + bool PacRetGadgetsOnly; + + void findUnsafeUses(SmallVector> &Reports); + void augmentUnsafeUseReports(const ArrayRef> Reports); + +public: + FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocatorId, +
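The two-pass reporting scheme described in the PR #135662 commit message (a brief report plus an optional request for extra details, resolved on a second run) can be sketched roughly as follows. All names here (`ToyIssue`, `ToyDetails`, `ToyBriefReport`, `ToyDetailedReport`, `augment`) are hypothetical, the request is reduced to a single register index, and the "second run" is replaced by a lookup table supplied by the caller; the real patch uses its own `BriefReport`, `DetailedReport` and `ExtraInfo` types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical issue description produced by the fast first pass.
struct ToyIssue {
  std::string Message;
};

// Hypothetical extra information collected by the slower second pass.
struct ToyDetails {
  std::vector<int> ClobberingInsts; // e.g. addresses of clobbering instructions
};

// First-pass result: the issue plus an optional request for details.
struct ToyBriefReport {
  std::shared_ptr<ToyIssue> Issue;
  bool WantsDetails;        // false for free-form warnings
  std::size_t RequestedReg; // only meaningful when WantsDetails is true
};

// Final result: the same issue, with details attached when requested.
struct ToyDetailedReport {
  std::shared_ptr<ToyIssue> Issue;
  std::shared_ptr<ToyDetails> Details; // null if nothing was requested
};

// Second pass: resolve each request against per-register writer lists.
std::vector<ToyDetailedReport>
augment(const std::vector<ToyBriefReport> &Reports,
        const std::vector<std::vector<int>> &WritersByReg) {
  std::vector<ToyDetailedReport> Result;
  for (const ToyBriefReport &R : Reports) {
    std::shared_ptr<ToyDetails> Details;
    if (R.WantsDetails)
      Details = std::make_shared<ToyDetails>(
          ToyDetails{WritersByReg[R.RequestedReg]});
    Result.push_back(ToyDetailedReport{R.Issue, Details});
  }
  return Result;
}
```

The point of the split in this sketch is that the issue object is shared, unchanged, between both passes, and only reports that asked for details pay for the second pass.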
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (PR #137205)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/137205 This is basically identical to cancel except without the if clause. taskgroup will be implemented in a followup PR. >From 88fc39d0b2a3a846006889d4320b9f29ced025b7 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Tue, 15 Apr 2025 15:40:39 + Subject: [PATCH] [mlir][OpenMP] Convert omp.cancellation_point to LLVMIR This is basically identical to cancel except without the if clause. taskgroup will be implemented in a followup PR. --- .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 10 + llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 51 + .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 37 +++- .../LLVMIR/openmp-cancellation-point.mlir | 188 ++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 16 +- 5 files changed, 293 insertions(+), 9 deletions(-) create mode 100644 mlir/test/Target/LLVMIR/openmp-cancellation-point.mlir diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 10d69e561a987..14ad8629537f7 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -686,6 +686,16 @@ class OpenMPIRBuilder { Value *IfCondition, omp::Directive CanceledDirective); + /// Generator for '#omp cancellation point' + /// + /// \param Loc The location where the directive was encountered. + /// \param CanceledDirective The kind of directive that is cancled. + /// + /// \returns The insertion point after the barrier. + InsertPointOrErrorTy + createCancellationPoint(const LocationDescription &Loc, + omp::Directive CanceledDirective); + /// Generator for '#omp parallel' /// /// \param Loc The insert and source location description. 
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3f19088e6c73d..06aa61adcd739 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -1118,6 +1118,57 @@ OpenMPIRBuilder::createCancel(const LocationDescription &Loc, return Builder.saveIP(); } +OpenMPIRBuilder::InsertPointOrErrorTy +OpenMPIRBuilder::createCancellationPoint(const LocationDescription &Loc, + omp::Directive CanceledDirective) { + if (!updateToLocation(Loc)) +return Loc.IP; + + // LLVM utilities like blocks with terminators. + auto *UI = Builder.CreateUnreachable(); + Builder.SetInsertPoint(UI); + + Value *CancelKind = nullptr; + switch (CanceledDirective) { +#define OMP_CANCEL_KIND(Enum, Str, DirectiveEnum, Value) \ + case DirectiveEnum: \ +CancelKind = Builder.getInt32(Value); \ +break; +#include "llvm/Frontend/OpenMP/OMPKinds.def" + default: +llvm_unreachable("Unknown cancel kind!"); + } + + uint32_t SrcLocStrSize; + Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize); + Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize); + Value *Args[] = {Ident, getOrCreateThreadID(Ident), CancelKind}; + Value *Result = Builder.CreateCall( + getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_cancellationpoint), Args); + auto ExitCB = [this, CanceledDirective, Loc](InsertPointTy IP) -> Error { +if (CanceledDirective == OMPD_parallel) { + IRBuilder<>::InsertPointGuard IPG(Builder); + Builder.restoreIP(IP); + return createBarrier(LocationDescription(Builder.saveIP(), Loc.DL), + omp::Directive::OMPD_unknown, + /* ForceSimpleCall */ false, + /* CheckCancelFlag */ false) + .takeError(); +} +return Error::success(); + }; + + // The actual cancel logic is shared with others, e.g., cancel_barriers. + if (Error Err = emitCancelationCheckImpl(Result, CanceledDirective, ExitCB)) +return Err; + + // Update the insertion point and remove the terminator we introduced. 
+ Builder.SetInsertPoint(UI->getParent()); + UI->eraseFromParent(); + + return Builder.saveIP(); +} + OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetKernel( const LocationDescription &Loc, InsertPointTy AllocaIP, Value *&Return, Value *Ident, Value *DeviceID, Value *NumTeams, Value *NumThreads, diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 7d8a7ccb6e4ac..afae41f001736 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -255,6 +255,9 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) .
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko ready_for_review https://github.com/llvm/llvm-project/pull/137224
[llvm-branch-commits] [llvm] [DirectX] Adding support for Root Descriptor in Obj2yaml/Yaml2Obj (PR #136732)
https://github.com/joaosaffran updated https://github.com/llvm/llvm-project/pull/136732 >From 156975528eefad5be199165fcfce6398f6a79ab8 Mon Sep 17 00:00:00 2001 From: joaosaffran Date: Fri, 18 Apr 2025 00:41:37 + Subject: [PATCH 1/6] adding support for root descriptor --- llvm/include/llvm/BinaryFormat/DXContainer.h | 25 .../BinaryFormat/DXContainerConstants.def | 15 + .../llvm/MC/DXContainerRootSignature.h| 2 + llvm/include/llvm/Object/DXContainer.h| 39 .../include/llvm/ObjectYAML/DXContainerYAML.h | 36 ++- llvm/lib/MC/DXContainerRootSignature.cpp | 26 llvm/lib/ObjectYAML/DXContainerEmitter.cpp| 19 +- llvm/lib/ObjectYAML/DXContainerYAML.cpp | 63 +-- .../RootSignature-MultipleParameters.yaml | 20 -- 9 files changed, 233 insertions(+), 12 deletions(-) diff --git a/llvm/include/llvm/BinaryFormat/DXContainer.h b/llvm/include/llvm/BinaryFormat/DXContainer.h index 455657980bf40..d6e585c94fed1 100644 --- a/llvm/include/llvm/BinaryFormat/DXContainer.h +++ b/llvm/include/llvm/BinaryFormat/DXContainer.h @@ -18,6 +18,7 @@ #include "llvm/Support/SwapByteOrder.h" #include "llvm/TargetParser/Triple.h" +#include #include namespace llvm { @@ -158,6 +159,11 @@ enum class RootElementFlag : uint32_t { #include "DXContainerConstants.def" }; +#define ROOT_DESCRIPTOR_FLAG(Num, Val) Val = 1ull << Num, +enum class RootDescriptorFlag : uint32_t { +#include "DXContainerConstants.def" +}; + #define ROOT_PARAMETER(Val, Enum) Enum = Val, enum class RootParameterType : uint32_t { #include "DXContainerConstants.def" @@ -594,6 +600,25 @@ struct RootConstants { sys::swapByteOrder(Num32BitValues); } }; +struct RootDescriptor_V1_0 { + uint32_t ShaderRegister; + uint32_t RegisterSpace; + void swapBytes() { +sys::swapByteOrder(ShaderRegister); +sys::swapByteOrder(RegisterSpace); + } +}; + +struct RootDescriptor_V1_1 { + uint32_t ShaderRegister; + uint32_t RegisterSpace; + uint32_t Flags; + void swapBytes() { +sys::swapByteOrder(ShaderRegister); +sys::swapByteOrder(RegisterSpace); 
+sys::swapByteOrder(Flags); + } +}; struct RootParameterHeader { uint32_t ParameterType; diff --git a/llvm/include/llvm/BinaryFormat/DXContainerConstants.def b/llvm/include/llvm/BinaryFormat/DXContainerConstants.def index 590ded5e8c899..6840901460ced 100644 --- a/llvm/include/llvm/BinaryFormat/DXContainerConstants.def +++ b/llvm/include/llvm/BinaryFormat/DXContainerConstants.def @@ -72,9 +72,24 @@ ROOT_ELEMENT_FLAG(11, SamplerHeapDirectlyIndexed) #undef ROOT_ELEMENT_FLAG #endif // ROOT_ELEMENT_FLAG + +// ROOT_ELEMENT_FLAG(bit offset for the flag, name). +#ifdef ROOT_DESCRIPTOR_FLAG + +ROOT_DESCRIPTOR_FLAG(0, NONE) +ROOT_DESCRIPTOR_FLAG(2, DATA_VOLATILE) +ROOT_DESCRIPTOR_FLAG(4, DATA_STATIC_WHILE_SET_AT_EXECUTE) +ROOT_DESCRIPTOR_FLAG(8, DATA_STATIC) +#undef ROOT_DESCRIPTOR_FLAG +#endif // ROOT_DESCRIPTOR_FLAG + + #ifdef ROOT_PARAMETER ROOT_PARAMETER(1, Constants32Bit) +ROOT_PARAMETER(2, CBV) +ROOT_PARAMETER(3, SRV) +ROOT_PARAMETER(4, UAV) #undef ROOT_PARAMETER #endif // ROOT_PARAMETER diff --git a/llvm/include/llvm/MC/DXContainerRootSignature.h b/llvm/include/llvm/MC/DXContainerRootSignature.h index 6d3329a2c6ce9..5e3bc9b873f82 100644 --- a/llvm/include/llvm/MC/DXContainerRootSignature.h +++ b/llvm/include/llvm/MC/DXContainerRootSignature.h @@ -19,6 +19,8 @@ struct RootParameter { dxbc::RootParameterHeader Header; union { dxbc::RootConstants Constants; +dxbc::RootDescriptor_V1_0 Descriptor_V10; +dxbc::RootDescriptor_V1_1 Descriptor_V11; }; }; struct RootSignatureDesc { diff --git a/llvm/include/llvm/Object/DXContainer.h b/llvm/include/llvm/Object/DXContainer.h index e8287ce078365..7812906700fe3 100644 --- a/llvm/include/llvm/Object/DXContainer.h +++ b/llvm/include/llvm/Object/DXContainer.h @@ -15,6 +15,7 @@ #ifndef LLVM_OBJECT_DXCONTAINER_H #define LLVM_OBJECT_DXCONTAINER_H +#include "llvm/ADT/STLForwardCompat.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringRef.h" #include "llvm/BinaryFormat/DXContainer.h" @@ -149,6 +150,36 @@ struct RootConstantView : 
RootParameterView { } }; +struct RootDescriptorView_V1_0 : RootParameterView { + static bool classof(const RootParameterView *V) { +return (V->Header.ParameterType == +llvm::to_underlying(dxbc::RootParameterType::CBV) || +V->Header.ParameterType == +llvm::to_underlying(dxbc::RootParameterType::SRV) || +V->Header.ParameterType == +llvm::to_underlying(dxbc::RootParameterType::UAV)); + } + + llvm::Expected read() { +return readParameter(); + } +}; + +struct RootDescriptorView_V1_1 : RootParameterView { + static bool classof(const RootParameterView *V) { +return (V->Header.ParameterType == +llvm::to_underlying(dxbc::RootParameterType::CBV) || +V->Header.Par
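A note on the ROOT_DESCRIPTOR_FLAG macro in the patch above: it expands each entry to `Val = 1ull << Num`, so the first argument is used as a shift count. The following is a minimal, self-contained sketch of that X-macro pattern, not the LLVM header itself. The `Demo...` names, the `toUnderlying` helper, and the inline list macro (standing in for the `#include "DXContainerConstants.def"` step) are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the ROOT_DESCRIPTOR_FLAG entries that the patch
// adds to DXContainerConstants.def. The real code #includes the .def file;
// here the same four entries are listed inline so the sketch is standalone.
#define DEMO_ROOT_DESCRIPTOR_FLAGS(X)                                          \
  X(0, NONE)                                                                   \
  X(2, DATA_VOLATILE)                                                          \
  X(4, DATA_STATIC_WHILE_SET_AT_EXECUTE)                                       \
  X(8, DATA_STATIC)

// Same expansion rule as in the patch: the first argument is a shift count.
#define DEMO_FLAG(Num, Val) Val = 1ull << Num,
enum class DemoRootDescriptorFlag : std::uint32_t {
  DEMO_ROOT_DESCRIPTOR_FLAGS(DEMO_FLAG)
};
#undef DEMO_FLAG

// Small helper (assumed name) to read back the numeric value of a flag.
constexpr std::uint32_t toUnderlying(DemoRootDescriptorFlag F) {
  return static_cast<std::uint32_t>(F);
}
```

Under this expansion rule `(0, NONE)` becomes 1 and `(2, DATA_VOLATILE)` becomes 4, so it may be worth confirming in review whether the first column is meant as a bit position or as the literal flag value.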
[llvm-branch-commits] [llvm] [llvm] Extract and propagate indirect call type id (PR #87575)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/87575 >From 1a8d810d352fbe84c0521c7614689b60ade693c8 Mon Sep 17 00:00:00 2001 From: Necip Fazil Yildiran Date: Tue, 19 Nov 2024 15:25:34 -0800 Subject: [PATCH 1/5] Fixed the tests and addressed most of the review comments. Created using spr 1.3.6-beta.1 --- llvm/include/llvm/CodeGen/MachineFunction.h | 15 +++-- .../CodeGen/AArch64/call-site-info-typeid.ll | 28 +++-- .../test/CodeGen/ARM/call-site-info-typeid.ll | 28 +++-- .../CodeGen/MIR/X86/call-site-info-typeid.ll | 58 --- .../CodeGen/MIR/X86/call-site-info-typeid.mir | 13 ++--- .../CodeGen/Mips/call-site-info-typeid.ll | 28 +++-- .../test/CodeGen/X86/call-site-info-typeid.ll | 28 +++-- 7 files changed, 71 insertions(+), 127 deletions(-) diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h index bb0b87a3a04a3..44633df38a651 100644 --- a/llvm/include/llvm/CodeGen/MachineFunction.h +++ b/llvm/include/llvm/CodeGen/MachineFunction.h @@ -493,7 +493,7 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction { /// Callee type id. ConstantInt *TypeId = nullptr; -CallSiteInfo() {} +CallSiteInfo() = default; /// Extracts the numeric type id from the CallBase's type operand bundle, /// and sets TypeId. This is used as type id for the indirect call in the @@ -503,12 +503,11 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction { if (!CB.isIndirectCall()) return; - auto Opt = CB.getOperandBundle(LLVMContext::OB_type); - if (!Opt.has_value()) { -errs() << "warning: cannot find indirect call type operand bundle for " - "call graph section\n"; + std::optional Opt = + CB.getOperandBundle(LLVMContext::OB_type); + // Return if the operand bundle for call graph section cannot be found. 
+ if (!Opt.has_value()) return; - } // Get generalized type id string auto OB = Opt.value(); @@ -520,9 +519,9 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction { "invalid type identifier"); // Compute numeric type id from generalized type id string - uint64_t TypeIdVal = llvm::MD5Hash(TypeIdStr->getString()); + uint64_t TypeIdVal = MD5Hash(TypeIdStr->getString()); IntegerType *Int64Ty = Type::getInt64Ty(CB.getContext()); - TypeId = llvm::ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false); + TypeId = ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false); } }; diff --git a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll index f0a6b44755c5c..f3b98c2c7a395 100644 --- a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll +++ b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll @@ -1,14 +1,9 @@ -; Tests that call site type ids can be extracted and set from type operand -; bundles. +;; Tests that call site type ids can be extracted and set from type operand +;; bundles. -; Verify the exact typeId value to ensure it is not garbage but the value -; computed as the type id from the type operand bundle. -; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu %s -stop-before=finalize-isel -o - | FileCheck %s - -; ModuleID = 'test.c' -source_filename = "test.c" -target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" -target triple = "aarch64-unknown-linux-gnu" +;; Verify the exact typeId value to ensure it is not garbage but the value +;; computed as the type id from the type operand bundle. 
+; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu < %s -stop-before=finalize-isel -o - | FileCheck %s define dso_local void @foo(i8 signext %a) !type !3 { entry: @@ -19,10 +14,10 @@ entry: define dso_local i32 @main() !type !4 { entry: %retval = alloca i32, align 4 - %fp = alloca void (i8)*, align 8 - store i32 0, i32* %retval, align 4 - store void (i8)* @foo, void (i8)** %fp, align 8 - %0 = load void (i8)*, void (i8)** %fp, align 8 + %fp = alloca ptr, align 8 + store i32 0, ptr %retval, align 4 + store ptr @foo, ptr %fp, align 8 + %0 = load ptr, ptr %fp, align 8 ; CHECK: callSites: ; CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId: ; CHECK-NEXT: 7854600665770582568 } @@ -30,10 +25,5 @@ entry: ret i32 0 } -!llvm.module.flags = !{!0, !1, !2} - -!0 = !{i32 1, !"wchar_size", i32 4} -!1 = !{i32 7, !"uwtable", i32 1} -!2 = !{i32 7, !"frame-pointer", i32 2} !3 = !{i64 0, !"_ZTSFvcE.generalized"} !4 = !{i64 0, !"_ZTSFiE.generalized"} diff --git a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll index ec7f8a425051b..9feeef9a564cc 100644 --- a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll +++ b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll @@ -1,14 +1,9 @@ -; Tests that call site type ids can be extracted and set from type operand -; bundles. +;; Tests that ca
[llvm-branch-commits] [llvm] [llvm] Add option to emit `callgraph` section (PR #87574)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/87574 >From 1d7ee612e408ee7e64e984eb08e6d7089a435d09 Mon Sep 17 00:00:00 2001 From: Necip Fazil Yildiran Date: Sun, 2 Feb 2025 00:58:49 + Subject: [PATCH 1/6] Simplify MIR test. Created using spr 1.3.6-beta.1 --- .../CodeGen/MIR/X86/call-site-info-typeid.mir | 21 ++- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir index 5ab797bfcc18f..a99ee50a608fb 100644 --- a/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir +++ b/llvm/test/CodeGen/MIR/X86/call-site-info-typeid.mir @@ -8,11 +8,6 @@ # CHECK-NEXT: 123456789 } --- | - ; ModuleID = 'test.ll' - source_filename = "test.ll" - target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" - target triple = "x86_64-unknown-linux-gnu" - define dso_local void @foo(i8 signext %a) { entry: ret void @@ -21,10 +16,10 @@ define dso_local i32 @main() { entry: %retval = alloca i32, align 4 -%fp = alloca void (i8)*, align 8 -store i32 0, i32* %retval, align 4 -store void (i8)* @foo, void (i8)** %fp, align 8 -%0 = load void (i8)*, void (i8)** %fp, align 8 +%fp = alloca ptr, align 8 +store i32 0, ptr %retval, align 4 +store ptr @foo, ptr %fp, align 8 +%0 = load ptr, ptr %fp, align 8 call void %0(i8 signext 97) ret i32 0 } @@ -42,12 +37,8 @@ body: | name:main tracksRegLiveness: true stack: - - { id: 0, name: retval, type: default, offset: 0, size: 4, alignment: 4, - stack-id: default, callee-saved-register: '', callee-saved-restored: true, - debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } - - { id: 1, name: fp, type: default, offset: 0, size: 8, alignment: 8, - stack-id: default, callee-saved-register: '', callee-saved-restored: true, - debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } + - { id: 0, name: retval, size: 4, alignment: 4 } + - { id: 1, 
name: fp, size: 8, alignment: 8 } callSites: - { bb: 0, offset: 6, fwdArgRegs: [], typeId: 123456789 } >From 86e2c9dc37170499252ed50c6bbef2931e106fbb Mon Sep 17 00:00:00 2001 From: prabhukr Date: Thu, 13 Mar 2025 01:03:40 + Subject: [PATCH 2/6] Add requested tests part 1. Created using spr 1.3.6-beta.1 --- ...te-info-ambiguous-indirect-call-typeid.mir | 145 ++ .../call-site-info-direct-calls-typeid.mir| 145 ++ 2 files changed, 290 insertions(+) create mode 100644 llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir create mode 100644 llvm/test/CodeGen/MIR/X86/call-site-info-direct-calls-typeid.mir diff --git a/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir new file mode 100644 index 0..9d1b099cc9093 --- /dev/null +++ b/llvm/test/CodeGen/MIR/X86/call-site-info-ambiguous-indirect-call-typeid.mir @@ -0,0 +1,145 @@ +# Test MIR printer and parser for type id field in callSites. It is used +# for propogating call site type identifiers to emit in the call graph section. 
+ +# RUN: llc --call-graph-section %s -run-pass=none -o - | FileCheck %s +# CHECK: name: main +# CHECK: callSites: +# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [] +# CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId: +# CHECK-NEXT: 1234567890 } + +--- | + ; Function Attrs: mustprogress noinline nounwind optnone uwtable + define dso_local noundef i32 @_Z3addii(i32 noundef %a, i32 noundef %b) #0 !type !6 !type !6 { + entry: +%a.addr = alloca i32, align 4 +%b.addr = alloca i32, align 4 +store i32 %a, ptr %a.addr, align 4 +store i32 %b, ptr %b.addr, align 4 +%0 = load i32, ptr %a.addr, align 4 +%1 = load i32, ptr %b.addr, align 4 +%add = add nsw i32 %0, %1 +ret i32 %add + } + + ; Function Attrs: mustprogress noinline nounwind optnone uwtable + define dso_local noundef i32 @_Z8multiplyii(i32 noundef %a, i32 noundef %b) #0 !type !6 !type !6 { + entry: +%a.addr = alloca i32, align 4 +%b.addr = alloca i32, align 4 +store i32 %a, ptr %a.addr, align 4 +store i32 %b, ptr %b.addr, align 4 +%0 = load i32, ptr %a.addr, align 4 +%1 = load i32, ptr %b.addr, align 4 +%mul = mul nsw i32 %0, %1 +ret i32 %mul + } + + ; Function Attrs: mustprogress noinline nounwind optnone uwtable + define dso_local noundef ptr @_Z13get_operationb(i1 noundef zeroext %is_addition) #0 !type !7 !type !7 { + entry: +%is_addition.addr = alloca i8, align 1 +%storedv = zext i1 %is_addition to i8 +store i8 %storedv, ptr %is_addition.addr, align 1 +%0 = load i8, ptr %is_addition.addr, align 1 +%loadedv = trunc i8 %0 to i1 +br i1 %loade
[llvm-branch-commits] [llvm] [llvm][AsmPrinter] Emit call graph section (PR #87576)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/87576 >From 6b67376bd5e1f21606017c83cc67f2186ba36a33 Mon Sep 17 00:00:00 2001 From: Necip Fazil Yildiran Date: Thu, 13 Mar 2025 01:41:04 + Subject: [PATCH 1/4] Updated the test as reviewers suggested. Created using spr 1.3.6-beta.1 --- llvm/test/CodeGen/X86/call-graph-section.ll | 66 +++ llvm/test/CodeGen/call-graph-section.ll | 73 - 2 files changed, 66 insertions(+), 73 deletions(-) create mode 100644 llvm/test/CodeGen/X86/call-graph-section.ll delete mode 100644 llvm/test/CodeGen/call-graph-section.ll diff --git a/llvm/test/CodeGen/X86/call-graph-section.ll b/llvm/test/CodeGen/X86/call-graph-section.ll new file mode 100644 index 0..a77a2b8051ed3 --- /dev/null +++ b/llvm/test/CodeGen/X86/call-graph-section.ll @@ -0,0 +1,66 @@ +;; Tests that we store the type identifiers in .callgraph section of the binary. + +; RUN: llc --call-graph-section -filetype=obj -o - < %s | \ +; RUN: llvm-readelf -x .callgraph - | FileCheck %s + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local void @foo() #0 !type !4 { +entry: + ret void +} + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local i32 @bar(i8 signext %a) #0 !type !5 { +entry: + %a.addr = alloca i8, align 1 + store i8 %a, ptr %a.addr, align 1 + ret i32 0 +} + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local ptr @baz(ptr %a) #0 !type !6 { +entry: + %a.addr = alloca ptr, align 8 + store ptr %a, ptr %a.addr, align 8 + ret ptr null +} + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local void @main() #0 !type !7 { +entry: + %retval = alloca i32, align 4 + %fp_foo = alloca ptr, align 8 + %a = alloca i8, align 1 + %fp_bar = alloca ptr, align 8 + %fp_baz = alloca ptr, align 8 + store i32 0, ptr %retval, align 4 + store ptr @foo, ptr %fp_foo, align 8 + %0 = load ptr, ptr %fp_foo, align 8 + call void (...) 
%0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ] + store ptr @bar, ptr %fp_bar, align 8 + %1 = load ptr, ptr %fp_bar, align 8 + %2 = load i8, ptr %a, align 1 + %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata !"_ZTSFicE.generalized") ] + store ptr @baz, ptr %fp_baz, align 8 + %3 = load ptr, ptr %fp_baz, align 8 + %call1 = call ptr %3(ptr %a) [ "callee_type"(metadata !"_ZTSFPvS_E.generalized") ] + call void @foo() [ "callee_type"(metadata !"_ZTSFvE.generalized") ] + %4 = load i8, ptr %a, align 1 + %call2 = call i32 @bar(i8 signext %4) [ "callee_type"(metadata !"_ZTSFicE.generalized") ] + %call3 = call ptr @baz(ptr %a) [ "callee_type"(metadata !"_ZTSFPvS_E.generalized") ] + ret void +} + +;; Check that the numeric type id (md5 hash) for the below type ids are emitted +;; to the callgraph section. + +; CHECK: Hex dump of section '.callgraph': + +; CHECK-DAG: 2444f731 f5eecb3e +!4 = !{i64 0, !"_ZTSFvE.generalized"} +; CHECK-DAG: 5486bc59 814b8e30 +!5 = !{i64 0, !"_ZTSFicE.generalized"} +; CHECK-DAG: 7ade6814 f897fd77 +!6 = !{i64 0, !"_ZTSFPvS_E.generalized"} +; CHECK-DAG: caaf769a 600968fa +!7 = !{i64 0, !"_ZTSFiE.generalized"} diff --git a/llvm/test/CodeGen/call-graph-section.ll b/llvm/test/CodeGen/call-graph-section.ll deleted file mode 100644 index bb158d11e82c9..0 --- a/llvm/test/CodeGen/call-graph-section.ll +++ /dev/null @@ -1,73 +0,0 @@ -; Tests that we store the type identifiers in .callgraph section of the binary. 
- -; RUN: llc --call-graph-section -filetype=obj -o - < %s | \ -; RUN: llvm-readelf -x .callgraph - | FileCheck %s - -target triple = "x86_64-unknown-linux-gnu" - -define dso_local void @foo() #0 !type !4 { -entry: - ret void -} - -define dso_local i32 @bar(i8 signext %a) #0 !type !5 { -entry: - %a.addr = alloca i8, align 1 - store i8 %a, i8* %a.addr, align 1 - ret i32 0 -} - -define dso_local i32* @baz(i8* %a) #0 !type !6 { -entry: - %a.addr = alloca i8*, align 8 - store i8* %a, i8** %a.addr, align 8 - ret i32* null -} - -define dso_local i32 @main() #0 !type !7 { -entry: - %retval = alloca i32, align 4 - %fp_foo = alloca void (...)*, align 8 - %a = alloca i8, align 1 - %fp_bar = alloca i32 (i8)*, align 8 - %fp_baz = alloca i32* (i8*)*, align 8 - store i32 0, i32* %retval, align 4 - store void (...)* bitcast (void ()* @foo to void (...)*), void (...)** %fp_foo, align 8 - %0 = load void (...)*, void (...)** %fp_foo, align 8 - call void (...) %0() [ "callee_type"(metadata !"_ZTSFvE.generalized") ] - store i32 (i8)* @bar, i32 (i8)** %fp_bar, align 8 - %1 = load i32 (i8)*, i32 (i8)** %fp_bar, align 8 - %2 = load i8, i8* %a, align 1 - %call = call i32 %1(i8 signext %2) [ "callee_type"(metadata !"_ZTSFicE.generalized") ] - store i32* (i8*)* @baz, i32* (i8*)** %fp_baz, align 8 - %3 = load i32* (i8*)*, i32* (i8*)** %fp_baz, align 8 - %call1 = call i32* %3(i8* %a) [ "callee_type"(metadata !"_ZTSFPvS_E.generalized") ] - call void @foo() [ "callee_type"(meta
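A note on reading the CHECK-DAG lines in the new X86 test above: they match raw bytes from the `.callgraph` section as printed by `llvm-readelf -x`. Assuming the numeric type id is emitted as a 64-bit little-endian value (an assumption based on the x86-64 target, not something this excerpt states), the dump `2444f731 f5eecb3e` can be converted back to the integer form as sketched below. `readLE64` and `FooTypeIdBytes` are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Interpret eight bytes, in the order a hex dump lists them, as one
// little-endian 64-bit integer: the first byte is the least significant.
constexpr std::uint64_t readLE64(const std::array<std::uint8_t, 8> &Bytes) {
  std::uint64_t Value = 0;
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Value |= static_cast<std::uint64_t>(Bytes[I]) << (8 * I);
  return Value;
}

// Bytes copied from the first CHECK-DAG line of the test: 2444f731 f5eecb3e.
constexpr std::array<std::uint8_t, 8> FooTypeIdBytes = {
    0x24, 0x44, 0xf7, 0x31, 0xf5, 0xee, 0xcb, 0x3e};
```

For these bytes `readLE64(FooTypeIdBytes)` evaluates to 0x3ecbeef531f74424, which under the stated assumption would be the numeric type id recorded for `_ZTSFvE.generalized`.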
[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] callee_type metadata for indirect calls (PR #117036)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/117036 >From b7fbe09b32ff02d4f7c52d82fbf8b5cd28138852 Mon Sep 17 00:00:00 2001 From: prabhukr Date: Wed, 23 Apr 2025 04:05:47 + Subject: [PATCH] Address review comments. Created using spr 1.3.6-beta.1 --- clang/lib/CodeGen/CGCall.cpp| 8 clang/lib/CodeGen/CodeGenModule.cpp | 10 +- clang/lib/CodeGen/CodeGenModule.h | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 185ee1a970aac..d8ab7140f7943 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5780,19 +5780,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, if (callOrInvoke) { *callOrInvoke = CI; if (CGM.getCodeGenOpts().CallGraphSection) { - assert((TargetDecl && TargetDecl->getFunctionType() || - Callee.getAbstractInfo().getCalleeFunctionProtoType()) && - "cannot find callsite type"); QualType CST; if (TargetDecl && TargetDecl->getFunctionType()) CST = QualType(TargetDecl->getFunctionType(), 0); else if (const auto *FPT = Callee.getAbstractInfo().getCalleeFunctionProtoType()) CST = QualType(FPT, 0); + else +llvm_unreachable( +"Cannot find the callee type to generate callee_type metadata."); // Set type identifier metadata of indirect calls for call graph section. if (!CST.isNull()) -CGM.CreateCalleeTypeMetadataForIcall(CST, *callOrInvoke); +CGM.createCalleeTypeMetadataForIcall(CST, *callOrInvoke); } } diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 43cd2405571cf..2fc99639a75cb 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2654,7 +2654,7 @@ void CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D, // Skip available_externally functions. They won't be codegen'ed in the // current module anyway. 
if (getContext().GetGVALinkageForFunction(FD) != GVA_AvailableExternally) -CreateFunctionTypeMetadataForIcall(FD, F); +createFunctionTypeMetadataForIcall(FD, F); } } @@ -2868,7 +2868,7 @@ static bool hasExistingGeneralizedTypeMD(llvm::Function *F) { return MD->hasGeneralizedMDString(); } -void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD, +void CodeGenModule::createFunctionTypeMetadataForIcall(const FunctionDecl *FD, llvm::Function *F) { if (CodeGenOpts.CallGraphSection && !hasExistingGeneralizedTypeMD(F) && (!F->hasLocalLinkage() || @@ -2898,7 +2898,7 @@ void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD, F->addTypeMetadata(0, llvm::ConstantAsMetadata::get(CrossDsoTypeId)); } -void CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT, +void CodeGenModule::createCalleeTypeMetadataForIcall(const QualType &QT, llvm::CallBase *CB) { // Only if needed for call graph section and only for indirect calls. if (!CodeGenOpts.CallGraphSection || !CB->isIndirectCall()) @@ -2909,7 +2909,7 @@ void CodeGenModule::CreateCalleeTypeMetadataForIcall(const QualType &QT, getLLVMContext(), {llvm::ConstantAsMetadata::get(llvm::ConstantInt::get( llvm::Type::getInt64Ty(getLLVMContext()), 0)), TypeIdMD}); - llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), { TypeTuple }); + llvm::MDTuple *MDN = llvm::MDNode::get(getLLVMContext(), {TypeTuple}); CB->setMetadata(llvm::LLVMContext::MD_callee_type, MDN); } @@ -3041,7 +3041,7 @@ void CodeGenModule::SetFunctionAttributes(GlobalDecl GD, llvm::Function *F, // jump table. 
if (!CodeGenOpts.SanitizeCfiCrossDso || !CodeGenOpts.SanitizeCfiCanonicalJumpTables) -CreateFunctionTypeMetadataForIcall(FD, F); +createFunctionTypeMetadataForIcall(FD, F); if (LangOpts.Sanitize.has(SanitizerKind::KCFI)) setKCFIType(FD, F); diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h index dfbe4388349dd..4b53f0f241b52 100644 --- a/clang/lib/CodeGen/CodeGenModule.h +++ b/clang/lib/CodeGen/CodeGenModule.h @@ -1619,11 +1619,11 @@ class CodeGenModule : public CodeGenTypeCache { llvm::Metadata *CreateMetadataIdentifierGeneralized(QualType T); /// Create and attach type metadata to the given function. - void CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD, + void createFunctionTypeMetadataForIcall(const FunctionDecl *FD, llvm::Function *F); /// Create and attach type metadata to the given call. - void CreateCalleeTypeMetadataForIcall(const QualType &QT, llvm::CallBase *CB); + void createCa
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/87573 >From a8a5848885e12c771f12cfa33b4dbc6a0272e925 Mon Sep 17 00:00:00 2001 From: Prabhuk Date: Mon, 22 Apr 2024 11:34:04 -0700 Subject: [PATCH 01/12] Update clang/lib/CodeGen/CodeGenModule.cpp Cleaner if checks. Co-authored-by: Matt Arsenault --- clang/lib/CodeGen/CodeGenModule.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index e19bbee996f58..ff1586d2fa8ab 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2711,7 +2711,7 @@ void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD, void CodeGenModule::CreateFunctionTypeMetadataForIcall(const QualType &QT, llvm::CallBase *CB) { // Only if needed for call graph section and only for indirect calls. - if (!(CodeGenOpts.CallGraphSection && CB && CB->isIndirectCall())) + if (!CodeGenOpts.CallGraphSection || !CB || !CB->isIndirectCall()) return; auto *MD = CreateMetadataIdentifierGeneralized(QT); >From 019b2ca5e1c263183ed114e0b967b4e77b4a17a8 Mon Sep 17 00:00:00 2001 From: Prabhuk Date: Mon, 22 Apr 2024 11:34:31 -0700 Subject: [PATCH 02/12] Update clang/lib/CodeGen/CodeGenModule.cpp Update the comments as suggested. 
Co-authored-by: Matt Arsenault --- clang/lib/CodeGen/CodeGenModule.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index ff1586d2fa8ab..5635a87d2358a 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2680,9 +2680,9 @@ void CodeGenModule::CreateFunctionTypeMetadataForIcall(const FunctionDecl *FD, bool EmittedMDIdGeneralized = false; if (CodeGenOpts.CallGraphSection && (!F->hasLocalLinkage() || - F->getFunction().hasAddressTaken(nullptr, /* IgnoreCallbackUses */ true, -/* IgnoreAssumeLikeCalls */ true, -/* IgnoreLLVMUsed */ false))) { + F->getFunction().hasAddressTaken(nullptr, /*IgnoreCallbackUses=*/ true, +/*IgnoreAssumeLikeCalls=*/ true, +/*IgnoreLLVMUsed=*/ false))) { F->addTypeMetadata(0, CreateMetadataIdentifierGeneralized(FD->getType())); EmittedMDIdGeneralized = true; } >From 99242900c51778abd4b7e7f4361b09202b7abcda Mon Sep 17 00:00:00 2001 From: Prabhuk Date: Mon, 29 Apr 2024 11:53:40 -0700 Subject: [PATCH 03/12] dyn_cast to isa Created using spr 1.3.6-beta.1 --- clang/lib/CodeGen/CGCall.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 526a63b24ff83..45033ced1d834 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5713,8 +5713,8 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, if (callOrInvoke && *callOrInvoke && (*callOrInvoke)->isIndirectCall()) { if (const FunctionDecl *FD = dyn_cast_or_null(TargetDecl)) { // Type id metadata is set only for C/C++ contexts. 
-if (dyn_cast(FD) || dyn_cast(FD) || -dyn_cast(FD)) { +if (isa(FD) || isa(FD) || +isa(FD)) { CGM.CreateFunctionTypeMetadataForIcall(FD->getType(), *callOrInvoke); } } >From 24882b15939b781bcf28d87fdf4f6e8834b6cfde Mon Sep 17 00:00:00 2001 From: prabhukr Date: Tue, 10 Dec 2024 14:54:27 -0800 Subject: [PATCH 04/12] Address review comments. Break llvm and clang patches. Created using spr 1.3.6-beta.1 --- llvm/lib/IR/Verifier.cpp | 7 +++ llvm/test/Verifier/operand-bundles.ll | 4 ++-- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 0ad7ba555bfad..b72672e7b8e56 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -3707,10 +3707,9 @@ void Verifier::visitCallBase(CallBase &Call) { if (Intrinsic::ID ID = (Intrinsic::ID)F->getIntrinsicID()) visitIntrinsicCall(ID, Call); - // Verify that a callsite has at most one "deopt", at most one "funclet", at - // most one "gc-transition", at most one "cfguardtarget", at most one "type", - // at most one "preallocated" operand bundle, and at most one "ptrauth" - // operand bundle. + // Verify that a callsite has at most one operand bundle for each of the + // following: "deopt", "funclet", "gc-transition", "cfguardtarget", "type", + // "preallocated", and "ptrauth". bool FoundDeoptBundle = false, FoundFuncletBundle = false, FoundGCTransitionBundle = false, FoundCFGuardTargetBundle = false, FoundPreallocatedBundle = false, FoundGCLiveBundle = false, diff --git a/llvm/test/Verifier/operand-bundles.ll b/llvm/t
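The "Cleaner if checks" change in PATCH 01/12 above replaces a negated conjunction with a disjunction of negations (De Morgan's law). The following is only a minimal sketch, not clang code: `FakeCall`, `skipOld`, `skipNew` and `formsAgree` are hypothetical stand-ins for `llvm::CallBase` and the early-return condition, used to check that both forms agree on every input, including a null pointer.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for llvm::CallBase; only the one query is modeled.
struct FakeCall {
  bool Indirect;
  bool isIndirectCall() const { return Indirect; }
};

// Early-return condition as written before the patch.
bool skipOld(bool CallGraphSection, const FakeCall *CB) {
  return !(CallGraphSection && CB && CB->isIndirectCall());
}

// Early-return condition as written after the patch.
bool skipNew(bool CallGraphSection, const FakeCall *CB) {
  return !CallGraphSection || !CB || !CB->isIndirectCall();
}

// Returns true when both forms agree for every option/call combination.
bool formsAgree() {
  FakeCall Direct{false}, Indirect{true};
  const FakeCall *Calls[] = {nullptr, &Direct, &Indirect};
  for (bool Opt : {false, true})
    for (const FakeCall *CB : Calls)
      if (skipOld(Opt, CB) != skipNew(Opt, CB))
        return false;
  return true;
}
```

Both forms short-circuit before dereferencing a null `CB`, so the rewrite is behavior-preserving in this model.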
[llvm-branch-commits] AArch64: Relax x16/x17 constraint on AUT in certain cases. (PR #132857)
pcc wrote: @asl Ping https://github.com/llvm/llvm-project/pull/132857
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
skatrak wrote: This PR is based on the following analysis of the current state of OpenMP map translation to LLVM IR. If there are any issues with this PR, they are likely easier to spot on this graph: https://github.com/llvm/llvm-project/pull/137199
[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)
skatrak wrote: PR stack: - #137198 - #137199 - #137200 - #137201 https://github.com/llvm/llvm-project/pull/137201
[llvm-branch-commits] [llvm] [mlir] [mlir][OpenMP] Convert omp.cancel sections to LLVMIR (PR #137193)
tblah wrote: PR Stack: Cancel parallel https://github.com/llvm/llvm-project/pull/137192 Cancel sections https://github.com/llvm/llvm-project/pull/137193 Cancel wsloop https://github.com/llvm/llvm-project/pull/137194 Cancellation point (TODO) Cancel(lation point) taskgroup (TODO) https://github.com/llvm/llvm-project/pull/137193
[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)
https://github.com/spavloff updated https://github.com/llvm/llvm-project/pull/136501 >From 22742e24c1eef3ecc0fb4294dac9f42c9d160019 Mon Sep 17 00:00:00 2001 From: Serge Pavlov Date: Thu, 17 Apr 2025 18:42:15 +0700 Subject: [PATCH 1/2] Bundle operands to specify denormal modes Two new operands are now supported in the "fp.control" operand bundle: * "denorm.in=xxx" - specifies the input denormal mode. * "denorm.out=xxx" - specifies the output denormal mode. Here xxx must be one of the following values: * "ieee" - preserve denormals. * "zero" - flush to zero preserving sign. * "pzero" - flush to positive zero. * "dyn" - mode is dynamically read from a control register. These values align with those permitted in the "denormal-fp-math" function attribute. --- llvm/docs/LangRef.rst | 18 +- llvm/include/llvm/ADT/FloatingPointMode.h | 33 llvm/include/llvm/IR/InstrTypes.h | 21 +++ llvm/lib/Analysis/ConstantFolding.cpp | 24 ++- llvm/lib/IR/Instructions.cpp | 168 +- llvm/lib/IR/Verifier.cpp | 14 ++ .../constant-fold-fp-denormal-strict.ll | 91 ++ llvm/test/Verifier/fp-intrinsics.ll | 36 8 files changed, 394 insertions(+), 11 deletions(-) create mode 100644 llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 8252971aa8a58..954f0e96b46f6 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -3084,7 +3084,10 @@ floating-point control modes and the treatment of status bits respectively. An operand bundle tagged with "fp.control" contains information about the control modes used for the operation execution. Operands specified in this -bundle represent particular options. Currently, only rounding mode is supported. +bundle represent particular options. The following modes are supported: + +* rounding mode, +* denormal behavior. Rounding mode is represented by a metadata string value, which specifies the the mode used for the operation evaluation. 
Possible values are: @@ -3103,6 +3106,19 @@ rounding rounding mode is taken from the control register (dynamic rounding). In the particular case of :ref:`default floating-point environment `, the operation uses rounding to nearest, ties to even. +Denormal behavior defines whether denormal values are flushed to zero during +the call's execution. This behavior is specified separately for input and +output values. Such specification is a string, which starts with +"denorm.in=" or "denorm.out=" respectively. The remainder of the string should +be one of the values: + +:: + +``"ieee"`` - preserve denormals, +``"zero"`` - flush to +0.0 or -0.0 depending on value sign, +``"pzero"`` - flush to +0.0, +``"dyn"`` - concrete mode is read from some register. + An operand bundle tagged with "fp.except" may be associated with operations that can read or write floating-point exception flags. It contains a single metadata string value, which can have one of the following values: diff --git a/llvm/include/llvm/ADT/FloatingPointMode.h b/llvm/include/llvm/ADT/FloatingPointMode.h index 639d931ef88fe..5fceccfd1d0bf 100644 --- a/llvm/include/llvm/ADT/FloatingPointMode.h +++ b/llvm/include/llvm/ADT/FloatingPointMode.h @@ -234,6 +234,39 @@ void DenormalMode::print(raw_ostream &OS) const { OS << denormalModeKindName(Output) << ',' << denormalModeKindName(Input); } +/// If the specified string represents denormal mode as used in operand bundles, +/// returns the corresponding mode. +inline std::optional +parseDenormalKindFromOperandBundle(StringRef Str) { + if (Str == "ieee") +return DenormalMode::IEEE; + if (Str == "zero") +return DenormalMode::PreserveSign; + if (Str == "pzero") +return DenormalMode::PositiveZero; + if (Str == "dyn") +return DenormalMode::Dynamic; + return std::nullopt; +} + +/// Converts the specified denormal mode into string suitable for use in an +/// operand bundle. 
+inline std::optional +printDenormalForOperandBundle(DenormalMode::DenormalModeKind Mode) { + switch (Mode) { + case DenormalMode::IEEE: +return "ieee"; + case DenormalMode::PreserveSign: +return "zero"; + case DenormalMode::PositiveZero: +return "pzero"; + case DenormalMode::Dynamic: +return "dyn"; + default: +return std::nullopt; + } +} + /// Floating-point class tests, supported by 'is_fpclass' intrinsic. Actual /// test may be an OR combination of basic tests. enum FPClassTest : unsigned { diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h index 8425243e5efe9..8492c911ffc6a 100644 --- a/llvm/include/llvm/IR/InstrTypes.h +++ b/llvm/include/llvm/IR/InstrTypes.h @@ -1092,12 +1092,24 @@ template class OperandBundleDefT { using OperandBundleDef = OperandBundleDefT; using ConstOperandBundleDef = OperandBundleDefT; +s
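The commit message above describes bundle operands of the form "denorm.in=xxx" and "denorm.out=xxx". As a rough, standalone illustration of that string format (not the patch's implementation), the sketch below strips a prefix and maps the remainder to a mode. `DenormKind`, `parseKind` and `parseOperand` are hypothetical names introduced here; the patch itself works with `DenormalMode::DenormalModeKind` and `StringRef`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical mirror of the four modes named in the commit message.
enum class DenormKind { IEEE, PreserveSign, PositiveZero, Dynamic };

// Maps the value part of an operand ("ieee", "zero", "pzero", "dyn").
std::optional<DenormKind> parseKind(std::string_view S) {
  if (S == "ieee")
    return DenormKind::IEEE;
  if (S == "zero")
    return DenormKind::PreserveSign;
  if (S == "pzero")
    return DenormKind::PositiveZero;
  if (S == "dyn")
    return DenormKind::Dynamic;
  return std::nullopt;
}

// Parses a whole operand such as "denorm.in=zero" for the given prefix.
// Returns std::nullopt if the prefix does not match or the value is unknown.
std::optional<DenormKind> parseOperand(std::string_view Operand,
                                       std::string_view Prefix) {
  if (Operand.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  return parseKind(Operand.substr(Prefix.size()));
}
```

Returning an empty optional for both a mismatched prefix and an unknown value keeps the caller's fallback path in one place; whether the real code distinguishes those cases is not shown in the truncated diff.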
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
llvmbot wrote: @llvm/pr-subscribers-mlir-openmp Author: Sergio Afonso (skatrak) Changes This patch adds assertions to map-related MLIR to LLVM IR translation functions and utils to explicitly document whether they are intended for host or device compilation only. Over time, map-related handling has increased in complexity. This is compounded by the fact that some handling is device-specific and some is host-specific. By explicitly asserting on these functions on the expected compilation pass, the flow should become slightly easier to follow. --- Full diff: https://github.com/llvm/llvm-project/pull/137199.diff 1 Files Affected: - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+22-2) ``diff diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 52aa1fbfab2c1..6d80c66e3596e 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -3563,6 +3563,9 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder, llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + // Map the first segment of our structure combinedInfo.Types.emplace_back( isTargetParams @@ -3671,6 +3674,8 @@ static void processMapMembersWithParent( llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, llvm::omp::OpenMPOffloadMappingFlags memberOfFlag) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); auto parentClause = llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3784,6 +3789,9 @@ 
static void processMapWithMembersOf(LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); + auto parentClause = llvm::cast(mapData.MapClause[mapDataIndex]); @@ -3825,6 +3833,8 @@ static void createAlteredByCaptureMap(MapInfoData &mapData, LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); for (size_t i = 0; i < mapData.MapClause.size(); ++i) { // if it's declare target, skip it, it's handled separately. if (!mapData.IsDeclareTarget[i]) { @@ -3889,6 +3899,9 @@ static void genMapInfos(llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, bool isTargetParams = false) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); + // We wish to modify some of the methods in which arguments are // passed based on their capture type by the target region, this can // involve generating new loads and stores, which changes the @@ -3900,8 +3913,7 @@ static void genMapInfos(llvm::IRBuilderBase &builder, // kernel arg structure. It primarily becomes relevant in cases like // bycopy, or byref range'd arrays. In the default case, we simply // pass thee pointer byref as both basePointer and pointer. 
- if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice()) -createAlteredByCaptureMap(mapData, moduleTranslation, builder); + createAlteredByCaptureMap(mapData, moduleTranslation, builder); llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); @@ -3935,6 +3947,8 @@ emitUserDefinedMapper(Operation *declMapperOp, llvm::IRBuilderBase &builder, static llvm::Expected getOrCreateUserDefinedMapperFunc(Operation *op, llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation) { + assert(!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && + "function only supported for host device codegen"); auto declMapperOp = cast(op); std::string mapperFuncName = moduleTranslation.getOpenMPBuilder()->createPlatformSpecificName( @@ -3951,6 +3965,8 @@ static llvm::Expected emitUserDefinedMapper(Operation *op, llvm::IRBuilderBase &builder,
[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)
@@ -678,6 +682,71 @@ fp::ExceptionBehavior CallBase::getExceptionBehavior() const { return fp::ebIgnore; } +DenormalMode::DenormalModeKind CallBase::getInputDenormMode() const { + if (auto InDenormBundle = getOperandBundle(LLVMContext::OB_fp_control)) { +auto DenormOperand = +getBundleOperandByPrefix(*InDenormBundle, "denorm.in="); +if (DenormOperand) { + if (auto Mode = parseDenormalKindFromOperandBundle(*DenormOperand)) +return *Mode; +} else { + return DenormalMode::IEEE; +} + } + + if (!getParent()) +return DenormalMode::IEEE; + const Function *F = getFunction(); + if (!F) +return DenormalMode::IEEE; + + Type *Ty = nullptr; + for (auto &A : args()) +if (auto *T = A.get()->getType(); T->isFPOrFPVectorTy()) { + Ty = T; + break; +} + assert(Ty && "Some input argument must be of floating-point type"); spavloff wrote: Introduced "denorm.f32.in=" and "denorm.f32.out=" bundle operands. https://github.com/llvm/llvm-project/pull/136501
[llvm-branch-commits] [llvm] Bundle operands to specify denormal modes (PR #136501)
@@ -678,6 +682,71 @@ fp::ExceptionBehavior CallBase::getExceptionBehavior() const { return fp::ebIgnore; } +DenormalMode::DenormalModeKind CallBase::getInputDenormMode() const { + if (auto InDenormBundle = getOperandBundle(LLVMContext::OB_fp_control)) { +auto DenormOperand = +getBundleOperandByPrefix(*InDenormBundle, "denorm.in="); +if (DenormOperand) { + if (auto Mode = parseDenormalKindFromOperandBundle(*DenormOperand)) +return *Mode; +} else { + return DenormalMode::IEEE; +} + } + + if (!getParent()) +return DenormalMode::IEEE; + const Function *F = getFunction(); + if (!F) +return DenormalMode::IEEE; spavloff wrote: Removed this search. https://github.com/llvm/llvm-project/pull/136501