[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
https://github.com/fhahn closed https://github.com/llvm/llvm-project/pull/93499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
https://github.com/fhahn reopened https://github.com/llvm/llvm-project/pull/93499
[llvm-branch-commits] [llvm] [LAA] Use SCEVUse to add extra NUW flags to pointer bounds. (WIP) (PR #91962)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/91962 >From 9a8305b0041586627b3c3c8a1dc954306767cadc Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 1 May 2024 11:03:42 +0100 Subject: [PATCH 1/3] [SCEV,LAA] Add tests to make sure scoped SCEVs don't impact other SCEVs. --- .../LoopAccessAnalysis/scoped-scevs.ll| 182 ++ 1 file changed, 182 insertions(+) create mode 100644 llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll diff --git a/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll new file mode 100644 index 0..323ba2a739cf8 --- /dev/null +++ b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll @@ -0,0 +1,182 @@ +; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -passes='print<access-info>,print<scalar-evolution>' -disable-output %s 2>&1 | FileCheck --check-prefixes=LAA,AFTER %s +; RUN: opt -passes='print<scalar-evolution>,print<access-info>,print<scalar-evolution>' -disable-output %s 2>&1 | FileCheck --check-prefixes=BEFORE,LAA,AFTER %s + +target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + +declare void @use(ptr) + +; Check that scoped expressions created by LAA do not interfere with non-scoped +; SCEVs with the same operands. The tests first run print<scalar-evolution> to +; populate the SCEV cache. They contain a GEP computing A+405, which is the end +; of the accessed range, before and/or after the loop. No nuw flags should be +; added to them in the second print<scalar-evolution> output.
+ +define ptr @test_ptr_range_end_computed_before_and_after_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; BEFORE:%x = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; BEFORE:%y = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; +; LAA-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; LAA-NEXT:loop: +; LAA-NEXT: Memory dependences are safe with run-time checks +; LAA-NEXT: Dependences: +; LAA-NEXT: Run-time memory checks: +; LAA-NEXT: Check 0: +; LAA-NEXT:Comparing group ([[GRP1:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv +; LAA-NEXT:Against group ([[GRP2:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv +; LAA-NEXT: Grouped accesses: +; LAA-NEXT:Group [[GRP1]]: +; LAA-NEXT: (Low: (1 + %A) High: (405 + %A)) +; LAA-NEXT:Member: {(1 + %A),+,4}<%loop> +; LAA-NEXT:Group [[GRP2]]: +; LAA-NEXT: (Low: %A High: (101 + %A)) +; LAA-NEXT:Member: {%A,+,1}<%loop> +; LAA-EMPTY: +; LAA-NEXT: Non vectorizable stores to invariant address were not found in loop. 
+; LAA-NEXT: SCEV assumptions: +; LAA-EMPTY: +; LAA-NEXT: Expressions re-written: +; +; AFTER-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; AFTER-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; AFTER:%x = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +; AFTER:%y = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +entry: + %A.1 = getelementptr inbounds i8, ptr %A, i64 1 + %x = getelementptr inbounds i8, ptr %A, i64 405 + call void @use(ptr %x) + br label %loop + +loop: + %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ] + %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv + %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv + %l = load i8, ptr %gep.A, align 1 + %ext = zext i8 %l to i32 + store i32 %ext, ptr %gep.A.400, align 4 + %iv.next = add nuw nsw i64 %iv, 1 + %ec = icmp eq i64 %iv, 100 + br i1 %ec, label %exit, label %loop + +exit: + %y = getelementptr inbounds i8, ptr %A, i64 405 + ret ptr %y +} + +define void @test_ptr_range_end_computed_before_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_loop +; BEFORE-NEXT:%A.1 = getelementptr inbounds i8, ptr %A, i64 1 +; BEFORE-NEXT:--> (1 + %A) U: full-set S: full-set +; BEFORE-NEXT:%x = getelementptr inbounds i8, ptr %A, i64 405 +; +; LAA-LABEL: 'test_ptr_range_end_computed_before_loop' +; LAA-NEXT:loop: +; LAA-NEXT: Memory dependences are safe with run-time checks +; LAA-NEXT: Dependences: +; LAA-NEXT: Run-time memory checks: +; LAA-NEXT: Check 0: +; LAA-NEXT:Comparing group ([[GRP3:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv +; LAA-NEXT:Against group ([[GRP4:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A = getelementptr inbounds i8, ptr
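The invariant the test above guards can be illustrated with a small, self-contained model (hypothetical Python sketch, not LLVM's implementation): expressions are uniqued in a cache keyed by their operands, so an analysis that wants stronger flags under local assumptions must create a scoped copy rather than mutate the shared node — otherwise every other user of `(405 + %A)` would suddenly see `nuw`.

```python
# Toy model of a uniquing expression cache (illustrative names only).
# Scoped flag additions must copy the node, never mutate the cached one.
class Expr:
    def __init__(self, op, operands, flags=frozenset()):
        self.op = op
        self.operands = tuple(operands)
        self.flags = frozenset(flags)

class ExprCache:
    def __init__(self):
        self._unique = {}

    def get(self, op, operands):
        # Expressions are uniqued by (op, operands), like SCEV's folding set.
        key = (op, tuple(operands))
        if key not in self._unique:
            self._unique[key] = Expr(op, operands)
        return self._unique[key]

def with_scoped_flags(expr, extra_flags):
    # Return a scoped copy carrying extra flags; the cached node is untouched.
    return Expr(expr.op, expr.operands, expr.flags | set(extra_flags))

cache = ExprCache()
shared = cache.get("add", (405, "%A"))       # the expression for (405 + %A)
scoped = with_scoped_flags(shared, {"nuw"})  # LAA's pointer-bound version
```

After the scoped copy is made, the cached `shared` node still has no flags, which is exactly what the second `print` run in the test checks for.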
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
@@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) { OldTDPtr -= i; OldTD = *OldTDPtr; -if (!isAliasingLegal(td, OldTD)) +tysan_type_descriptor *InternalMember = OldTD; fhahn wrote: Could you add a comment here indicating what this does? https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/76260 >From f45d4dc65537f3664472c873062fbda2a9bed984 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Thu, 18 Apr 2024 23:01:03 +0100 Subject: [PATCH 1/2] [TySan] A Type Sanitizer (Clang) --- clang/include/clang/Basic/Features.def | 1 + clang/include/clang/Basic/Sanitizers.def | 3 ++ clang/include/clang/Driver/SanitizerArgs.h | 1 + clang/lib/CodeGen/BackendUtil.cpp | 6 +++ clang/lib/CodeGen/CGDecl.cpp | 3 +- clang/lib/CodeGen/CGDeclCXX.cpp| 4 ++ clang/lib/CodeGen/CodeGenFunction.cpp | 2 + clang/lib/CodeGen/CodeGenModule.cpp| 12 +++--- clang/lib/CodeGen/CodeGenTBAA.cpp | 6 ++- clang/lib/CodeGen/SanitizerMetadata.cpp| 44 +- clang/lib/CodeGen/SanitizerMetadata.h | 13 --- clang/lib/Driver/SanitizerArgs.cpp | 13 +-- clang/lib/Driver/ToolChains/CommonArgs.cpp | 6 ++- clang/lib/Driver/ToolChains/Darwin.cpp | 6 +++ clang/lib/Driver/ToolChains/Linux.cpp | 2 + clang/test/Driver/sanitizer-ld.c | 23 +++ 16 files changed, 116 insertions(+), 29 deletions(-) diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index 53f410d3cb4bd..6a9921ffee884 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -100,6 +100,7 @@ FEATURE(numerical_stability_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Nume FEATURE(memory_sanitizer, LangOpts.Sanitize.hasOneOf(SanitizerKind::Memory | SanitizerKind::KernelMemory)) +FEATURE(type_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Type)) FEATURE(thread_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Thread)) FEATURE(dataflow_sanitizer, LangOpts.Sanitize.has(SanitizerKind::DataFlow)) FEATURE(scudo, LangOpts.Sanitize.hasOneOf(SanitizerKind::Scudo)) diff --git a/clang/include/clang/Basic/Sanitizers.def b/clang/include/clang/Basic/Sanitizers.def index bee35e9dca7c3..4b59b43437c2c 100644 --- a/clang/include/clang/Basic/Sanitizers.def +++ b/clang/include/clang/Basic/Sanitizers.def @@ -73,6 +73,9 @@ 
SANITIZER("fuzzer", Fuzzer) // libFuzzer-required instrumentation, no linking. SANITIZER("fuzzer-no-link", FuzzerNoLink) +// TypeSanitizer +SANITIZER("type", Type) + // ThreadSanitizer SANITIZER("thread", Thread) diff --git a/clang/include/clang/Driver/SanitizerArgs.h b/clang/include/clang/Driver/SanitizerArgs.h index 47ef175302679..fde2ea3eac8ea 100644 --- a/clang/include/clang/Driver/SanitizerArgs.h +++ b/clang/include/clang/Driver/SanitizerArgs.h @@ -86,6 +86,7 @@ class SanitizerArgs { bool needsHwasanAliasesRt() const { return needsHwasanRt() && HwasanUseAliases; } + bool needsTysanRt() const { return Sanitizers.has(SanitizerKind::Type); } bool needsTsanRt() const { return Sanitizers.has(SanitizerKind::Thread); } bool needsMsanRt() const { return Sanitizers.has(SanitizerKind::Memory); } bool needsFuzzer() const { return Sanitizers.has(SanitizerKind::Fuzzer); } diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index b09680086248d..ff7cc5a8e48ba 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -80,6 +80,7 @@ #include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h" #include "llvm/Transforms/Instrumentation/SanitizerCoverage.h" #include "llvm/Transforms/Instrumentation/ThreadSanitizer.h" +#include "llvm/Transforms/Instrumentation/TypeSanitizer.h" #include "llvm/Transforms/ObjCARC.h" #include "llvm/Transforms/Scalar/EarlyCSE.h" #include "llvm/Transforms/Scalar/GVN.h" @@ -707,6 +708,11 @@ static void addSanitizers(const Triple &TargetTriple, MPM.addPass(createModuleToFunctionPassAdaptor(ThreadSanitizerPass())); } +if (LangOpts.Sanitize.has(SanitizerKind::Type)) { + MPM.addPass(ModuleTypeSanitizerPass()); + MPM.addPass(createModuleToFunctionPassAdaptor(TypeSanitizerPass())); +} + auto ASanPass = [&](SanitizerMask Mask, bool CompileKernel) { if (LangOpts.Sanitize.has(Mask)) { bool UseGlobalGC = asanUseGlobalsGC(TargetTriple, CodeGenOpts); diff --git a/clang/lib/CodeGen/CGDecl.cpp 
b/clang/lib/CodeGen/CGDecl.cpp index 90aa4c0745a8a..4933f0c95fa8a 100644 --- a/clang/lib/CodeGen/CGDecl.cpp +++ b/clang/lib/CodeGen/CGDecl.cpp @@ -484,7 +484,8 @@ void CodeGenFunction::EmitStaticVarDecl(const VarDecl &D, LocalDeclMap.find(&D)->second = Address(castedAddr, elemTy, alignment); CGM.setStaticLocalDeclAddress(&D, castedAddr); - CGM.getSanitizerMetadata()->reportGlobal(var, D); + CGM.getSanitizerMetadata()->reportGlobalToASan(var, D); + CGM.getSanitizerMetadata()->reportGlobalToTySan(var, D); // Emit global variable debug descriptor for static vars. CGDebugInfo *DI = getDebugInfo(); diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX
[llvm-branch-commits] [clang] [compiler-rt] [TySan] A Type Sanitizer (Runtime Library) (PR #76261)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/76261 >From 733b3ed3f7441453889157834e0a5b6c288bf976 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Thu, 27 Jun 2024 15:48:05 +0100 Subject: [PATCH] [tysan] Add runtime support --- clang/runtime/CMakeLists.txt | 2 +- .../cmake/Modules/AllSupportedArchDefs.cmake | 1 + compiler-rt/cmake/config-ix.cmake | 14 +- compiler-rt/lib/tysan/CMakeLists.txt | 64 compiler-rt/lib/tysan/lit.cfg | 35 ++ compiler-rt/lib/tysan/lit.site.cfg.in | 12 + compiler-rt/lib/tysan/tysan.cpp | 339 ++ compiler-rt/lib/tysan/tysan.h | 79 compiler-rt/lib/tysan/tysan.syms.extra| 2 + compiler-rt/lib/tysan/tysan_flags.inc | 17 + compiler-rt/lib/tysan/tysan_interceptors.cpp | 250 + compiler-rt/lib/tysan/tysan_platform.h| 93 + compiler-rt/test/tysan/CMakeLists.txt | 32 ++ compiler-rt/test/tysan/anon-ns.cpp| 41 +++ compiler-rt/test/tysan/anon-same-struct.c | 26 ++ compiler-rt/test/tysan/anon-struct.c | 27 ++ compiler-rt/test/tysan/basic.c| 65 compiler-rt/test/tysan/char-memcpy.c | 45 +++ compiler-rt/test/tysan/global.c | 31 ++ compiler-rt/test/tysan/int-long.c | 21 ++ compiler-rt/test/tysan/lit.cfg.py | 139 +++ compiler-rt/test/tysan/lit.site.cfg.py.in | 17 + compiler-rt/test/tysan/ptr-float.c| 19 + ...ruct-offset-multiple-compilation-units.cpp | 51 +++ compiler-rt/test/tysan/struct-offset.c| 26 ++ compiler-rt/test/tysan/struct.c | 39 ++ compiler-rt/test/tysan/union-wr-wr.c | 18 + compiler-rt/test/tysan/violation-pr45282.c| 32 ++ compiler-rt/test/tysan/violation-pr47137.c| 40 +++ compiler-rt/test/tysan/violation-pr51837.c| 34 ++ compiler-rt/test/tysan/violation-pr62544.c| 24 ++ compiler-rt/test/tysan/violation-pr62828.cpp | 44 +++ compiler-rt/test/tysan/violation-pr68655.cpp | 40 +++ compiler-rt/test/tysan/violation-pr86685.c| 29 ++ 34 files changed, 1746 insertions(+), 2 deletions(-) create mode 100644 compiler-rt/lib/tysan/CMakeLists.txt create mode 100644 compiler-rt/lib/tysan/lit.cfg create mode 100644 
compiler-rt/lib/tysan/lit.site.cfg.in create mode 100644 compiler-rt/lib/tysan/tysan.cpp create mode 100644 compiler-rt/lib/tysan/tysan.h create mode 100644 compiler-rt/lib/tysan/tysan.syms.extra create mode 100644 compiler-rt/lib/tysan/tysan_flags.inc create mode 100644 compiler-rt/lib/tysan/tysan_interceptors.cpp create mode 100644 compiler-rt/lib/tysan/tysan_platform.h create mode 100644 compiler-rt/test/tysan/CMakeLists.txt create mode 100644 compiler-rt/test/tysan/anon-ns.cpp create mode 100644 compiler-rt/test/tysan/anon-same-struct.c create mode 100644 compiler-rt/test/tysan/anon-struct.c create mode 100644 compiler-rt/test/tysan/basic.c create mode 100644 compiler-rt/test/tysan/char-memcpy.c create mode 100644 compiler-rt/test/tysan/global.c create mode 100644 compiler-rt/test/tysan/int-long.c create mode 100644 compiler-rt/test/tysan/lit.cfg.py create mode 100644 compiler-rt/test/tysan/lit.site.cfg.py.in create mode 100644 compiler-rt/test/tysan/ptr-float.c create mode 100644 compiler-rt/test/tysan/struct-offset-multiple-compilation-units.cpp create mode 100644 compiler-rt/test/tysan/struct-offset.c create mode 100644 compiler-rt/test/tysan/struct.c create mode 100644 compiler-rt/test/tysan/union-wr-wr.c create mode 100644 compiler-rt/test/tysan/violation-pr45282.c create mode 100644 compiler-rt/test/tysan/violation-pr47137.c create mode 100644 compiler-rt/test/tysan/violation-pr51837.c create mode 100644 compiler-rt/test/tysan/violation-pr62544.c create mode 100644 compiler-rt/test/tysan/violation-pr62828.cpp create mode 100644 compiler-rt/test/tysan/violation-pr68655.cpp create mode 100644 compiler-rt/test/tysan/violation-pr86685.c diff --git a/clang/runtime/CMakeLists.txt b/clang/runtime/CMakeLists.txt index 65fcdc2868f03..ff2605b23d25b 100644 --- a/clang/runtime/CMakeLists.txt +++ b/clang/runtime/CMakeLists.txt @@ -122,7 +122,7 @@ if(LLVM_BUILD_EXTERNAL_COMPILER_RT AND EXISTS ${COMPILER_RT_SRC_ROOT}/) COMPONENT compiler-rt) # Add top-level targets that 
build specific compiler-rt runtimes. - set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan ubsan ubsan-minimal) + set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan tysan ubsan ubsan-minimal) foreach(runtime ${COMPILER_RT_RUNTIMES}) get_ext_project_build_command(build_runtime_cmd ${runtime}) add_custom_target(${runtime} diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake index ac4a71202384d..4701b58de4bda 1006
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn milestoned https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/100097 As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. >From a72a0bf44a8b259be3c62e79082d2fdc04fc2771 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Tue, 23 Jul 2024 11:15:26 +0100 Subject: [PATCH] [LV] Disable VPlan-based cost model for 19.x release. As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 2 +- .../test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6d28b8fabe42e..68363abdb817a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold( cl::desc("The maximum allowed number of runtime memory checks")); static cl::opt UseLegacyCostModel( -"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden, +"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden, cl::desc("Use the legacy cost model instead of the VPlan-based cost model. 
" "This option will be removed in the future.")); diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll index fc310f4163082..1a78eaf644723 100644 --- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll @@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF @@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)
https://github.com/fhahn milestoned https://github.com/llvm/llvm-project/pull/102201
[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/102201 …577) Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Update getPtrStride to return 0 for invariant pointers. Then proceed by checking the strides. If either source or sink are not strided by a constant (i.e. not a non-wrapping AddRec) or invariant, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise they are either loop invariant or strided. In that case, we can generate a runtime check to disambiguate them. If both are strided by constants, we proceed as previously. This is an alternative to https://github.com/llvm/llvm-project/pull/99239 and also replaces additional checks if the underlying object is loop-invariant. Fixes https://github.com/llvm/llvm-project/issues/87189. PR: https://github.com/llvm/llvm-project/pull/99577 >From 83098f4513567a054663b30380e4f2039ee8a6d0 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 26 Jul 2024 13:10:16 +0100 Subject: [PATCH] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99577) Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Update getPtrStride to return 0 for invariant pointers. Then proceed by checking the strides. If either source or sink are not strided by a constant (i.e. not a non-wrapping AddRec) or invariant, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise they are either loop invariant or strided. In that case, we can generate a runtime check to disambiguate them. If both are strided by constants, we proceed as previously. This is an alternative to https://github.com/llvm/llvm-project/pull/99239 and also replaces additional checks if the underlying object is loop-invariant. 
Fixes https://github.com/llvm/llvm-project/issues/87189. PR: https://github.com/llvm/llvm-project/pull/99577 --- .../llvm/Analysis/LoopAccessAnalysis.h| 23 ++-- llvm/lib/Analysis/LoopAccessAnalysis.cpp | 121 -- .../load-store-index-loaded-in-loop.ll| 26 ++-- .../pointer-with-unknown-bounds.ll| 4 +- .../LoopAccessAnalysis/print-order.ll | 6 +- .../LoopAccessAnalysis/select-dependence.ll | 4 +- .../LoopAccessAnalysis/symbolic-stride.ll | 4 +- 7 files changed, 87 insertions(+), 101 deletions(-) diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h index afafb74bdcb0ac..95a74b91f7acbf 100644 --- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h +++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h @@ -199,9 +199,8 @@ class MemoryDepChecker { /// Check whether the dependencies between the accesses are safe. /// /// Only checks sets with elements in \p CheckDeps. - bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps, - const DenseMap> - &UnderlyingObjects); + bool areDepsSafe(const DepCandidates &AccessSets, + const MemAccessInfoList &CheckDeps); /// No memory dependence was encountered that would inhibit /// vectorization. @@ -351,11 +350,8 @@ class MemoryDepChecker { /// element access it records this distance in \p MinDepDistBytes (if this /// distance is smaller than any other distance encountered so far). /// Otherwise, this function returns true signaling a possible dependence. - Dependence::DepType - isDependent(const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B, - unsigned BIdx, - const DenseMap> - &UnderlyingObjects); + Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx, + const MemAccessInfo &B, unsigned BIdx); /// Check whether the data dependence could prevent store-load /// forwarding. @@ -392,11 +388,9 @@ class MemoryDepChecker { /// determined, or a struct containing (Distance, Stride, TypeSize, AIsWrite, /// BIsWrite). 
std::variant - getDependenceDistanceStrideAndSize( - const MemAccessInfo &A, Instruction *AInst, const MemAccessInfo &B, - Instruction *BInst, - const DenseMap> - &UnderlyingObjects); + getDependenceDistanceStrideAndSize(const MemAccessInfo &A, Instruction *AInst, + const MemAccessInfo &B, + Instruction *BInst); }; class RuntimePointerChecking; @@ -797,7 +791,8 @@ replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE, Value *Ptr); /// If the pointer has a constant stride return it in units of the access type -/// size. Otherwise return std::nullopt. +/// size. If the pointer is loop-invariant, return 0. Otherwise return +/// std::nullopt. //
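The decision procedure described in the commit message above can be sketched as a toy classifier (hypothetical Python, not the actual `getDependenceDistanceStrideAndSize` implementation). A stride is modeled as an integer constant (from a non-wrapping AddRec), `0` for a loop-invariant pointer, or `None` when unknown:

```python
def classify_dependence(src_stride, sink_stride):
    """Toy model: decide how two memory accesses can be disambiguated.

    A stride is an int (constant stride of a non-wrapping AddRec),
    0 for a loop-invariant pointer, or None when unknown.
    """
    if src_stride is None or sink_stride is None:
        # Accesses may overlap with earlier or later iterations; no
        # runtime check can disambiguate them.
        return "unknown-dependence"
    if src_stride == 0 or sink_stride == 0:
        # Invariant vs. strided (or both invariant): generate a runtime
        # check to disambiguate.
        return "runtime-check"
    # Both strided by constants: analyze the dependence distance directly.
    return "analyze-distance"
```

This mirrors the three cases in the description: unknown strides block analysis, invariant pointers fall back to runtime checks, and constant strides proceed to the distance computation.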
[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/102201
[llvm-branch-commits] [llvm] [InstCombine] Don't look at ConstantData users (PR #103302)
https://github.com/fhahn approved this pull request. LGTM, thanks for the fix https://github.com/llvm/llvm-project/pull/103302
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79175 (PR #80274)
https://github.com/fhahn approved this pull request. LGTM as this fixes a miscompile https://github.com/llvm/llvm-project/pull/80274
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -857,11 +857,7 @@ void VPlan::execute(VPTransformState *State) { Phi = cast<PHINode>(State->get(R.getVPSingleValue(), 0)); } else { auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R); -// TODO: Split off the case that all users of a pointer phi are scalar -// from the VPWidenPointerInductionRecipe. -if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable())) - continue; - +assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable())); fhahn wrote: Added, thanks! https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) { bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1)); VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi(); for (VPRecipeBase &Phi : HeaderVPBB->phis()) { fhahn wrote: Added a comment, thanks! https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -489,6 +490,23 @@ Value *VPInstruction::generateInstruction(VPTransformState &State, return ReducedPartRdx; } + case VPInstruction::PtrAdd: { +if (vputils::onlyFirstLaneUsed(this)) { + auto *P = Builder.CreatePtrAdd( + State.get(getOperand(0), VPIteration(Part, 0)), + State.get(getOperand(1), VPIteration(Part, 0)), Name); + State.set(this, P, VPIteration(Part, 0)); +} else { + for (unsigned Lane = 0; Lane != State.VF.getKnownMinValue(); ++Lane) { +Value *P = Builder.CreatePtrAdd( +State.get(getOperand(0), VPIteration(Part, Lane)), +State.get(getOperand(1), VPIteration(Part, Lane)), Name); + +State.set(this, P, VPIteration(Part, Lane)); + } +} +return nullptr; fhahn wrote: Updated to split into generating for scalars (per lane, possibly optimizing depending on onlyFirstLaneUsed) and generating for vectors (per part). Some of it could be done separately or as part of https://github.com/llvm/llvm-project/pull/80271, but it would be good to agree on the overall structure first, then land separately as makes sense. https://github.com/llvm/llvm-project/pull/80273
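The first-lane shortcut in the quoted `PtrAdd` code can be modeled in isolation (hypothetical Python sketch, not the VPlan implementation): when only the first lane of the result is used, a single scalar pointer-add suffices; otherwise one add is materialized per lane of the part.

```python
def materialize_ptr_add(base_lanes, offset_lanes, only_first_lane_used):
    # Toy model of per-part PtrAdd generation; each "lane" is an address.
    if only_first_lane_used:
        # Generate just lane 0; the remaining lanes are never queried.
        return [base_lanes[0] + offset_lanes[0]]
    # Otherwise materialize one pointer-add per lane.
    return [b + o for b, o in zip(base_lanes, offset_lanes)]
```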
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -546,9 +575,10 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) { continue; const InductionDescriptor &ID = WideIV->getInductionDescriptor(); -VPValue *Steps = createScalarIVSteps(Plan, ID, SE, WideIV->getTruncInst(), - WideIV->getStartValue(), - WideIV->getStepValue(), InsertPt); +VPValue *Steps = createScalarIVSteps( +Plan, ID.getKind(), SE, WideIV->getTruncInst(), WideIV->getStartValue(), +WideIV->getStepValue(), ID.getInductionOpcode(), InsertPt, +dyn_cast_or_null<FPMathOperator>(ID.getInductionBinOp())); fhahn wrote: Adjusted, thanks! Can split off moving the induction opcode field separately. https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -2503,6 +2504,12 @@ class VPDerivedIVRecipe : public VPSingleDefRecipe { dyn_cast_or_null<FPMathOperator>(IndDesc.getInductionBinOp()), Start, CanonicalIV, Step) {} + VPDerivedIVRecipe(InductionDescriptor::InductionKind Kind, VPValue *Start, +VPCanonicalIVPHIRecipe *CanonicalIV, VPValue *Step, +FPMathOperator *FPBinOp) fhahn wrote: Made the private one public and removed this one here, thanks! https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -515,6 +533,8 @@ void VPInstruction::execute(VPTransformState &State) { State.Builder.setFastMathFlags(getFastMathFlags()); for (unsigned Part = 0; Part < State.UF; ++Part) { Value *GeneratedValue = generateInstruction(State, Part); +if (!GeneratedValue) + continue; if (!hasResult()) continue; fhahn wrote: Completely reworked this, the check for !GeneratedValue is gone now https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) { bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1)); VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi(); for (VPRecipeBase &Phi : HeaderVPBB->phis()) { +if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&Phi)) { + if (!PtrIV->onlyScalarsGenerated(Plan.hasScalableVF())) +continue; + + const InductionDescriptor &ID = PtrIV->getInductionDescriptor(); + VPValue *StartV = Plan.getVPValueOrAddLiveIn( + ConstantInt::get(ID.getStep()->getType(), 0)); + VPValue *StepV = PtrIV->getOperand(1); + VPRecipeBase *Steps = fhahn wrote: Will do separately, thanks! https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -489,15 +489,18 @@ void VPlanTransforms::removeDeadRecipes(VPlan &Plan) { } } -static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID, +static VPValue *createScalarIVSteps(VPlan &Plan, +InductionDescriptor::InductionKind Kind, ScalarEvolution &SE, Instruction *TruncI, VPValue *StartV, VPValue *Step, -VPBasicBlock::iterator IP) { +Instruction::BinaryOps InductionOpcode, +VPBasicBlock::iterator IP, +FPMathOperator *FPBinOp = nullptr) { VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock(); VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV(); VPSingleDefRecipe *BaseIV = CanonicalIV; - if (!CanonicalIV->isCanonical(ID.getKind(), StartV, Step)) { -BaseIV = new VPDerivedIVRecipe(ID, StartV, CanonicalIV, Step); + if (!CanonicalIV->isCanonical(Kind, StartV, Step)) { +BaseIV = new VPDerivedIVRecipe(Kind, StartV, CanonicalIV, Step, FPBinOp); fhahn wrote: Yes, can split off! https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -515,6 +533,8 @@ void VPInstruction::execute(VPTransformState &State) { State.Builder.setFastMathFlags(getFastMathFlags()); for (unsigned Part = 0; Part < State.UF; ++Part) { Value *GeneratedValue = generateInstruction(State, Part); +if (!GeneratedValue) + continue; if (!hasResult()) continue; assert(GeneratedValue && "generateInstruction must produce a value"); fhahn wrote: Reworked now, the check for !GeneratedValue is gone now https://github.com/llvm/llvm-project/pull/80273 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) { bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1)); VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi(); for (VPRecipeBase &Phi : HeaderVPBB->phis()) { +if (auto *PtrIV = dyn_cast(&Phi)) { + if (!PtrIV->onlyScalarsGenerated(Plan.hasScalableVF())) +continue; + + const InductionDescriptor &ID = PtrIV->getInductionDescriptor(); + VPValue *StartV = Plan.getVPValueOrAddLiveIn( + ConstantInt::get(ID.getStep()->getType(), 0)); fhahn wrote: The start value of the pointer induction is the pointer base, but here we need the start value for the generated offsets. https://github.com/llvm/llvm-project/pull/80273 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TBAA] Only clear TBAAStruct if field can be extracted. (PR #81285)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/81285 Retain TBAAStruct if we fail to match the access to a single field. All users at the moment use this when using the full size of the original access. SROA also retains the original TBAAStruct when accessing parts at offset 0. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81284 >From 99cf032dfabb21b820559bae61d2354e56336fdd Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 9 Feb 2024 16:25:32 + Subject: [PATCH] [TBAA] Only clear TBAAStruct if field can be extracted. Retain TBAAStruct if we fail to match the access to a single field. All users at the moment use this when using the full size of the original access. SROA also retains the original TBAAStruct when accessing parts at offset 0. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81284 --- llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp | 8 +--- llvm/test/Transforms/InstCombine/struct-assign-tbaa.ll | 5 +++-- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp index edc08cde686f1f..bfd70414c0340c 100644 --- a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp +++ b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp @@ -821,13 +821,15 @@ MDNode *AAMDNodes::extendToTBAA(MDNode *MD, ssize_t Len) { AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { AAMDNodes New = *this; MDNode *M = New.TBAAStruct; - New.TBAAStruct = nullptr; if (M && M->getNumOperands() == 3 && M->getOperand(0) && mdconst::hasa(M->getOperand(0)) && 
mdconst::extract(M->getOperand(0))->isZero() && M->getOperand(1) && mdconst::hasa(M->getOperand(1)) && - mdconst::extract(M->getOperand(1))->getValue() == AccessSize && - M->getOperand(2) && isa(M->getOperand(2))) + mdconst::extract(M->getOperand(1))->getValue() == + AccessSize && + M->getOperand(2) && isa(M->getOperand(2))) { +New.TBAAStruct = nullptr; New.TBAA = cast(M->getOperand(2)); + } return New; } diff --git a/llvm/test/Transforms/InstCombine/struct-assign-tbaa.ll b/llvm/test/Transforms/InstCombine/struct-assign-tbaa.ll index 1042c413fbb7bb..996d2c0e67e165 100644 --- a/llvm/test/Transforms/InstCombine/struct-assign-tbaa.ll +++ b/llvm/test/Transforms/InstCombine/struct-assign-tbaa.ll @@ -38,8 +38,8 @@ define ptr @test2() { define void @test3_multiple_fields(ptr nocapture %a, ptr nocapture %b) { ; CHECK-LABEL: @test3_multiple_fields( ; CHECK-NEXT: entry: -; CHECK-NEXT:[[TMP0:%.*]] = load i64, ptr [[B:%.*]], align 4 -; CHECK-NEXT:store i64 [[TMP0]], ptr [[A:%.*]], align 4 +; CHECK-NEXT:[[TMP0:%.*]] = load i64, ptr [[B:%.*]], align 4, !tbaa.struct [[TBAA_STRUCT3:![0-9]+]] +; CHECK-NEXT:store i64 [[TMP0]], ptr [[A:%.*]], align 4, !tbaa.struct [[TBAA_STRUCT3]] ; CHECK-NEXT:ret void ; entry: @@ -86,4 +86,5 @@ entry: ; CHECK: [[TBAA0]] = !{[[META1:![0-9]+]], [[META1]], i64 0} ; CHECK: [[META1]] = !{!"float", [[META2:![0-9]+]]} ; CHECK: [[META2]] = !{!"Simple C/C++ TBAA"} +; CHECK: [[TBAA_STRUCT3]] = !{i64 0, i64 4, [[TBAA0]], i64 4, i64 4, [[TBAA0]]} ;. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/81289 If a split memory access introduced by SROA accesses precisely a single field of the original operation's !tbaa.struct, use the !tbaa tag for the accessed field directly instead of the full !tbaa.struct. InstCombine already had a similar logic. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81285. >From 90639e9131670863ebb4c199a9861b2b0094d601 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 9 Feb 2024 15:17:09 + Subject: [PATCH] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. If a split memory access introduced by SROA accesses precisely a single field of the original operation's !tbaa.struct, use the !tbaa tag for the accessed field directly instead of the full !tbaa.struct. InstCombine already had a similar logic. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81285. --- llvm/include/llvm/IR/Metadata.h | 2 + llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp | 13 ++ llvm/lib/Transforms/Scalar/SROA.cpp | 48 ++-- llvm/test/Transforms/SROA/tbaa-struct2.ll| 21 - llvm/test/Transforms/SROA/tbaa-struct3.ll| 16 +++ 5 files changed, 67 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/IR/Metadata.h b/llvm/include/llvm/IR/Metadata.h index 6f23ac44dee968..33363a271d4823 100644 --- a/llvm/include/llvm/IR/Metadata.h +++ b/llvm/include/llvm/IR/Metadata.h @@ -849,6 +849,8 @@ struct AAMDNodes { /// If his AAMDNode has !tbaa.struct and \p AccessSize matches the size of the /// field at offset 0, get the TBAA tag describing the accessed field. 
AAMDNodes adjustForAccess(unsigned AccessSize); + AAMDNodes adjustForAccess(size_t Offset, Type *AccessTy, +const DataLayout &DL); }; // Specialize DenseMapInfo for AAMDNodes. diff --git a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp index bfd70414c0340c..b2dc451d581939 100644 --- a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp +++ b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp @@ -833,3 +833,16 @@ AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { } return New; } + +AAMDNodes AAMDNodes::adjustForAccess(size_t Offset, Type *AccessTy, + const DataLayout &DL) { + + AAMDNodes New = shift(Offset); + if (!DL.typeSizeEqualsStoreSize(AccessTy)) +return New; + TypeSize Size = DL.getTypeStoreSize(AccessTy); + if (Size.isScalable()) +return New; + + return New.adjustForAccess(Size.getKnownMinValue()); +} diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp index 138dc38b5c14ce..f24cbbc1fe0591 100644 --- a/llvm/lib/Transforms/Scalar/SROA.cpp +++ b/llvm/lib/Transforms/Scalar/SROA.cpp @@ -2914,7 +2914,8 @@ class AllocaSliceRewriter : public InstVisitor { // Do this after copyMetadataForLoad() to preserve the TBAA shift. 
if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); // Try to preserve nonnull metadata V = NewLI; @@ -2936,7 +2937,9 @@ class AllocaSliceRewriter : public InstVisitor { IRB.CreateAlignedLoad(TargetTy, getNewAllocaSlicePtr(IRB, LTy), getSliceAlign(), LI.isVolatile(), LI.getName()); if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); + if (LI.isVolatile()) NewLI->setAtomic(LI.getOrdering(), LI.getSyncScopeID()); NewLI->copyMetadata(LI, {LLVMContext::MD_mem_parallel_loop_access, @@ -3011,7 +3014,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group}); if (AATags) - Store->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); + Store->setAAMetadata(AATags.adjustForAccess(NewBeginOffset - BeginOffset, + V->getType(), DL)); Pass.DeadInsts.push_back(&SI); // NOTE: Careful to use OrigV rather than V. @@ -3038,7 +3042,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
[llvm-branch-commits] [llvm] [TBAA] Only clear TBAAStruct if field can be extracted. (PR #81285)
@@ -821,13 +821,15 @@ MDNode *AAMDNodes::extendToTBAA(MDNode *MD, ssize_t Len) { AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { AAMDNodes New = *this; MDNode *M = New.TBAAStruct; - New.TBAAStruct = nullptr; if (M && M->getNumOperands() == 3 && M->getOperand(0) && fhahn wrote: Yep, I left this to here to keep the changes small, I'll soon share this one in the chain. https://github.com/llvm/llvm-project/pull/81285 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TBAA] Use !tbaa for first accessed field, even if there are others. (PR #81313)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/81313 Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81289. >From e879ab07a6b39d7cf47fbc3c17ff25918cdee628 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 9 Feb 2024 16:48:26 + Subject: [PATCH] [TBAA] Use !tbaa for first accessed field, even if there are others. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81289. --- llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp | 3 +-- llvm/test/Transforms/SROA/tbaa-struct2.ll| 21 +++ llvm/test/Transforms/SROA/tbaa-struct3.ll| 28 ++-- 3 files changed, 25 insertions(+), 27 deletions(-) diff --git a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp index b2dc451d581939..25ac01db7633ee 100644 --- a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp +++ b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp @@ -821,8 +821,7 @@ MDNode *AAMDNodes::extendToTBAA(MDNode *MD, ssize_t Len) { AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { AAMDNodes New = *this; MDNode *M = New.TBAAStruct; - if (M && M->getNumOperands() == 3 && M->getOperand(0) && - mdconst::hasa(M->getOperand(0)) && + if (M && M->getOperand(0) && mdconst::hasa(M->getOperand(0)) && mdconst::extract(M->getOperand(0))->isZero() && M->getOperand(1) && mdconst::hasa(M->getOperand(1)) && mdconst::extract(M->getOperand(1))->getValue() == diff --git a/llvm/test/Transforms/SROA/tbaa-struct2.ll b/llvm/test/Transforms/SROA/tbaa-struct2.ll index 02c99a2b329457..545fa47eecb2ce 100644 --- a/llvm/test/Transforms/SROA/tbaa-struct2.ll +++ 
b/llvm/test/Transforms/SROA/tbaa-struct2.ll @@ -11,11 +11,11 @@ declare double @subcall(double %g, i32 %m) define double @bar(ptr %wishart) { ; CHECK-LABEL: @bar( ; CHECK-NEXT:[[TMP_SROA_3:%.*]] = alloca [4 x i8], align 4 -; CHECK-NEXT:[[TMP_SROA_0_0_COPYLOAD:%.*]] = load double, ptr [[WISHART:%.*]], align 8, !tbaa.struct [[TBAA_STRUCT0:![0-9]+]] +; CHECK-NEXT:[[TMP_SROA_0_0_COPYLOAD:%.*]] = load double, ptr [[WISHART:%.*]], align 8, !tbaa [[TBAA0:![0-9]+]] ; CHECK-NEXT:[[TMP_SROA_2_0_WISHART_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[WISHART]], i64 8 -; CHECK-NEXT:[[TMP_SROA_2_0_COPYLOAD:%.*]] = load i32, ptr [[TMP_SROA_2_0_WISHART_SROA_IDX]], align 8, !tbaa [[TBAA5:![0-9]+]] +; CHECK-NEXT:[[TMP_SROA_2_0_COPYLOAD:%.*]] = load i32, ptr [[TMP_SROA_2_0_WISHART_SROA_IDX]], align 8, !tbaa [[TBAA4:![0-9]+]] ; CHECK-NEXT:[[TMP_SROA_3_0_WISHART_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[WISHART]], i64 12 -; CHECK-NEXT:call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[TMP_SROA_3]], ptr align 4 [[TMP_SROA_3_0_WISHART_SROA_IDX]], i64 4, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]] +; CHECK-NEXT:call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[TMP_SROA_3]], ptr align 4 [[TMP_SROA_3_0_WISHART_SROA_IDX]], i64 4, i1 false), !tbaa.struct [[TBAA_STRUCT6:![0-9]+]] ; CHECK-NEXT:[[CALL:%.*]] = call double @subcall(double [[TMP_SROA_0_0_COPYLOAD]], i32 [[TMP_SROA_2_0_COPYLOAD]]) ; CHECK-NEXT:ret double [[CALL]] ; @@ -38,14 +38,13 @@ define double @bar(ptr %wishart) { ;. ; CHECK: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) } ;. 
-; CHECK: [[TBAA_STRUCT0]] = !{i64 0, i64 8, [[META1:![0-9]+]], i64 8, i64 4, [[TBAA5]]} -; CHECK: [[META1]] = !{[[META2:![0-9]+]], [[META2]], i64 0} -; CHECK: [[META2]] = !{!"double", [[META3:![0-9]+]], i64 0} -; CHECK: [[META3]] = !{!"omnipotent char", [[META4:![0-9]+]], i64 0} -; CHECK: [[META4]] = !{!"Simple C++ TBAA"} -; CHECK: [[TBAA5]] = !{[[META6:![0-9]+]], [[META6]], i64 0} -; CHECK: [[META6]] = !{!"int", [[META3]], i64 0} -; CHECK: [[TBAA_STRUCT7]] = !{} +; CHECK: [[TBAA0]] = !{[[META1:![0-9]+]], [[META1]], i64 0} +; CHECK: [[META1]] = !{!"double", [[META2:![0-9]+]], i64 0} +; CHECK: [[META2]] = !{!"omnipotent char", [[META3:![0-9]+]], i64 0} +; CHECK: [[META3]] = !{!"Simple C++ TBAA"} +; CHECK: [[TBAA4]] = !{[[META5:![0-9]+]], [[META5]], i64 0} +; CHECK: [[META5]] = !{!"int", [[META2]], i64 0} +; CHECK: [[TBAA_STRUCT6]] = !{} ;. ;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line: ; CHECK-MODIFY-CFG: {{.*}} diff --git a/llvm/test/Transforms/SROA/tbaa-struct3.ll b/llvm/test/Transforms/SROA/tbaa-struct3.ll index 603e7d708647fc..68553d9b1a270b 100644 --- a/llvm/test/Transforms/SROA/tbaa-struct3.ll +++ b/llvm/test/Transforms/SROA/tbaa-struct3.ll @@ -7,9 +7,9 @@ define void @load
[llvm-branch-commits] [llvm] [TBAA] Only clear TBAAStruct if field can be extracted. (PR #81285)
@@ -821,13 +821,15 @@ MDNode *AAMDNodes::extendToTBAA(MDNode *MD, ssize_t Len) { AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { AAMDNodes New = *this; MDNode *M = New.TBAAStruct; - New.TBAAStruct = nullptr; if (M && M->getNumOperands() == 3 && M->getOperand(0) && fhahn wrote: Here it is: #81313 https://github.com/llvm/llvm-project/pull/81285 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
@@ -7,9 +7,9 @@ define void @load_store_transfer_split_struct_tbaa_2_float(ptr dereferenceable(2 ; CHECK-NEXT: entry: ; CHECK-NEXT:[[TMP0:%.*]] = bitcast float [[A]] to i32 ; CHECK-NEXT:[[TMP1:%.*]] = bitcast float [[B]] to i32 -; CHECK-NEXT:store i32 [[TMP0]], ptr [[RES]], align 4 +; CHECK-NEXT:store i32 [[TMP0]], ptr [[RES]], align 4, !tbaa.struct [[TBAA_STRUCT0:![0-9]+]] fhahn wrote: Yes, this should be solved by a separate improvement: #81313 https://github.com/llvm/llvm-project/pull/81289 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
fhahn wrote: > Hmm. 10 changes + 1 new usage of setAAMetaData, But only 4 relevant changes > in tests.. That part of SROA seems to lack some testing ? Yes, will add the missing coverage, just wanted to make sure this makes sense in general beforehand https://github.com/llvm/llvm-project/pull/81289 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TBAA] Use !tbaa for first accessed field if it is an exact match in offset and size. (PR #81313)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/81313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TBAA] Use !tbaa for first accessed field if it is an exact match in offset and size. (PR #81313)
fhahn wrote: > lgtm > > Maybe rephrase the commit message to something like: > > ``` > [tbaa] Use !tbaa for first accessed field if it is an exact match in offset > and size. > ``` Updated, thanks! It would be great if you could take another look at https://github.com/llvm/llvm-project/pull/81289 in case you missed my response to your comment, as this PR depends on https://github.com/llvm/llvm-project/pull/81289 https://github.com/llvm/llvm-project/pull/81313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/81289 >From 90639e9131670863ebb4c199a9861b2b0094d601 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 9 Feb 2024 15:17:09 + Subject: [PATCH] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. If a split memory access introduced by SROA accesses precisely a single field of the original operation's !tbaa.struct, use the !tbaa tag for the accessed field directly instead of the full !tbaa.struct. InstCombine already had a similar logic. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81285. --- llvm/include/llvm/IR/Metadata.h | 2 + llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp | 13 ++ llvm/lib/Transforms/Scalar/SROA.cpp | 48 ++-- llvm/test/Transforms/SROA/tbaa-struct2.ll| 21 - llvm/test/Transforms/SROA/tbaa-struct3.ll| 16 +++ 5 files changed, 67 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/IR/Metadata.h b/llvm/include/llvm/IR/Metadata.h index 6f23ac44dee968..33363a271d4823 100644 --- a/llvm/include/llvm/IR/Metadata.h +++ b/llvm/include/llvm/IR/Metadata.h @@ -849,6 +849,8 @@ struct AAMDNodes { /// If his AAMDNode has !tbaa.struct and \p AccessSize matches the size of the /// field at offset 0, get the TBAA tag describing the accessed field. AAMDNodes adjustForAccess(unsigned AccessSize); + AAMDNodes adjustForAccess(size_t Offset, Type *AccessTy, +const DataLayout &DL); }; // Specialize DenseMapInfo for AAMDNodes. 
diff --git a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp index bfd70414c0340c..b2dc451d581939 100644 --- a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp +++ b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp @@ -833,3 +833,16 @@ AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { } return New; } + +AAMDNodes AAMDNodes::adjustForAccess(size_t Offset, Type *AccessTy, + const DataLayout &DL) { + + AAMDNodes New = shift(Offset); + if (!DL.typeSizeEqualsStoreSize(AccessTy)) +return New; + TypeSize Size = DL.getTypeStoreSize(AccessTy); + if (Size.isScalable()) +return New; + + return New.adjustForAccess(Size.getKnownMinValue()); +} diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp index 138dc38b5c14ce..f24cbbc1fe0591 100644 --- a/llvm/lib/Transforms/Scalar/SROA.cpp +++ b/llvm/lib/Transforms/Scalar/SROA.cpp @@ -2914,7 +2914,8 @@ class AllocaSliceRewriter : public InstVisitor { // Do this after copyMetadataForLoad() to preserve the TBAA shift. 
if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); // Try to preserve nonnull metadata V = NewLI; @@ -2936,7 +2937,9 @@ class AllocaSliceRewriter : public InstVisitor { IRB.CreateAlignedLoad(TargetTy, getNewAllocaSlicePtr(IRB, LTy), getSliceAlign(), LI.isVolatile(), LI.getName()); if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); + if (LI.isVolatile()) NewLI->setAtomic(LI.getOrdering(), LI.getSyncScopeID()); NewLI->copyMetadata(LI, {LLVMContext::MD_mem_parallel_loop_access, @@ -3011,7 +3014,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group}); if (AATags) - Store->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); + Store->setAAMetadata(AATags.adjustForAccess(NewBeginOffset - BeginOffset, + V->getType(), DL)); Pass.DeadInsts.push_back(&SI); // NOTE: Careful to use OrigV rather than V. @@ -3038,7 +3042,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group}); if (AATags) - Store->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); + Store->setAAMetadata(AATags.adjustForAccess(NewBeginOffset - BeginOffset, + V->getType(), DL)); migrateDebugInfo(&OldAI, IsSplit, NewBeginOffset * 8, SliceSize * 8, &SI, Store, Store->getPointerOperand(), @@ -3097,8 +3102,10 @@ class AllocaSliceRewriter : public InstVisitor { } NewSI->copyMetadata(
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/81289 >From 90639e9131670863ebb4c199a9861b2b0094d601 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 9 Feb 2024 15:17:09 + Subject: [PATCH 1/2] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. If a split memory access introduced by SROA accesses precisely a single field of the original operation's !tbaa.struct, use the !tbaa tag for the accessed field directly instead of the full !tbaa.struct. InstCombine already had a similar logic. Motivation for this and follow-on patches is to improve codegen for libc++, where using memcpy limits optimizations, like vectorization for code iteration over std::vector>: https://godbolt.org/z/f3vqYos3c Depends on https://github.com/llvm/llvm-project/pull/81285. --- llvm/include/llvm/IR/Metadata.h | 2 + llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp | 13 ++ llvm/lib/Transforms/Scalar/SROA.cpp | 48 ++-- llvm/test/Transforms/SROA/tbaa-struct2.ll| 21 - llvm/test/Transforms/SROA/tbaa-struct3.ll| 16 +++ 5 files changed, 67 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/IR/Metadata.h b/llvm/include/llvm/IR/Metadata.h index 6f23ac44dee968..33363a271d4823 100644 --- a/llvm/include/llvm/IR/Metadata.h +++ b/llvm/include/llvm/IR/Metadata.h @@ -849,6 +849,8 @@ struct AAMDNodes { /// If his AAMDNode has !tbaa.struct and \p AccessSize matches the size of the /// field at offset 0, get the TBAA tag describing the accessed field. AAMDNodes adjustForAccess(unsigned AccessSize); + AAMDNodes adjustForAccess(size_t Offset, Type *AccessTy, +const DataLayout &DL); }; // Specialize DenseMapInfo for AAMDNodes. 
diff --git a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp index bfd70414c0340c..b2dc451d581939 100644 --- a/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp +++ b/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp @@ -833,3 +833,16 @@ AAMDNodes AAMDNodes::adjustForAccess(unsigned AccessSize) { } return New; } + +AAMDNodes AAMDNodes::adjustForAccess(size_t Offset, Type *AccessTy, + const DataLayout &DL) { + + AAMDNodes New = shift(Offset); + if (!DL.typeSizeEqualsStoreSize(AccessTy)) +return New; + TypeSize Size = DL.getTypeStoreSize(AccessTy); + if (Size.isScalable()) +return New; + + return New.adjustForAccess(Size.getKnownMinValue()); +} diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp index 138dc38b5c14ce..f24cbbc1fe0591 100644 --- a/llvm/lib/Transforms/Scalar/SROA.cpp +++ b/llvm/lib/Transforms/Scalar/SROA.cpp @@ -2914,7 +2914,8 @@ class AllocaSliceRewriter : public InstVisitor { // Do this after copyMetadataForLoad() to preserve the TBAA shift. 
if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); // Try to preserve nonnull metadata V = NewLI; @@ -2936,7 +2937,9 @@ class AllocaSliceRewriter : public InstVisitor { IRB.CreateAlignedLoad(TargetTy, getNewAllocaSlicePtr(IRB, LTy), getSliceAlign(), LI.isVolatile(), LI.getName()); if (AATags) -NewLI->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); +NewLI->setAAMetadata(AATags.adjustForAccess( +NewBeginOffset - BeginOffset, NewLI->getType(), DL)); + if (LI.isVolatile()) NewLI->setAtomic(LI.getOrdering(), LI.getSyncScopeID()); NewLI->copyMetadata(LI, {LLVMContext::MD_mem_parallel_loop_access, @@ -3011,7 +3014,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group}); if (AATags) - Store->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); + Store->setAAMetadata(AATags.adjustForAccess(NewBeginOffset - BeginOffset, + V->getType(), DL)); Pass.DeadInsts.push_back(&SI); // NOTE: Careful to use OrigV rather than V. @@ -3038,7 +3042,8 @@ class AllocaSliceRewriter : public InstVisitor { Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group}); if (AATags) - Store->setAAMetadata(AATags.shift(NewBeginOffset - BeginOffset)); + Store->setAAMetadata(AATags.adjustForAccess(NewBeginOffset - BeginOffset, + V->getType(), DL)); migrateDebugInfo(&OldAI, IsSplit, NewBeginOffset * 8, SliceSize * 8, &SI, Store, Store->getPointerOperand(), @@ -3097,8 +3102,10 @@ class AllocaSliceRewriter : public InstVisitor { } NewSI->copyMeta
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/81289 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
@@ -4561,6 +4577,10 @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) { PStore->copyMetadata(*SI, {LLVMContext::MD_mem_parallel_loop_access, LLVMContext::MD_access_group, LLVMContext::MD_DIAssignID}); + +if (AATags) + PStore->setAAMetadata( fhahn wrote: Should be now, I added a number of additional tests that should cover all cases here https://github.com/llvm/llvm-project/pull/81289 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SROA] Use !tbaa instead of !tbaa.struct if op matches field. (PR #81289)
https://github.com/fhahn commented: @dobbelaj-snps Added a substantial number of tests that should cover all cases now in 2a9b86cc10c3883cca51a5166aad6e2b755fa958 https://github.com/llvm/llvm-project/pull/81289 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SLP] Initial vectorization of non-power-of-2 ops. (PR #77790)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/77790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
https://github.com/fhahn closed https://github.com/llvm/llvm-project/pull/80273 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
fhahn wrote: @ayalz unfortunately I don't know how to update the target branch to `llvm:main`, so I went ahead and opened a new PR that's updated on top of current `main`: https://github.com/llvm/llvm-project/pull/83068 Comments should be addressed, sorry for the inconvenience. https://github.com/llvm/llvm-project/pull/80273 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SLP] Collect candidate VFs in vector in vectorizeStores (NFC). (PR #82793)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/82793 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SLP] Initial vectorization of non-power-of-2 ops. (PR #77790)
https://github.com/fhahn closed https://github.com/llvm/llvm-project/pull/77790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SLP] Initial vectorization of non-power-of-2 ops. (PR #77790)
https://github.com/fhahn reopened https://github.com/llvm/llvm-project/pull/77790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [DSE] Delay deleting non-memory-defs until end of DSE. (#83411) (PR #84227)
https://github.com/fhahn approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/84227 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [ARM] Update IsRestored for LR based on all returns (#82745) (PR #83129)
https://github.com/fhahn approved this pull request. LGTM, would be good to back-port. https://github.com/llvm/llvm-project/pull/83129 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [CIR][Basic][NFC] Add the CIR language to the Language enum (PR #86072)
fhahn wrote: Could you remove the commit-id line from the commit message, as it doesn’t seem relevant? https://github.com/llvm/llvm-project/pull/86072
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/76260
[llvm-branch-commits] [clang] [flang] [libc] [libcxx] [llvm] [mlir] [openmp] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/76260
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/76260 >From 96912aec51f6752d211d8bd091eaad6426037050 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Thu, 18 Apr 2024 23:01:03 +0100 Subject: [PATCH 1/2] [TySan] A Type Sanitizer (Clang) --- clang/include/clang/Basic/Features.def | 1 + clang/include/clang/Basic/Sanitizers.def | 3 ++ clang/include/clang/Driver/SanitizerArgs.h | 1 + clang/lib/CodeGen/BackendUtil.cpp | 6 +++ clang/lib/CodeGen/CGDecl.cpp | 3 +- clang/lib/CodeGen/CGDeclCXX.cpp| 4 ++ clang/lib/CodeGen/CodeGenFunction.cpp | 2 + clang/lib/CodeGen/CodeGenModule.cpp| 12 +++--- clang/lib/CodeGen/CodeGenTBAA.cpp | 6 ++- clang/lib/CodeGen/SanitizerMetadata.cpp| 44 +- clang/lib/CodeGen/SanitizerMetadata.h | 13 --- clang/lib/Driver/SanitizerArgs.cpp | 15 +--- clang/lib/Driver/ToolChains/CommonArgs.cpp | 6 ++- clang/lib/Driver/ToolChains/Darwin.cpp | 5 +++ clang/lib/Driver/ToolChains/Linux.cpp | 2 + clang/test/Driver/sanitizer-ld.c | 23 +++ 16 files changed, 116 insertions(+), 30 deletions(-) diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index fe4d1c4afcca65..589739eea2734d 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -99,6 +99,7 @@ FEATURE(nullability_nullable_result, true) FEATURE(memory_sanitizer, LangOpts.Sanitize.hasOneOf(SanitizerKind::Memory | SanitizerKind::KernelMemory)) +FEATURE(type_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Type)) FEATURE(thread_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Thread)) FEATURE(dataflow_sanitizer, LangOpts.Sanitize.has(SanitizerKind::DataFlow)) FEATURE(scudo, LangOpts.Sanitize.hasOneOf(SanitizerKind::Scudo)) diff --git a/clang/include/clang/Basic/Sanitizers.def b/clang/include/clang/Basic/Sanitizers.def index b228ffd07ee745..a482cf520620bc 100644 --- a/clang/include/clang/Basic/Sanitizers.def +++ b/clang/include/clang/Basic/Sanitizers.def @@ -73,6 +73,9 @@ SANITIZER("fuzzer", Fuzzer) // 
libFuzzer-required instrumentation, no linking. SANITIZER("fuzzer-no-link", FuzzerNoLink) +// TypeSanitizer +SANITIZER("type", Type) + // ThreadSanitizer SANITIZER("thread", Thread) diff --git a/clang/include/clang/Driver/SanitizerArgs.h b/clang/include/clang/Driver/SanitizerArgs.h index 07070ec4fc0653..52b482a0e8a1a9 100644 --- a/clang/include/clang/Driver/SanitizerArgs.h +++ b/clang/include/clang/Driver/SanitizerArgs.h @@ -86,6 +86,7 @@ class SanitizerArgs { bool needsHwasanAliasesRt() const { return needsHwasanRt() && HwasanUseAliases; } + bool needsTysanRt() const { return Sanitizers.has(SanitizerKind::Type); } bool needsTsanRt() const { return Sanitizers.has(SanitizerKind::Thread); } bool needsMsanRt() const { return Sanitizers.has(SanitizerKind::Memory); } bool needsFuzzer() const { return Sanitizers.has(SanitizerKind::Fuzzer); } diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 6cc00b85664f41..1db5aca770b259 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -80,6 +80,7 @@ #include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h" #include "llvm/Transforms/Instrumentation/SanitizerCoverage.h" #include "llvm/Transforms/Instrumentation/ThreadSanitizer.h" +#include "llvm/Transforms/Instrumentation/TypeSanitizer.h" #include "llvm/Transforms/ObjCARC.h" #include "llvm/Transforms/Scalar/EarlyCSE.h" #include "llvm/Transforms/Scalar/GVN.h" @@ -697,6 +698,11 @@ static void addSanitizers(const Triple &TargetTriple, MPM.addPass(createModuleToFunctionPassAdaptor(ThreadSanitizerPass())); } +if (LangOpts.Sanitize.has(SanitizerKind::Type)) { + MPM.addPass(ModuleTypeSanitizerPass()); + MPM.addPass(createModuleToFunctionPassAdaptor(TypeSanitizerPass())); +} + auto ASanPass = [&](SanitizerMask Mask, bool CompileKernel) { if (LangOpts.Sanitize.has(Mask)) { bool UseGlobalGC = asanUseGlobalsGC(TargetTriple, CodeGenOpts); diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp 
index ce6d6d8956076e..42516fa749c830 100644 --- a/clang/lib/CodeGen/CGDecl.cpp +++ b/clang/lib/CodeGen/CGDecl.cpp @@ -482,7 +482,8 @@ void CodeGenFunction::EmitStaticVarDecl(const VarDecl &D, LocalDeclMap.find(&D)->second = Address(castedAddr, elemTy, alignment); CGM.setStaticLocalDeclAddress(&D, castedAddr); - CGM.getSanitizerMetadata()->reportGlobal(var, D); + CGM.getSanitizerMetadata()->reportGlobalToASan(var, D); + CGM.getSanitizerMetadata()->reportGlobalToTySan(var, D); // Emit global variable debug descriptor for static vars. CGDebugInfo *DI = getDebugInfo(); diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index e08a1e5f42df20..08b
[llvm-branch-commits] [clang] [compiler-rt] [llvm] [TySan] A Type Sanitizer (Runtime Library) (PR #76261)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/76261
[llvm-branch-commits] [clang] [compiler-rt] [llvm] [TySan] A Type Sanitizer (Runtime Library) (PR #76261)
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH) endif() message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}") -set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi) +set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi) fhahn wrote: Thanks, updated! https://github.com/llvm/llvm-project/pull/76261
[llvm-branch-commits] [llvm] [LAA] Support different strides & non constant dep distances using SCEV. (PR #88039)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/88039
[llvm-branch-commits] [llvm] [LAA] Support different strides & non constant dep distances using SCEV. (PR #88039)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/88039 >From 110e5ea24d4b23a153b5f602460b81e5228c700f Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Thu, 4 Apr 2024 12:36:27 +0100 Subject: [PATCH 1/5] [LAA] Support different strides & non constant dep distances using SCEV. Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence. In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride. We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior. I'd like to call out the changes in global_alias.ll in particular. In the modified mayAlias01, and mayAlias02 tests, there should be no aliasing in the original versions of the test, as they are accessing 2 different 100 element fields in a loop with 100 iterations. I moved the original tests to noAlias15 and noAlias16 respectively, while updating the original tests to use a variable trip count. In some cases, like different_non_constant_strides_known_backward_min_distance_3, we now vectorize with runtime checks, even though the runtime checks will always be false. I'll also share a follow-up patch, that also uses SCEV to more accurately identify backwards dependences with non-constant distances. 
Fixes https://github.com/llvm/llvm-project/issues/87336 --- llvm/lib/Analysis/LoopAccessAnalysis.cpp | 115 -- .../Transforms/Scalar/LoopLoadElimination.cpp | 4 +- .../non-constant-strides-backward.ll | 90 +- .../non-constant-strides-forward.ll | 10 +- .../Transforms/LoopVectorize/global_alias.ll | 102 ++-- .../single-iteration-loop-sroa.ll | 40 +- 6 files changed, 274 insertions(+), 87 deletions(-) diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index c25eede96a1859..314484e11c4a7c 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -1923,8 +1923,9 @@ isLoopVariantIndirectAddress(ArrayRef UnderlyingObjects, // of various temporary variables, like A/BPtr, StrideA/BPtr and others. // Returns either the dependence result, if it could already be determined, or a // tuple with (Distance, Stride, TypeSize, AIsWrite, BIsWrite). -static std::variant> +static std::variant< +MemoryDepChecker::Dependence::DepType, +std::tuple> getDependenceDistanceStrideAndSize( const AccessAnalysis::MemAccessInfo &A, Instruction *AInst, const AccessAnalysis::MemAccessInfo &B, Instruction *BInst, @@ -1982,7 +1983,7 @@ getDependenceDistanceStrideAndSize( // Need accesses with constant stride. We don't want to vectorize // "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap // in the address space. 
- if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr) { + if (!StrideAPtr || !StrideBPtr) { LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n"); return MemoryDepChecker::Dependence::Unknown; } @@ -1992,8 +1993,8 @@ getDependenceDistanceStrideAndSize( DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy); if (!HasSameSize) TypeByteSize = 0; - uint64_t Stride = std::abs(StrideAPtr); - return std::make_tuple(Dist, Stride, TypeByteSize, AIsWrite, BIsWrite); + return std::make_tuple(Dist, std::abs(StrideAPtr), std::abs(StrideBPtr), + TypeByteSize, AIsWrite, BIsWrite); } MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent( @@ -2011,68 +2012,108 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent( if (std::holds_alternative(Res)) return std::get(Res); - const auto &[Dist, Stride, TypeByteSize, AIsWrite, BIsWrite] = - std::get>(Res); + const auto &[Dist, StrideA, StrideB, TypeByteSize, AIsWrite, BIsWrite] = + std::get< + std::tuple>( + Res); bool HasSameSize = TypeByteSize > 0; + uint64_t CommonStride = StrideA == StrideB ? StrideA : 0; + if (isa(Dist)) { + FoundNonConstantDistanceDependence = true; +LLVM_DEBUG(dbgs() << "LAA: Dependence because of uncomputable distance.\n"); +return Dependence::Unknown; + } + ScalarEvolution &SE = *PSE.getSE(); auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout(); - if (!isa(Dist) && HasSameSize && + if (HasSameSize && CommonStride && isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, - Stride, TypeByteSize)) + CommonStride, TypeByteSize)) return Dependence::NoDep; const SCEVConstant *C = dyn_cast(Dist); - if (!C) { -LLVM_DEBUG(d
[llvm-branch-commits] [llvm] [LAA] Support different strides & non constant dep distances using SCEV. (PR #88039)
fhahn wrote: > > > I would enjoy more textual description of what every condition is meant > > > to check. > > > > > > There are multiple places that hand off reasoning to called functions, > > would you like to have a summary of what the function checks there? Could > > do as separate patch, as this would be independent of the current patch? > > I was not familiar with this code, trying to reduce the impact of this patch > doesn't help me to understand it and convince myself that it does not cause > miscompilation, it rather makes it even more difficult since conditions are > now all over the place. > Thanks, I put up https://github.com/llvm/llvm-project/pull/89381 to add extra documentation and updated this PR to be based on #89381 https://github.com/llvm/llvm-project/pull/88039
[llvm-branch-commits] [llvm] [LAA] Support different strides & non constant dep distances using SCEV. (PR #88039)
https://github.com/fhahn closed https://github.com/llvm/llvm-project/pull/88039
[llvm-branch-commits] [llvm] [LAA] Support different strides & non constant dep distances using SCEV. (PR #88039)
https://github.com/fhahn reopened https://github.com/llvm/llvm-project/pull/88039
[llvm-branch-commits] [clang] [compiler-rt] [llvm] [TySan] A Type Sanitizer (Runtime Library) (PR #76261)
fhahn wrote: Added compiler-rt tests for various strict-aliasing violations from the bug tracker I found. https://github.com/llvm/llvm-project/pull/76261
[llvm-branch-commits] [llvm][NFC] Document cl::opt variable and fix typo (PR #90670)
https://github.com/fhahn approved this pull request. LGTM, thanks! For the title, it might be clearer to explicitly mention the variable this is documenting, commit title size permitting https://github.com/llvm/llvm-project/pull/90670
[llvm-branch-commits] [llvm] release/18.x: [FunctionAttrs] Fix incorrect nonnull inference for non-inbounds GEP (#91180) (PR #91286)
https://github.com/fhahn approved this pull request. Should be safe to back-port, LGTM, thanks! https://github.com/llvm/llvm-project/pull/91286
[llvm-branch-commits] [llvm] release/18.x: [LV, LAA] Don't vectorize loops with load and store to invar address. (PR #91092)
https://github.com/fhahn approved this pull request. LG to be back-ported if desired. It fixes a miscompile but I am not aware of any end-to-end reports. I am not sure what the criteria for cherry-picks on the release branch are at this point in the release. https://github.com/llvm/llvm-project/pull/91092
[llvm-branch-commits] [llvm] [LAA] Use SCEVUse to add extra NUW flags to pointer bounds. (WIP) (PR #91962)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/91962 Use SCEVUse to add a NUW flag to the upper bound of an accessed pointer. We must already have proved that the pointers do not wrap, as otherwise we could not use them for runtime check computations. By adding the use-specific NUW flag, we can detect cases where SCEV can prove that the compared pointers must overlap, so the runtime checks will always be false. In that case, there is no point in vectorizing with runtime checks. Note that this depends c2895cd27fbf200d1da056bc66d77eeb62690bf0, which could be submitted separately if desired; without the current change, I don't think it triggers in practice though. Depends on https://github.com/llvm/llvm-project/pull/91961 >From 448c6db95cf89b8f6d007f7049afd02ca21d4427 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 1 May 2024 11:03:42 +0100 Subject: [PATCH 1/3] [SCEV,LAA] Add tests to make sure scoped SCEVs don't impact other SCEVs. --- .../LoopAccessAnalysis/scoped-scevs.ll| 182 ++ 1 file changed, 182 insertions(+) create mode 100644 llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll diff --git a/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll new file mode 100644 index 0..323ba2a739cf8 --- /dev/null +++ b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll @@ -0,0 +1,182 @@ +; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -passes='print,print' -disable-output %s 2>&1 | FileCheck --check-prefixes=LAA,AFTER %s +; RUN: opt -passes='print,print,print' -disable-output %s 2>&1 | FileCheck --check-prefixes=BEFORE,LAA,AFTER %s + +target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + +declare void @use(ptr) + +; Check that scoped expressions created by LAA do not interfere with non-scoped +; SCEVs with the same operands. The tests first run print to +; populate the SCEV cache. 
They contain a GEP computing A+405, which is the end +; of the accessed range, before and/or after the loop. No nuw flags should be +; added to them in the second print output. + +define ptr @test_ptr_range_end_computed_before_and_after_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; BEFORE:%x = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; BEFORE:%y = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; +; LAA-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; LAA-NEXT:loop: +; LAA-NEXT: Memory dependences are safe with run-time checks +; LAA-NEXT: Dependences: +; LAA-NEXT: Run-time memory checks: +; LAA-NEXT: Check 0: +; LAA-NEXT:Comparing group ([[GRP1:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv +; LAA-NEXT:Against group ([[GRP2:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv +; LAA-NEXT: Grouped accesses: +; LAA-NEXT:Group [[GRP1]]: +; LAA-NEXT: (Low: (1 + %A) High: (405 + %A)) +; LAA-NEXT:Member: {(1 + %A),+,4}<%loop> +; LAA-NEXT:Group [[GRP2]]: +; LAA-NEXT: (Low: %A High: (101 + %A)) +; LAA-NEXT:Member: {%A,+,1}<%loop> +; LAA-EMPTY: +; LAA-NEXT: Non vectorizable stores to invariant address were not found in loop. 
+; LAA-NEXT: SCEV assumptions: +; LAA-EMPTY: +; LAA-NEXT: Expressions re-written: +; +; AFTER-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; AFTER-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; AFTER:%x = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +; AFTER:%y = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +entry: + %A.1 = getelementptr inbounds i8, ptr %A, i64 1 + %x = getelementptr inbounds i8, ptr %A, i64 405 + call void @use(ptr %x) + br label %loop + +loop: + %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ] + %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv + %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv + %l = load i8, ptr %gep.A, align 1 + %ext = zext i8 %l to i32 + store i32 %ext, ptr %gep.A.400, align 4 + %iv.next = add nuw nsw i64 %iv, 1 + %ec = icmp eq i64 %iv, 100 + br i1 %ec, label %exit, label %loop + +exit: + %y = getelementptr inbounds i8, ptr %A, i64 405 + ret ptr %y +} + +define void @test_ptr_range_end_computed_before_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_loop +; BEFORE-NEXT:
[llvm-branch-commits] [llvm] [SCEV] Add option to request use-specific SCEV for a GEP expr (WIP). (PR #91964)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/91964 Use SCEVUse from https://github.com/llvm/llvm-project/pull/91961 to return a SCEVUse with use-specific no-wrap flags for GEP expr, when demanded. Clients need to opt-in, as the use-specific flags may not be valid in some contexts (e.g. backedge taken counts). >From dea2c74ec83f390025cb0389859e472e8676c768 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Sun, 12 May 2024 09:57:54 +0100 Subject: [PATCH] [SCEV] Add option to request use-specific SCEV for a GEP expr (WIP). Use SCEVUse from https://github.com/llvm/llvm-project/pull/91961 to return a SCEVUse with use-specific no-wrap flags for GEP expr, when demanded. Clients need to opt-in, as the use-specific flags may not be valid in some contexts (e.g. backedge taken counts). --- llvm/include/llvm/Analysis/ScalarEvolution.h | 14 llvm/lib/Analysis/ScalarEvolution.cpp | 36 +++ .../Analysis/ScalarEvolution/min-max-exprs.ll | 2 +- .../ScalarEvolution/no-wrap-add-exprs.ll | 12 +++ .../no-wrap-symbolic-becount.ll | 2 +- .../test/Analysis/ScalarEvolution/ptrtoint.ll | 4 +-- llvm/test/Analysis/ScalarEvolution/sdiv.ll| 2 +- llvm/test/Analysis/ScalarEvolution/srem.ll| 2 +- 8 files changed, 41 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h index 2859df9964555..4ca3dbc1c6703 100644 --- a/llvm/include/llvm/Analysis/ScalarEvolution.h +++ b/llvm/include/llvm/Analysis/ScalarEvolution.h @@ -653,7 +653,7 @@ class ScalarEvolution { /// Return a SCEV expression for the full generality of the specified /// expression. - SCEVUse getSCEV(Value *V); + SCEVUse getSCEV(Value *V, bool UseCtx = false); /// Return an existing SCEV for V if there is one, otherwise return nullptr. SCEVUse getExistingSCEV(Value *V); @@ -735,9 +735,11 @@ class ScalarEvolution { /// \p GEP The GEP. The indices contained in the GEP itself are ignored, /// instead we use IndexExprs. 
/// \p IndexExprs The expressions for the indices. - SCEVUse getGEPExpr(GEPOperator *GEP, ArrayRef IndexExprs); + SCEVUse getGEPExpr(GEPOperator *GEP, ArrayRef IndexExprs, + bool UseCtx = false); SCEVUse getGEPExpr(GEPOperator *GEP, - const SmallVectorImpl &IndexExprs); + const SmallVectorImpl &IndexExprs, + bool UseCtx = false); SCEVUse getAbsExpr(SCEVUse Op, bool IsNSW); SCEVUse getMinMaxExpr(SCEVTypes Kind, ArrayRef Operands); SCEVUse getMinMaxExpr(SCEVTypes Kind, SmallVectorImpl &Operands); @@ -1783,11 +1785,11 @@ class ScalarEvolution { /// We know that there is no SCEV for the specified value. Analyze the /// expression recursively. - SCEVUse createSCEV(Value *V); + SCEVUse createSCEV(Value *V, bool UseCtx = false); /// We know that there is no SCEV for the specified value. Create a new SCEV /// for \p V iteratively. - SCEVUse createSCEVIter(Value *V); + SCEVUse createSCEVIter(Value *V, bool UseCtx = false); /// Collect operands of \p V for which SCEV expressions should be constructed /// first. Returns a SCEV directly if it can be constructed trivially for \p /// V. @@ -1826,7 +1828,7 @@ class ScalarEvolution { Value *FalseVal); /// Provide the special handling we need to analyze GEP SCEVs. - SCEVUse createNodeForGEP(GEPOperator *GEP); + SCEVUse createNodeForGEP(GEPOperator *GEP, bool UseCtx = false); /// Implementation code for getSCEVAtScope; called at most once for each /// SCEV+Loop pair. diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 320be6f26fc0a..68f4ddd47d69a 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -2741,6 +2741,8 @@ SCEVUse ScalarEvolution::getAddExpr(SmallVectorImpl &Ops, break; // If we have an add, expand the add operands onto the end of the operands // list. 
+ // CommonFlags = maskFlags(CommonFlags, setFlags(Add->getNoWrapFlags(), + // static_cast(Ops[Idx].getInt(; Ops.erase(Ops.begin()+Idx); append_range(Ops, Add->operands()); DeletedAdd = true; @@ -3759,13 +3761,14 @@ SCEVUse ScalarEvolution::getAddRecExpr(SmallVectorImpl &Operands, } SCEVUse ScalarEvolution::getGEPExpr(GEPOperator *GEP, -ArrayRef IndexExprs) { - return getGEPExpr(GEP, SmallVector(IndexExprs)); +ArrayRef IndexExprs, +bool UseCtx) { + return getGEPExpr(GEP, SmallVector(IndexExprs), UseCtx); } -SCEVUse -ScalarEvolution::getGEPExpr(GEPOperator *GEP, -const SmallVectorImpl &IndexExprs) { +SCEVUse ScalarEvolution::getGEPExpr(GEPOperator *GEP, +const
[llvm-branch-commits] [llvm] release/18.x: [LV, LAA] Don't vectorize loops with load and store to invar address. (PR #91092)
fhahn wrote: SGTM https://github.com/llvm/llvm-project/pull/91092
[llvm-branch-commits] [llvm] [LAA] Use SCEVUse to add extra NUW flags to pointer bounds. (WIP) (PR #91962)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/91962 >From ab0311667695fb255625cc846e02373800fad8b1 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 1 May 2024 11:03:42 +0100 Subject: [PATCH 1/3] [SCEV,LAA] Add tests to make sure scoped SCEVs don't impact other SCEVs. --- .../LoopAccessAnalysis/scoped-scevs.ll| 182 ++ 1 file changed, 182 insertions(+) create mode 100644 llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll diff --git a/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll new file mode 100644 index 0..323ba2a739cf8 --- /dev/null +++ b/llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll @@ -0,0 +1,182 @@ +; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -passes='print,print' -disable-output %s 2>&1 | FileCheck --check-prefixes=LAA,AFTER %s +; RUN: opt -passes='print,print,print' -disable-output %s 2>&1 | FileCheck --check-prefixes=BEFORE,LAA,AFTER %s + +target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" + +declare void @use(ptr) + +; Check that scoped expressions created by LAA do not interfere with non-scoped +; SCEVs with the same operands. The tests first run print to +; populate the SCEV cache. They contain a GEP computing A+405, which is the end +; of the accessed range, before and/or after the loop. No nuw flags should be +; added to them in the second print output. 
+ +define ptr @test_ptr_range_end_computed_before_and_after_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; BEFORE:%x = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; BEFORE:%y = getelementptr inbounds i8, ptr %A, i64 405 +; BEFORE-NEXT:--> (405 + %A) U: full-set S: full-set +; +; LAA-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; LAA-NEXT:loop: +; LAA-NEXT: Memory dependences are safe with run-time checks +; LAA-NEXT: Dependences: +; LAA-NEXT: Run-time memory checks: +; LAA-NEXT: Check 0: +; LAA-NEXT:Comparing group ([[GRP1:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv +; LAA-NEXT:Against group ([[GRP2:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv +; LAA-NEXT: Grouped accesses: +; LAA-NEXT:Group [[GRP1]]: +; LAA-NEXT: (Low: (1 + %A) High: (405 + %A)) +; LAA-NEXT:Member: {(1 + %A),+,4}<%loop> +; LAA-NEXT:Group [[GRP2]]: +; LAA-NEXT: (Low: %A High: (101 + %A)) +; LAA-NEXT:Member: {%A,+,1}<%loop> +; LAA-EMPTY: +; LAA-NEXT: Non vectorizable stores to invariant address were not found in loop. 
+; LAA-NEXT: SCEV assumptions: +; LAA-EMPTY: +; LAA-NEXT: Expressions re-written: +; +; AFTER-LABEL: 'test_ptr_range_end_computed_before_and_after_loop' +; AFTER-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_and_after_loop +; AFTER:%x = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +; AFTER:%y = getelementptr inbounds i8, ptr %A, i64 405 +; AFTER-NEXT:--> (405 + %A) U: full-set S: full-set +entry: + %A.1 = getelementptr inbounds i8, ptr %A, i64 1 + %x = getelementptr inbounds i8, ptr %A, i64 405 + call void @use(ptr %x) + br label %loop + +loop: + %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ] + %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv + %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv + %l = load i8, ptr %gep.A, align 1 + %ext = zext i8 %l to i32 + store i32 %ext, ptr %gep.A.400, align 4 + %iv.next = add nuw nsw i64 %iv, 1 + %ec = icmp eq i64 %iv, 100 + br i1 %ec, label %exit, label %loop + +exit: + %y = getelementptr inbounds i8, ptr %A, i64 405 + ret ptr %y +} + +define void @test_ptr_range_end_computed_before_loop(ptr %A) { +; BEFORE-LABEL: 'test_ptr_range_end_computed_before_loop' +; BEFORE-NEXT: Classifying expressions for: @test_ptr_range_end_computed_before_loop +; BEFORE-NEXT:%A.1 = getelementptr inbounds i8, ptr %A, i64 1 +; BEFORE-NEXT:--> (1 + %A) U: full-set S: full-set +; BEFORE-NEXT:%x = getelementptr inbounds i8, ptr %A, i64 405 +; +; LAA-LABEL: 'test_ptr_range_end_computed_before_loop' +; LAA-NEXT:loop: +; LAA-NEXT: Memory dependences are safe with run-time checks +; LAA-NEXT: Dependences: +; LAA-NEXT: Run-time memory checks: +; LAA-NEXT: Check 0: +; LAA-NEXT:Comparing group ([[GRP3:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv +; LAA-NEXT:Against group ([[GRP4:0x[0-9a-f]+]]): +; LAA-NEXT: %gep.A = getelementptr inbounds i8, ptr
[llvm-branch-commits] [llvm] [LAA] Use getBackedgeTakenCountForCountableExits. (PR #93499)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/93499 Update LAA to use getBackedgeTakenCountForCountableExits, which returns the minimum of the countable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if the loop is left via any uncomputable exits, as long as we prove that there are no dependences given the minimum of the countable exits. The same should also apply to generating runtime checks. Note that this shifts the responsibility for checking whether all exit counts are computable, or for handling early exits, to the users of LAA. Depends on https://github.com/llvm/llvm-project/pull/93498 >From 80decf5050269fa0e91bf0b397ac9a7565cd6d72 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 8 May 2024 20:47:29 +0100 Subject: [PATCH] [LAA] Use getBackedgeTakenCountForCountableExits. Update LAA to use getBackedgeTakenCountForCountableExits, which returns the minimum of the countable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if the loop is left via any uncomputable exits, as long as we prove that there are no dependences given the minimum of the countable exits. The same should also apply to generating runtime checks. Note that this shifts the responsibility for checking whether all exit counts are computable, or for handling early exits, to the users of LAA. 
--- llvm/lib/Analysis/LoopAccessAnalysis.cpp | 11 ++- .../Vectorize/LoopVectorizationLegality.cpp | 10 ++ .../early-exit-runtime-checks.ll | 26 - .../Transforms/LoopDistribute/early-exit.ll | 96 +++ .../Transforms/LoopLoadElim/early-exit.ll | 61 5 files changed, 197 insertions(+), 7 deletions(-) create mode 100644 llvm/test/Transforms/LoopDistribute/early-exit.ll create mode 100644 llvm/test/Transforms/LoopLoadElim/early-exit.ll diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index bc8b9b8479e4f..f15dcaf94ee11 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -214,7 +214,7 @@ getStartAndEndForAccess(const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, if (SE->isLoopInvariant(PtrExpr, Lp)) { ScStart = ScEnd = PtrExpr; } else if (auto *AR = dyn_cast(PtrExpr)) { -const SCEV *Ex = PSE.getBackedgeTakenCount(); +const SCEV *Ex = PSE.getBackedgeTakenCountForCountableExits(); ScStart = AR->getStart(); ScEnd = AR->evaluateAtIteration(Ex, *SE); @@ -2056,8 +2056,9 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent( // i.e. they are far enough appart that accesses won't access the same // location across all loop ierations. if (HasSameSize && - isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, - MaxStride, TypeByteSize)) + isSafeDependenceDistance(DL, SE, + *(PSE.getBackedgeTakenCountForCountableExits()), + *Dist, MaxStride, TypeByteSize)) return Dependence::NoDep; const SCEVConstant *C = dyn_cast(Dist); @@ -2395,7 +2396,7 @@ bool LoopAccessInfo::canAnalyzeLoop() { } // ScalarEvolution needs to be able to find the exit count. 
- const SCEV *ExitCount = PSE->getBackedgeTakenCount(); + const SCEV *ExitCount = PSE->getBackedgeTakenCountForCountableExits(); if (isa(ExitCount)) { recordAnalysis("CantComputeNumberOfIterations") << "could not determine number of loop iterations"; @@ -3004,7 +3005,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) { // of various possible stride specializations, considering the alternatives // of using gather/scatters (if available). - const SCEV *BETakenCount = PSE->getBackedgeTakenCount(); + const SCEV *BETakenCount = PSE->getBackedgeTakenCountForCountableExits(); // Match the types so we can compare the stride and the BETakenCount. // The Stride can be positive/negative, so we sign extend Stride; diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 9de49d1bcfeac..0c18c4e146de1 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -1506,6 +1506,16 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { return false; } + if (isa(PSE.getBackedgeTakenCount())) { +reportVectorizationFailure("could not determine number of loop iterations", + "could not determine number of loop iterations", + "CantComputeNumberOfIterations", ORE, TheLoop); +if (DoExtraAnalysis) + Result = false; +els
[llvm-branch-commits] [llvm] [LAA] Use getBackedgeTakenCountForCountableExits. (PR #93499)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/93499 >From 1ce660b45d3706912705bc9e7a8c19e86f05d0c0 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 8 May 2024 20:47:29 +0100 Subject: [PATCH] [LAA] Use getBackedgeTakenCountForCountableExits. Update LAA to use getBackedgeTakenCountForCountableExits, which returns the minimum of the countable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if the loop is left via any uncomputable exits, as long as we prove that there are no dependences given the minimum of the countable exits. The same should also apply to generating runtime checks. Note that this shifts the responsibility for checking whether all exit counts are computable, or for handling early exits, to the users of LAA. --- llvm/lib/Analysis/LoopAccessAnalysis.cpp | 12 +-- .../Vectorize/LoopVectorizationLegality.cpp | 10 ++ .../early-exit-runtime-checks.ll | 39 +++- .../memcheck-wrapping-pointers.ll | 14 +-- .../Transforms/LoopDistribute/early-exit.ll | 96 +++ .../Transforms/LoopLoadElim/early-exit.ll | 61 6 files changed, 216 insertions(+), 16 deletions(-) create mode 100644 llvm/test/Transforms/LoopDistribute/early-exit.ll create mode 100644 llvm/test/Transforms/LoopLoadElim/early-exit.ll diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index bc8b9b8479e4f..58096b66f704b 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -214,7 +214,7 @@ getStartAndEndForAccess(const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, if (SE->isLoopInvariant(PtrExpr, Lp)) { ScStart = ScEnd = PtrExpr; } else if (auto *AR = dyn_cast(PtrExpr)) { -const SCEV *Ex = PSE.getBackedgeTakenCount(); +const SCEV *Ex = PSE.getSymbolicMaxBackedgeTakenCount(); ScStart = AR->getStart(); ScEnd = AR->evaluateAtIteration(Ex, *SE); @@ -2055,9 +2055,9 @@ 
MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent( // stride multiplied by the backedge taken count, the accesses are independet, // i.e. they are far enough appart that accesses won't access the same // location across all loop ierations. - if (HasSameSize && - isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, - MaxStride, TypeByteSize)) + if (HasSameSize && isSafeDependenceDistance( + DL, SE, *(PSE.getSymbolicMaxBackedgeTakenCount()), + *Dist, MaxStride, TypeByteSize)) return Dependence::NoDep; const SCEVConstant *C = dyn_cast(Dist); @@ -2395,7 +2395,7 @@ bool LoopAccessInfo::canAnalyzeLoop() { } // ScalarEvolution needs to be able to find the exit count. - const SCEV *ExitCount = PSE->getBackedgeTakenCount(); + const SCEV *ExitCount = PSE->getSymbolicMaxBackedgeTakenCount(); if (isa(ExitCount)) { recordAnalysis("CantComputeNumberOfIterations") << "could not determine number of loop iterations"; @@ -3004,7 +3004,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) { // of various possible stride specializations, considering the alternatives // of using gather/scatters (if available). - const SCEV *BETakenCount = PSE->getBackedgeTakenCount(); + const SCEV *BETakenCount = PSE->getSymbolicMaxBackedgeTakenCount(); // Match the types so we can compare the stride and the BETakenCount. 
// The Stride can be positive/negative, so we sign extend Stride; diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 9de49d1bcfeac..0c18c4e146de1 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -1506,6 +1506,16 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { return false; } + if (isa(PSE.getBackedgeTakenCount())) { +reportVectorizationFailure("could not determine number of loop iterations", + "could not determine number of loop iterations", + "CantComputeNumberOfIterations", ORE, TheLoop); +if (DoExtraAnalysis) + Result = false; +else + return false; + } + LLVM_DEBUG(dbgs() << "LV: We can vectorize this loop" << (LAI->getRuntimePointerChecking()->Need ? " (with a runtime bound check)" diff --git a/llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtime-checks.ll b/llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtime-checks.ll index 0d85f11f06dce..a40aaa8ae99a0 100644 --- a/llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtime-checks.ll +++ b/llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtim
[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/93499
[llvm-branch-commits] [llvm] fd82b5b - [LV] Support recipes without underlying instr in collectPoisonGenRec.
Author: Florian Hahn Date: 2023-11-03T10:21:14Z New Revision: fd82b5b2876b3885b0590ba4538c316fa0e33cf7 URL: https://github.com/llvm/llvm-project/commit/fd82b5b2876b3885b0590ba4538c316fa0e33cf7 DIFF: https://github.com/llvm/llvm-project/commit/fd82b5b2876b3885b0590ba4538c316fa0e33cf7.diff LOG: [LV] Support recieps without underlying instr in collectPoisonGenRec. Support recipes without underlying instruction in collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null the underlying value. Fixes https://github.com/llvm/llvm-project/issues/70590. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 4f547886f602534..1c208f72af678f7 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1103,7 +1103,8 @@ void InnerLoopVectorizer::collectPoisonGeneratingRecipes( if (auto *RecWithFlags = dyn_cast(CurRec)) { RecWithFlags->dropPoisonGeneratingFlags(); } else { -Instruction *Instr = CurRec->getUnderlyingInstr(); +Instruction *Instr = dyn_cast_or_null( +CurRec->getVPSingleValue()->getUnderlyingValue()); (void)Instr; assert((!Instr || !Instr->hasPoisonGeneratingFlags()) && "found instruction with poison generating flags not covered by " diff --git a/llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll index b440da6dd866081..5694367dd1f9016 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll @@ -405,6 +405,89 @@ loop.exit: ret void } +@c = external global [5 x i8] + +; Test case for https://github.com/llvm/llvm-project/issues/70590. 
+; Note that the then block has UB, but I could not find any other way to +; construct a suitable test case. +define void @pr70590_recipe_without_underlying_instr(i64 %n, ptr noalias %dst) { +; CHECK-LABEL: @pr70590_recipe_without_underlying_instr( +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.+]] ], [ [[INDEX_NEXT:%.*]], [[PRED_SREM_CONTINUE6:%.*]] ] +; CHECK-NEXT:[[VEC_IND:%.*]] = phi <4 x i64> [ , [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_SREM_CONTINUE6]] ] +; CHECK-NEXT:[[TMP0:%.*]] = add i64 [[INDEX]], 0 +; CHECK-NEXT:[[TMP1:%.*]] = icmp eq <4 x i64> [[VEC_IND]], +; CHECK-NEXT:[[TMP2:%.*]] = xor <4 x i1> [[TMP1]], +; CHECK-NEXT:[[TMP3:%.*]] = extractelement <4 x i1> [[TMP2]], i32 0 +; CHECK-NEXT:br i1 [[TMP3]], label [[PRED_SREM_IF:%.*]], label [[PRED_SREM_CONTINUE:%.*]] +; CHECK: pred.srem.if: +; CHECK-NEXT:[[TMP4:%.*]] = srem i64 3, 0 +; CHECK-NEXT:br label [[PRED_SREM_CONTINUE]] +; CHECK: pred.srem.continue: +; CHECK-NEXT:[[TMP5:%.*]] = phi i64 [ poison, %vector.body ], [ [[TMP4]], [[PRED_SREM_IF]] ] +; CHECK-NEXT:[[TMP6:%.*]] = extractelement <4 x i1> [[TMP2]], i32 1 +; CHECK-NEXT:br i1 [[TMP6]], label [[PRED_SREM_IF1:%.*]], label [[PRED_SREM_CONTINUE2:%.*]] +; CHECK: pred.srem.if1: +; CHECK-NEXT:[[TMP7:%.*]] = srem i64 3, 0 +; CHECK-NEXT:br label [[PRED_SREM_CONTINUE2]] +; CHECK: pred.srem.continue2: +; CHECK-NEXT:[[TMP8:%.*]] = phi i64 [ poison, [[PRED_SREM_CONTINUE]] ], [ [[TMP7]], [[PRED_SREM_IF1]] ] +; CHECK-NEXT:[[TMP9:%.*]] = extractelement <4 x i1> [[TMP2]], i32 2 +; CHECK-NEXT:br i1 [[TMP9]], label [[PRED_SREM_IF3:%.*]], label [[PRED_SREM_CONTINUE4:%.*]] +; CHECK: pred.srem.if3: +; CHECK-NEXT:[[TMP10:%.*]] = srem i64 3, 0 +; CHECK-NEXT:br label [[PRED_SREM_CONTINUE4]] +; CHECK: pred.srem.continue4: +; CHECK-NEXT:[[TMP11:%.*]] = phi i64 [ poison, [[PRED_SREM_CONTINUE2]] ], [ [[TMP10]], [[PRED_SREM_IF3]] ] +; CHECK-NEXT:[[TMP12:%.*]] = extractelement <4 x i1> [[TMP2]], i32 3 +; CHECK-NEXT:br i1 [[TMP12]], 
label [[PRED_SREM_IF5:%.*]], label [[PRED_SREM_CONTINUE6]] +; CHECK: pred.srem.if5: +; CHECK-NEXT:[[TMP13:%.*]] = srem i64 3, 0 +; CHECK-NEXT:br label [[PRED_SREM_CONTINUE6]] +; CHECK: pred.srem.continue6: +; CHECK-NEXT:[[TMP14:%.*]] = phi i64 [ poison, [[PRED_SREM_CONTINUE4]] ], [ [[TMP13]], [[PRED_SREM_IF5]] ] +; CHECK-NEXT:[[TMP15:%.*]] = add i64 [[TMP5]], -3 +; CHECK-NEXT:[[TMP16:%.*]] = add i64 [[TMP0]], [[TMP15]] +; CHECK-NEXT:[[TMP17:%.*]] = getelementptr [5 x i8], ptr @c, i64 0, i64 [[TMP16]] +; CHECK-NEXT:[[TMP18:%.*]] = getelementptr i8, ptr [[TMP17]], i32 0 +; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP18]], align 1 +; CHECK-NEXT:[[PREDPHI:%.
[llvm-branch-commits] [llvm] 035b334 - [𝘀𝗽𝗿] initial version
Author: Florian Hahn Date: 2023-11-13T22:06:01Z New Revision: 035b334598b4375d4b0682a5ced3f58fcd5a2302 URL: https://github.com/llvm/llvm-project/commit/035b334598b4375d4b0682a5ced3f58fcd5a2302 DIFF: https://github.com/llvm/llvm-project/commit/035b334598b4375d4b0682a5ced3f58fcd5a2302.diff LOG: [𝘀𝗽𝗿] initial version Created using spr 1.3.4 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index e9d0315d114f65c..ae8d306c44dd885 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -9525,8 +9525,12 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) { InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF); bool isMaskRequired = getMask(); if (isMaskRequired) -for (unsigned Part = 0; Part < State.UF; ++Part) - BlockInMaskParts[Part] = State.get(getMask(), Part); +for (unsigned Part = 0; Part < State.UF; ++Part) { + Value *Mask = State.get(getMask(), Part); + if (isReverse()) +Mask = Builder.CreateVectorReverse(Mask, "reverse"); + BlockInMaskParts[Part] = Mask; +} const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * { // Calculate the pointer for the specific unroll-part. @@ -9558,9 +9562,6 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) { PartPtr = Builder.CreateGEP(ScalarDataTy, Ptr, NumElt, "", InBounds); PartPtr = Builder.CreateGEP(ScalarDataTy, PartPtr, LastLane, "", InBounds); - if (isMaskRequired) // Reverse of a null all-one mask is a null mask. 
-BlockInMaskParts[Part] = -Builder.CreateVectorReverse(BlockInMaskParts[Part], "reverse"); } else { Value *Increment = createStepForVF(Builder, IndexTy, State.VF, Part); PartPtr = Builder.CreateGEP(ScalarDataTy, Ptr, Increment, "", InBounds); diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll index 58c54103c72c6dc..70833e44b075a98 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll @@ -22,8 +22,8 @@ define void @vector_reverse_mask_nxv4i1(ptr %a, ptr %cond, i64 %N) #0 { ; CHECK: %[[WIDEMSKLOAD:.*]] = call @llvm.masked.load.nxv4f64.p0(ptr %{{.*}}, i32 8, %[[REVERSE6]], poison) ; CHECK: %[[REVERSE7:.*]] = call @llvm.experimental.vector.reverse.nxv4f64( %[[WIDEMSKLOAD]]) ; CHECK: %[[FADD:.*]] = fadd %[[REVERSE7]] +; CHECK: %[[REVERSE9:.*]] = call @llvm.experimental.vector.reverse.nxv4i1( %{{.*}}) ; CHECK: %[[REVERSE8:.*]] = call @llvm.experimental.vector.reverse.nxv4f64( %[[FADD]]) -; CHECK: %[[REVERSE9:.*]] = call @llvm.experimental.vector.reverse.nxv4i1( %{{.*}}) ; CHECK: call void @llvm.masked.store.nxv4f64.p0( %[[REVERSE8]], ptr %{{.*}}, i32 8, %[[REVERSE9]] entry: diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll index e8e2008912c8344..195826300e3996f 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll @@ -43,16 +43,16 @@ define void @vector_reverse_mask_v4i1(ptr noalias %a, ptr noalias %cond, i64 %N) ; CHECK-NEXT:[[TMP5:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer ; CHECK-NEXT:[[TMP6:%.*]] = fcmp une <4 x double> [[REVERSE2]], zeroinitializer ; CHECK-NEXT:[[TMP7:%.*]] = getelementptr double, ptr [[A:%.*]], i64 [[TMP1]] -; 
CHECK-NEXT:[[TMP8:%.*]] = getelementptr double, ptr [[TMP7]], i64 -3 ; CHECK-NEXT:[[REVERSE3:%.*]] = shufflevector <4 x i1> [[TMP5]], <4 x i1> poison, <4 x i32> +; CHECK-NEXT:[[REVERSE4:%.*]] = shufflevector <4 x i1> [[TMP6]], <4 x i1> poison, <4 x i32> +; CHECK-NEXT:[[TMP8:%.*]] = getelementptr double, ptr [[TMP7]], i64 -3 ; CHECK-NEXT:[[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0(ptr [[TMP8]], i32 8, <4 x i1> [[REVERSE3]], <4 x double> poison) ; CHECK-NEXT:[[TMP9:%.*]] = getelementptr double, ptr [[TMP7]], i64 -7 -; CHECK-NEXT:[[REVERSE5:%.*]] = shufflevector <4 x i1> [[TMP6]], <4 x i1> poison, <4 x i32> -; CHECK-NEXT:[[WIDE_MASKED_LOAD6:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0(ptr [[TMP9]], i32 8, <4 x i1> [[REVERSE5]], <4 x double> poison) +; CHECK-NEXT:[[WIDE_MASKED_LOAD6:%.*]
[llvm-branch-commits] [llvm] [VPlan] Model address separately. (PR #72164)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/72164 Move vector pointer generation to a separate VPInstruction opcode. This untangles address computation from the memory recipes, and is also needed to enable explicit unrolling in VPlan in the future.
[llvm-branch-commits] [llvm] [VPlan] Model address separately. (PR #72164)
fhahn wrote: Note that this patch depends on #72163 https://github.com/llvm/llvm-project/pull/72164
[llvm-branch-commits] [llvm] e8304de - [𝘀𝗽𝗿] initial version
Author: Florian Hahn Date: 2023-11-16T20:48:54Z New Revision: e8304de86f59fa66bc8af03b401b0c4d28f2ac97 URL: https://github.com/llvm/llvm-project/commit/e8304de86f59fa66bc8af03b401b0c4d28f2ac97 DIFF: https://github.com/llvm/llvm-project/commit/e8304de86f59fa66bc8af03b401b0c4d28f2ac97.diff LOG: [𝘀𝗽𝗿] initial version Created using spr 1.3.4 Added: Modified: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Removed: diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index 463a7b5bb1bb588..5859f58a9f462b0 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -4367,8 +4367,7 @@ static bool combineInstructionsOverFunction( Function &F, InstructionWorklist &Worklist, AliasAnalysis *AA, AssumptionCache &AC, TargetLibraryInfo &TLI, TargetTransformInfo &TTI, DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI, -ProfileSummaryInfo *PSI, unsigned MaxIterations, bool VerifyFixpoint, -LoopInfo *LI) { +ProfileSummaryInfo *PSI, LoopInfo *LI, const InstCombineOptions &Opts) { auto &DL = F.getParent()->getDataLayout(); /// Builder - This is an IRBuilder that automatically inserts new @@ -4394,8 +4393,8 @@ static bool combineInstructionsOverFunction( while (true) { ++Iteration; -if (Iteration > MaxIterations && !VerifyFixpoint) { - LLVM_DEBUG(dbgs() << "\n\n[IC] Iteration limit #" << MaxIterations +if (Iteration > Opts.MaxIterations && !Opts.VerifyFixpoint) { + LLVM_DEBUG(dbgs() << "\n\n[IC] Iteration limit #" << Opts.MaxIterations << " on " << F.getName() << " reached; stopping without verifying fixpoint\n"); break; @@ -4414,10 +4413,10 @@ static bool combineInstructionsOverFunction( break; MadeIRChange = true; -if (Iteration > MaxIterations) { +if (Iteration > Opts.MaxIterations) { report_fatal_error( "Instruction Combining did not reach a fixpoint after " + - Twine(MaxIterations) + " 
iterations"); + Twine(Opts.MaxIterations) + " iterations"); } } @@ -4468,8 +4467,7 @@ PreservedAnalyses InstCombinePass::run(Function &F, &AM.getResult(F) : nullptr; if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE, - BFI, PSI, Options.MaxIterations, - Options.VerifyFixpoint, LI)) + BFI, PSI, LI, Options)) // No changes, all analyses are preserved. return PreservedAnalyses::all(); @@ -4518,9 +4516,7 @@ bool InstructionCombiningPass::runOnFunction(Function &F) { nullptr; return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE, - BFI, PSI, - InstCombineDefaultMaxIterations, - /*VerifyFixpoint */ false, LI); + BFI, PSI, LI, InstCombineOptions()); } char InstructionCombiningPass::ID = 0;
[llvm-branch-commits] [llvm] 267d656 - [𝘀𝗽𝗿] initial version
Author: Florian Hahn Date: 2023-11-16T20:49:01Z New Revision: 267d656ea58a77622dc512093bf64f50e4a04b95 URL: https://github.com/llvm/llvm-project/commit/267d656ea58a77622dc512093bf64f50e4a04b95 DIFF: https://github.com/llvm/llvm-project/commit/267d656ea58a77622dc512093bf64f50e4a04b95.diff LOG: [𝘀𝗽𝗿] initial version Created using spr 1.3.4 Added: Modified: llvm/include/llvm/Transforms/InstCombine/InstCombine.h llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassBuilderPipelines.cpp llvm/lib/Passes/PassRegistry.def llvm/lib/Transforms/InstCombine/InstCombineInternal.h llvm/lib/Transforms/InstCombine/InstructionCombining.cpp llvm/test/Transforms/InstCombine/no_sink_instruction.ll llvm/test/Transforms/PhaseOrdering/AArch64/sinking-vs-if-conversion.ll Removed: diff --git a/llvm/include/llvm/Transforms/InstCombine/InstCombine.h b/llvm/include/llvm/Transforms/InstCombine/InstCombine.h index f38ec2debb18136..14d1c127984c874 100644 --- a/llvm/include/llvm/Transforms/InstCombine/InstCombine.h +++ b/llvm/include/llvm/Transforms/InstCombine/InstCombine.h @@ -32,6 +32,7 @@ struct InstCombineOptions { // Verify that a fix point has been reached after MaxIterations. 
bool VerifyFixpoint = false; unsigned MaxIterations = InstCombineDefaultMaxIterations; + bool EnableCodeSinking = true; InstCombineOptions() = default; @@ -49,6 +50,11 @@ struct InstCombineOptions { MaxIterations = Value; return *this; } + + InstCombineOptions &setEnableCodeSinking(bool Value) { +EnableCodeSinking = Value; +return *this; + } }; class InstCombinePass : public PassInfoMixin { diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index dd9d799f9d55dcc..1e79ff660ea3ea2 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -872,6 +872,8 @@ Expected parseInstCombineOptions(StringRef Params) { ParamName).str(), inconvertibleErrorCode()); Result.setMaxIterations((unsigned)MaxIterations.getZExtValue()); +} else if (ParamName == "code-sinking") { + Result.setEnableCodeSinking(Enable); } else { return make_error( formatv("invalid InstCombine pass parameter '{0}' ", ParamName).str(), diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index f3d280316e04077..8946480340d29a9 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -1101,7 +1101,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level, FunctionPassManager GlobalCleanupPM; // FIXME: Should this instead by a run of SROA? 
GlobalCleanupPM.addPass(PromotePass()); - GlobalCleanupPM.addPass(InstCombinePass()); + GlobalCleanupPM.addPass( + InstCombinePass(InstCombineOptions().setEnableCodeSinking(false))); invokePeepholeEPCallbacks(GlobalCleanupPM, Level); GlobalCleanupPM.addPass( SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true))); diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index 2067fc473b522db..50dda63578a0add 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -526,7 +526,8 @@ FUNCTION_PASS_WITH_PARAMS("instcombine", parseInstCombineOptions, "no-use-loop-info;use-loop-info;" "no-verify-fixpoint;verify-fixpoint;" - "max-iterations=N" + "max-iterations=N;" + "no-code-sinking;code-sinking" ) FUNCTION_PASS_WITH_PARAMS("mldst-motion", "MergedLoadStoreMotionPass", diff --git a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h index 68a8fb676d8d909..83364f14ef7db6c 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h +++ b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h @@ -53,6 +53,7 @@ class DataLayout; class DominatorTree; class GEPOperator; class GlobalVariable; +struct InstCombineOptions; class LoopInfo; class OptimizationRemarkEmitter; class ProfileSummaryInfo; @@ -68,9 +69,11 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final TargetLibraryInfo &TLI, TargetTransformInfo &TTI, DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI, - const DataLayout &DL, LoopInfo *LI) + const DataLayout &DL, LoopInfo *LI, + const InstCombineOptions &Opts) : InstCombiner(Worklist, Builder, MinimizeSize, AA, AC, TLI, TTI, DT, ORE, - BFI, PSI, DL, LI) {} + BFI, PSI, DL
[llvm-branch-commits] [llvm] [Passes] Disable code sinking in InstCombine early on. (PR #72567)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/72567 Sinking instructions very early in the pipeline destroys dereferenceability information that could be used by other passes; e.g., it can prevent if-conversion by SimplifyCFG.
[llvm-branch-commits] [llvm] [Passes] Disable code sinking in InstCombine early on. (PR #72567)
fhahn wrote: The reason for keeping the original `EnableCodeSinking` option is to retain the ability to disable code sinking from the command line. https://github.com/llvm/llvm-project/pull/72567
[llvm-branch-commits] [llvm] [Passes] Disable code sinking in InstCombine early on. (PR #72567)
fhahn wrote: > Is the idea here that if the original code executed these unconditionally, > then it's more likely than not that unconditionally executing them is > beneficial? I think the main motivation for the patch is the hypothesis that sinking early is worse as a canonical form, because once we have sunk we cannot really undo it easily. And once we have sunk, we won't be able to consider certain transforms. Delaying sinking gives other passes like SimplifyCFG a chance to perform things like if-conversion, if considered profitable. There certainly could be regressions due to SimplifyCFG's cost model taking a wrong decision, but I think in those cases it would be better to improve the cost model rather than preventing the transform up-front by sinking (which isn't cost-model driven at all in InstCombine IIRC). It should also be possible to undo if-conversion in the backend, if that's more profitable there; at this point, we also arguably have much more accurate information about register pressure, available execution units, and accurate latencies to make a more informed decision. Slightly orthogonal to this, one thing I want to look into at some point is adding a way to specify dereferenceability at various program points for pointers (e.g. via an intrinsic or assumption). That would ideally allow us to retain dereferenceability information from the original program even after sinking, and would allow if-conversion even after sinking. Avoiding sinking early on would probably still be the preferred early canonical form, I think. https://github.com/llvm/llvm-project/pull/72567
[llvm-branch-commits] [llvm] [Passes] Disable code sinking in InstCombine early on. (PR #72567)
fhahn wrote: The specific test cases came from some users who explicitly wanted to get the code there if-converted for the CPU they are targeting. It may not be profitable for all targets/CPUs though, so we would still rely on the cost-model to take the correct decision per target/CPU. https://github.com/llvm/llvm-project/pull/72567 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 6f3b88b - [VPlan] Move trunc ([s|z]ext A) simplifications to simplifyRecipe.
Author: Florian Hahn Date: 2023-11-16T21:17:10Z New Revision: 6f3b88baa2ac9ec892ed3ad7dd64d0d537010bc5 URL: https://github.com/llvm/llvm-project/commit/6f3b88baa2ac9ec892ed3ad7dd64d0d537010bc5 DIFF: https://github.com/llvm/llvm-project/commit/6f3b88baa2ac9ec892ed3ad7dd64d0d537010bc5.diff LOG: [VPlan] Move trunc ([s|z]ext A) simplifications to simplifyRecipe. Split off simplification from D149903 as suggested. This should be effectively NFC until D149903 lands. Added: Modified: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp index c55864de9c17086..0eaaa037ad5782f 100644 --- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp +++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp @@ -816,15 +816,28 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) { break; } case Instruction::Trunc: { -VPRecipeBase *Zext = R.getOperand(0)->getDefiningRecipe(); -if (!Zext || getOpcodeForRecipe(*Zext) != Instruction::ZExt) +VPRecipeBase *Ext = R.getOperand(0)->getDefiningRecipe(); +if (!Ext) break; -VPValue *A = Zext->getOperand(0); +unsigned ExtOpcode = getOpcodeForRecipe(*Ext); +if (ExtOpcode != Instruction::ZExt && ExtOpcode != Instruction::SExt) + break; +VPValue *A = Ext->getOperand(0); VPValue *Trunc = R.getVPSingleValue(); -Type *TruncToTy = TypeInfo.inferScalarType(Trunc); -if (TruncToTy && TruncToTy == TypeInfo.inferScalarType(A)) +Type *TruncTy = TypeInfo.inferScalarType(Trunc); +Type *ATy = TypeInfo.inferScalarType(A); +if (TruncTy == ATy) { Trunc->replaceAllUsesWith(A); - +} else if (ATy->getScalarSizeInBits() < TruncTy->getScalarSizeInBits()) { + auto *VPC = + new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, TruncTy); + VPC->insertBefore(&R); + Trunc->replaceAllUsesWith(VPC); +} else if (ATy->getScalarSizeInBits() > TruncTy->getScalarSizeInBits()) { + auto *VPC = new VPWidenCastRecipe(Instruction::Trunc, A, TruncTy); 
+ VPC->insertBefore(&R); + Trunc->replaceAllUsesWith(VPC); +} #ifndef NDEBUG // Verify that the cached type info is for both A and its users is still // accurate by comparing it to freshly computed types. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
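The three cases the diff above distinguishes can be summarized in a small standalone C++ sketch (illustrative names, not LLVM API): given `trunc (ext A)`, compare the bit width of `A`'s type with the trunc's destination type to decide which single cast remains.

```cpp
enum class CastKind { None, ZExt, SExt, Trunc };

// Hypothetical helper mirroring the simplifyRecipe logic: fold
// trunc ([s|z]ext A) based on the bit widths involved.
CastKind foldTruncOfExt(unsigned ABits, unsigned TruncBits,
                        CastKind ExtKind) {
  if (ABits == TruncBits)
    return CastKind::None; // trunc (ext A) is just A
  if (ABits < TruncBits)
    return ExtKind;        // a single, narrower ext of A suffices
  return CastKind::Trunc;  // a single, narrower trunc of A suffices
}
```

For example, `trunc (zext i8 %a to i32) to i16` becomes `zext i8 %a to i16`, while `trunc (zext i16 %a to i64) to i8` becomes `trunc i16 %a to i8`.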
[llvm-branch-commits] [lld] [libunwind] [libcxxabi] [libcxx] [compiler-rt] [llvm] [flang] [mlir] [clang-tools-extra] [lldb] [clang] [Passes] Disable code sinking in InstCombine early on. (PR #72567)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/72567 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 1f729e0 - [LV] Reverse mask up front, not when creating vector pointer.
Author: Florian Hahn Date: 2023-11-17T12:52:06Z New Revision: 1f729e0bc7a7012fa3a7618100f0865e4e32fc0d URL: https://github.com/llvm/llvm-project/commit/1f729e0bc7a7012fa3a7618100f0865e4e32fc0d DIFF: https://github.com/llvm/llvm-project/commit/1f729e0bc7a7012fa3a7618100f0865e4e32fc0d.diff LOG: [LV] Reverse mask up front, not when creating vector pointer. Reverse mask early on when populating BlockInMask. This will enable separating mask management and address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan. Pull Request: https://github.com/llvm/llvm-project/pull/72163 Added: Modified: llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll b/llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll index d775b0e0f0199c4..5cc4d43ec2e49f5 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll @@ -1515,24 +1515,24 @@ define void @foo6(ptr nocapture readonly %in, ptr nocapture %out, i32 %size, ptr ; AVX512-NEXT:[[TMP21:%.*]] = getelementptr double, ptr [[IN]], i64 [[TMP1]] ; AVX512-NEXT:[[TMP22:%.*]] = getelementptr double, ptr [[IN]], i64 [[TMP2]] ; AVX512-NEXT:[[TMP23:%.*]] = getelementptr double, ptr [[IN]], i64 [[TMP3]] +; AVX512-NEXT:[[REVERSE12:%.*]] = shufflevector <8 x i1> [[TMP16]], <8 x i1> poison, <8 x i32> +; AVX512-NEXT:[[REVERSE14:%.*]] = shufflevector <8 x i1> [[TMP17]], <8 x i1> poison, <8 x i32> +; AVX512-NEXT:[[REVERSE17:%.*]] = shufflevector <8 x i1> [[TMP18]], <8 x i1> poison, <8 x i32> +; AVX512-NEXT:[[REVERSE20:%.*]] = shufflevector <8 x i1> [[TMP19]], <8 x i1> poison, <8 x i32> ; AVX512-NEXT:[[TMP24:%.*]] = getelementptr double, ptr [[TMP20]], i32 0 ; AVX512-NEXT:[[TMP25:%.*]] = getelementptr double, ptr [[TMP24]], i32 -7 -; AVX512-NEXT:[[REVERSE12:%.*]] = shufflevector <8 x i1> [[TMP16]], <8 x i1> poison, <8 x i32> ; 
AVX512-NEXT:[[WIDE_MASKED_LOAD:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0(ptr [[TMP25]], i32 8, <8 x i1> [[REVERSE12]], <8 x double> poison), !alias.scope !34 ; AVX512-NEXT:[[REVERSE13:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD]], <8 x double> poison, <8 x i32> ; AVX512-NEXT:[[TMP26:%.*]] = getelementptr double, ptr [[TMP20]], i32 -8 ; AVX512-NEXT:[[TMP27:%.*]] = getelementptr double, ptr [[TMP26]], i32 -7 -; AVX512-NEXT:[[REVERSE14:%.*]] = shufflevector <8 x i1> [[TMP17]], <8 x i1> poison, <8 x i32> ; AVX512-NEXT:[[WIDE_MASKED_LOAD15:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0(ptr [[TMP27]], i32 8, <8 x i1> [[REVERSE14]], <8 x double> poison), !alias.scope !34 ; AVX512-NEXT:[[REVERSE16:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD15]], <8 x double> poison, <8 x i32> ; AVX512-NEXT:[[TMP28:%.*]] = getelementptr double, ptr [[TMP20]], i32 -16 ; AVX512-NEXT:[[TMP29:%.*]] = getelementptr double, ptr [[TMP28]], i32 -7 -; AVX512-NEXT:[[REVERSE17:%.*]] = shufflevector <8 x i1> [[TMP18]], <8 x i1> poison, <8 x i32> ; AVX512-NEXT:[[WIDE_MASKED_LOAD18:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0(ptr [[TMP29]], i32 8, <8 x i1> [[REVERSE17]], <8 x double> poison), !alias.scope !34 ; AVX512-NEXT:[[REVERSE19:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD18]], <8 x double> poison, <8 x i32> ; AVX512-NEXT:[[TMP30:%.*]] = getelementptr double, ptr [[TMP20]], i32 -24 ; AVX512-NEXT:[[TMP31:%.*]] = getelementptr double, ptr [[TMP30]], i32 -7 -; AVX512-NEXT:[[REVERSE20:%.*]] = shufflevector <8 x i1> [[TMP19]], <8 x i1> poison, <8 x i32> ; AVX512-NEXT:[[WIDE_MASKED_LOAD21:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0(ptr [[TMP31]], i32 8, <8 x i1> [[REVERSE20]], <8 x double> poison), !alias.scope !34 ; AVX512-NEXT:[[REVERSE22:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD21]], <8 x double> poison, <8 x i32> ; AVX512-NEXT:[[TMP32:%.*]] = fadd <8 x double> [[REVERSE13]], ___ llvm-branch-commits mailing list 
llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] e3c1a20 - [𝘀𝗽𝗿] initial version
Author: Florian Hahn Date: 2023-11-17T12:52:05Z New Revision: e3c1a20a1b5a115a75d83f5783ba64927607f427 URL: https://github.com/llvm/llvm-project/commit/e3c1a20a1b5a115a75d83f5783ba64927607f427 DIFF: https://github.com/llvm/llvm-project/commit/e3c1a20a1b5a115a75d83f5783ba64927607f427.diff LOG: [𝘀𝗽𝗿] initial version Created using spr 1.3.4 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 64093a0db5f81c8..3b41939246b9946 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -9562,8 +9562,12 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) { InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF); bool isMaskRequired = getMask(); if (isMaskRequired) -for (unsigned Part = 0; Part < State.UF; ++Part) - BlockInMaskParts[Part] = State.get(getMask(), Part); +for (unsigned Part = 0; Part < State.UF; ++Part) { + Value *Mask = State.get(getMask(), Part); + if (isReverse()) +Mask = Builder.CreateVectorReverse(Mask, "reverse"); + BlockInMaskParts[Part] = Mask; +} const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * { // Calculate the pointer for the specific unroll-part. @@ -9595,9 +9599,6 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) { PartPtr = Builder.CreateGEP(ScalarDataTy, Ptr, NumElt, "", InBounds); PartPtr = Builder.CreateGEP(ScalarDataTy, PartPtr, LastLane, "", InBounds); - if (isMaskRequired) // Reverse of a null all-one mask is a null mask. 
-BlockInMaskParts[Part] = -Builder.CreateVectorReverse(BlockInMaskParts[Part], "reverse"); } else { Value *Increment = createStepForVF(Builder, IndexTy, State.VF, Part); PartPtr = Builder.CreateGEP(ScalarDataTy, Ptr, Increment, "", InBounds); diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll index 58c54103c72c6dc..70833e44b075a98 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll @@ -22,8 +22,8 @@ define void @vector_reverse_mask_nxv4i1(ptr %a, ptr %cond, i64 %N) #0 { ; CHECK: %[[WIDEMSKLOAD:.*]] = call @llvm.masked.load.nxv4f64.p0(ptr %{{.*}}, i32 8, %[[REVERSE6]], poison) ; CHECK: %[[REVERSE7:.*]] = call @llvm.experimental.vector.reverse.nxv4f64( %[[WIDEMSKLOAD]]) ; CHECK: %[[FADD:.*]] = fadd %[[REVERSE7]] +; CHECK: %[[REVERSE9:.*]] = call @llvm.experimental.vector.reverse.nxv4i1( %{{.*}}) ; CHECK: %[[REVERSE8:.*]] = call @llvm.experimental.vector.reverse.nxv4f64( %[[FADD]]) -; CHECK: %[[REVERSE9:.*]] = call @llvm.experimental.vector.reverse.nxv4i1( %{{.*}}) ; CHECK: call void @llvm.masked.store.nxv4f64.p0( %[[REVERSE8]], ptr %{{.*}}, i32 8, %[[REVERSE9]] entry: diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll index e8e2008912c8344..195826300e3996f 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll @@ -43,16 +43,16 @@ define void @vector_reverse_mask_v4i1(ptr noalias %a, ptr noalias %cond, i64 %N) ; CHECK-NEXT:[[TMP5:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer ; CHECK-NEXT:[[TMP6:%.*]] = fcmp une <4 x double> [[REVERSE2]], zeroinitializer ; CHECK-NEXT:[[TMP7:%.*]] = getelementptr double, ptr [[A:%.*]], i64 [[TMP1]] -; 
CHECK-NEXT:[[TMP8:%.*]] = getelementptr double, ptr [[TMP7]], i64 -3 ; CHECK-NEXT:[[REVERSE3:%.*]] = shufflevector <4 x i1> [[TMP5]], <4 x i1> poison, <4 x i32> +; CHECK-NEXT:[[REVERSE4:%.*]] = shufflevector <4 x i1> [[TMP6]], <4 x i1> poison, <4 x i32> +; CHECK-NEXT:[[TMP8:%.*]] = getelementptr double, ptr [[TMP7]], i64 -3 ; CHECK-NEXT:[[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0(ptr [[TMP8]], i32 8, <4 x i1> [[REVERSE3]], <4 x double> poison) ; CHECK-NEXT:[[TMP9:%.*]] = getelementptr double, ptr [[TMP7]], i64 -7 -; CHECK-NEXT:[[REVERSE5:%.*]] = shufflevector <4 x i1> [[TMP6]], <4 x i1> poison, <4 x i32> -; CHECK-NEXT:[[WIDE_MASKED_LOAD6:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0(ptr [[TMP9]], i32 8, <4 x i1> [[REVERSE5]], <4 x double> poison) +; CHECK-NEXT:[[WIDE_MASKED_LOAD6:%.*]
[llvm-branch-commits] [mlir] [lldb] [llvm] [libunwind] [libcxx] [clang] [VPlan] Model address separately. (PR #72164)
[llvm-branch-commits] [mlir] [libunwind] [libcxx] [clang-tools-extra] [clang] [llvm] [lldb] [lld] [VPlan] Model address separately. (PR #72164)
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
fhahn wrote: > Is this about computing **live-outs** of the return block as the code > suggests? (The summary currently talks about live-ins?) Thanks, it should be **live-outs** in the description, updated! > I don't remember the situation on aarch64, but if by chance LR is modeled > with this "pristine register" concept, then maybe the caller needs to use > addLiveIns() rather than addLiveInsNoPristines()? I am not sure, but looking at `updateLiveness` (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/PrologEpilogInserter.cpp#L582) it looks like it uses the saved registers from MFI. Pristine registers I think contain all callee-saved registers for the target, which may be overestimating the liveness quite a bit. https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] e59b116 - fixup! fix formatting.
Author: Florian Hahn Date: 2023-11-27T19:15:33Z New Revision: e59b116dbc7bf501016c7c68f348c39df8f598a8 URL: https://github.com/llvm/llvm-project/commit/e59b116dbc7bf501016c7c68f348c39df8f598a8 DIFF: https://github.com/llvm/llvm-project/commit/e59b116dbc7bf501016c7c68f348c39df8f598a8.diff LOG: fixup! fix formatting. Added: Modified: llvm/lib/CodeGen/LivePhysRegs.cpp Removed: diff --git a/llvm/lib/CodeGen/LivePhysRegs.cpp b/llvm/lib/CodeGen/LivePhysRegs.cpp index 634f46d9d98edc6..20b517b1e1a5c11 100644 --- a/llvm/lib/CodeGen/LivePhysRegs.cpp +++ b/llvm/lib/CodeGen/LivePhysRegs.cpp @@ -222,7 +222,7 @@ void LivePhysRegs::addLiveOutsNoPristines(const MachineBasicBlock &MBB) { const MachineFrameInfo &MFI = MF.getFrameInfo(); if (MFI.isCalleeSavedInfoValid()) { for (const CalleeSavedInfo &Info : MFI.getCalleeSavedInfo()) - addReg(Info.getReg()); +addReg(Info.getReg()); } } } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
fhahn wrote: > I still feel like I am missing something here, and it's been a while since I > looked at this. But my impression is that LLVM modeling is "cheating" a bit > in that technically all the callee-saves should be implicit-uses of the > return instruction (and not live afterwards) but we don't model it that way > and instead make them appear as live-outs of the return block? So doesn't > seem like overestimating the liveness because of our current modeling? Yep, the current modeling in general may overestimate liveness. By overestimating I meant that with this patch we overestimate in more places, but that's by design. > > If I'm reading your patch correctly it would mean we would start adding all > pristine registers for the return block[1]. I am just confused so far because > this is happening in a function called `addLiveOutsNoPristines`... > I am not sure what exactly the definition of pristines is (and not super familiar with this code in general), and maybe the function name needs to be changed. The main thing to note is that it won't add all pristines; `addPristines` adds all callee-saved registers (via TRI) and removes the ones which are in the machine function's CalleeSavedInfo. The patch adds pristines, but *only* those that have been added to CalleeSavedInfo. > [1] Pristine == "callee saved but happens to be unused and hence not > saved/restored" which is what happens when you remove that > `Info.isRestored()` check? > Which code/pass is using LivePhysRegs that is causing you trouble here? The issue is in `BranchFolding`, where after splitting, the liveness of `LR` isn't preserved in the new or merged blocks. It could be fixed locally there by iterating over the registers in CalleeSavedInfo and checking if they were live-in in the original block (see below), but I am worried that fixing this locally leaves us open to similar issues in other parts of the codebase. 
https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
fhahn wrote: > I haven't looked closely at the patch, but I share @MatzeB's concerns here. > > Essentially this patch is reverting https://reviews.llvm.org/D36160, which > was fixing a modeling issue with LR on ARM to begin with! Thanks for sharing the additional context and where `IsRestored` is actually set. Taking a look at the original patch, it seems like it doesn't properly account for the fact that there could be multiple return blocks which may not be using `POP` to restore LR to PC. This could be due to shrink-wrapping + tail-calls in some exit blocks (like in `outlined-fn-may-clobber-lr-in-caller.ll`) or possibly some return instructions not using POP. IIUC D36160 tries to track LR liveness more accurately and doesn't fix a miscompile, but potentially introduced a miscompile due to underestimating the liveness of LR. I don't think the current interface allows us to properly check if all exit blocks are covered by POP insts in `restoreCalleeSavedRegisters`, as it works on single return blocks only. Without changing the API, we could check if LR is marked as not restored, and if it is, check if there are multiple return blocks, as sketched in https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973. This could be further refined to check if POP could be used for all returns (not sure if it is worth it given the small impact on the tests), or the API could be changed to pass all return blocks to avoid re-scanning for returns on each call (not sure if we should extend the general API even more for this workaround though). WDYT https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. (PR #73553)
fhahn wrote: @kparzysz please take a look at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973, which has much more limited impact. > I haven't looked at the updated testcases in detail, but I see that most of > the changes are in treating LR as live (whereas it was dead before). At the > first glance that doesn't look right. We might now overestimate liveness, which only results in missed performance, correct? Although I think this is mostly theoretical at this point, as there are no test cases that show that. The issue is that underestimating as we currently do leads to incorrect results, in particular with tail calls that use LR implicitly. If LR isn't marked as live in that case, other passes are free to clobber LR (e.g. the machine-outliner by introducing calls using BL, as in https://github.com/llvm/llvm-project/blob/20f634f275b431ff256ba45cbcbb6dc5bd945fb3/llvm/test/CodeGen/Thumb2/outlined-fn-may-clobber-lr-in-caller.ll). https://github.com/llvm/llvm-project/pull/73553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/74761 This patch starts initial modeling of runtime VF * UF in VPlan. Initially, introduce a dedicated RuntimeVFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the runtime VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop). This allows explicitly parameterizing recipes that rely on the runtime VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
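The quantity being modeled here is simple arithmetic: the number of scalar iterations consumed per vector-loop iteration is VF * UF, where for scalable vectors VF is itself vscale * the known minimum VF. The following standalone C++ sketch (illustrative, not the VPlan API) shows the value that, per the patch, gets computed once in `VPlan::prepareToExecute` instead of re-querying vscale at every canonical-IV increment:

```cpp
#include <cstdint>

// Scalar iterations per vector-loop iteration. For fixed-width vectors
// this is a compile-time constant; for scalable vectors it depends on
// the runtime vscale, which is why the patch materializes it once as a
// dedicated VFxUF VPValue.
uint64_t runtimeVFxUF(uint64_t KnownMinVF, unsigned UF, bool Scalable,
                      uint64_t VScale) {
  uint64_t VF = Scalable ? VScale * KnownMinVF : KnownMinVF;
  return VF * static_cast<uint64_t>(UF);
}
```

For example, a fixed VF of 4 unrolled by 2 steps the induction by 8, while a scalable `vscale x 4` with vscale == 2 and UF == 2 steps it by 16.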
[llvm-branch-commits] [lld] [polly] [openmp] [llvm] [flang] [clang] [mlir] [compiler-rt] [lldb] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/74761
[llvm-branch-commits] [llvm] [clang] [lldb] [polly] [compiler-rt] [openmp] [mlir] [lld] [flang] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/74761
[llvm-branch-commits] [llvm] [clang] [lldb] [polly] [compiler-rt] [openmp] [mlir] [lld] [flang] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/74761 >From 6ec44342b09474536d98de55238ee59452c06518 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Fri, 8 Dec 2023 11:00:17 + Subject: [PATCH] for -> of Created using spr 1.3.4 --- llvm/lib/Transforms/Vectorize/VPlan.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 30e4bf2a226d8..94cb768898136 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2646,7 +2646,7 @@ class VPlan { /// The vector trip count. VPValue &getVectorTripCount() { return VectorTripCount; } - /// Returns VF * UF for the vector loop region. + /// Returns VF * UF of the vector loop region. VPValue &getVFxUF() { return VFxUF; } /// Mark the plan to indicate that using Value2VPValue is not safe any
[llvm-branch-commits] [lld] [polly] [openmp] [llvm] [flang] [clang] [mlir] [compiler-rt] [lldb] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
@@ -2624,6 +2644,9 @@ class VPlan { /// The vector trip count. VPValue &getVectorTripCount() { return VectorTripCount; } + /// Returns runtime VF * UF for the vector loop region. fhahn wrote: Done, thanks! https://github.com/llvm/llvm-project/pull/74761
[llvm-branch-commits] [llvm] [clang] [lldb] [polly] [compiler-rt] [openmp] [mlir] [lld] [flang] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
@@ -1168,13 +1166,26 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue { return false; case VPInstruction::ActiveLaneMask: case VPInstruction::CalculateTripCountMinusVF: -case VPInstruction::CanonicalIVIncrement: case VPInstruction::CanonicalIVIncrementForPart: case VPInstruction::BranchOnCount: return true; }; llvm_unreachable("switch should return"); } + fhahn wrote: Added, thanks! https://github.com/llvm/llvm-project/pull/74761
[llvm-branch-commits] [llvm] [clang] [lldb] [polly] [compiler-rt] [openmp] [mlir] [lld] [flang] [VPlan] Initial modeling of runtime VF * UF as VPValue. (PR #74761)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/74761