[PATCH] D105516: [clang][PassManager] Add -falways-mem2reg to run mem2reg at -O0

Jessica Clarke via Phabricator via cfe-commits Tue, 06 Jul 2021 15:34:54 -0700

jrtc27 created this revision.
jrtc27 added reviewers: chandlerc, rjmccall, rsmith.
Herald added subscribers: ormris, dexonsmith, dang, s.egerton, simoncook, 
hiraditya, kristof.beyls.
jrtc27 requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.


Standard -O0 IR and assembly can be hard to follow: without mem2reg,
loads and stores to the stack are everywhere, cluttering the output and
making it hard to see where the interesting instructions are (such as
when trying to debug a crash in unoptimised code from the disassembly).
It is therefore useful to be able to force mem2reg to run even at -O0 to
clean up a lot of those stack loads and stores. There are also Clang
CodeGen tests in the tree that explicitly run mem2reg on the output to
make the CHECK lines more readable, which requires manually passing
-disable-O0-optnone and piping to opt; a driver flag that supports this
makes those tests less clunky too.
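
For reference, the manual pattern those tests use today looks roughly
like the sketch below. The triple, the function and the CHECK lines are
illustrative only and are not copied from any particular in-tree test;
the `-mem2reg` spelling is the legacy opt syntax, and a test using the
new pass manager syntax would write `-passes=mem2reg` instead.

```c
// Illustrative sketch only: -O0 output is promoted by hand through opt.
// -disable-O0-optnone is needed so the functions are not marked optnone,
// which would otherwise make mem2reg skip them.
// RUN: %clang_cc1 -triple riscv64 -O0 -disable-O0-optnone -emit-llvm -o - %s \
// RUN:   | opt -S -mem2reg | FileCheck %s

// CHECK-LABEL: @add(
// CHECK:         [[ADD:%.*]] = add nsw i32 [[A:%.*]], [[B:%.*]]
// CHECK-NEXT:    ret i32 [[ADD]]
int add(int a, int b) {
  return a + b;
}
```

With the flag proposed here, the two RUN lines would presumably collapse
to a single %clang_cc1 invocation with -falways-mem2reg and no opt step.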

Whilst optimising for speed is not the primary purpose of this patch, it
does give an easy and significant reduction in code size, as you might
expect: a ~12% decrease on macOS/arm64 when compiling Clang itself with
the option enabled, which likely also significantly improves the running
time of the test suite over a plain Debug build. On GNU/Linux/amd64 the
decrease is less pronounced, at about 4%, likely because many amd64
instructions can take one memory operand directly and so do not pay the
additional cost of a separate load or store as they would on a
load-store architecture.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D105516

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGen/falways-mem2reg.c
  llvm/include/llvm/Passes/PassBuilder.h
  llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

Index: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
===================================================================
--- llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -187,6 +187,7 @@
     LibraryInfo = nullptr;
     Inliner = nullptr;
     DisableUnrollLoops = false;
+    AlwaysMem2Reg = false;
     SLPVectorize = false;
     LoopVectorize = true;
     LoopsInterleaved = true;
@@ -658,8 +659,11 @@
   MPM.add(createForceFunctionAttrsLegacyPass());
 
   // If all optimizations are disabled, just run the always-inline pass and,
-  // if enabled, the function merging pass.
+  // if enabled, the mem2reg and function merging passes.
   if (OptLevel == 0) {
+    if (AlwaysMem2Reg)
+      MPM.add(createPromoteMemoryToRegisterPass());
+
     addPGOInstrPasses(MPM);
     if (Inliner) {
       MPM.add(Inliner);
Index: llvm/lib/Passes/PassBuilder.cpp
===================================================================
--- llvm/lib/Passes/PassBuilder.cpp
+++ llvm/lib/Passes/PassBuilder.cpp
@@ -283,6 +283,7 @@
   SLPVectorization = false;
   LoopUnrolling = true;
   ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
+  AlwaysMem2Reg = false;
   Coroutines = false;
   LicmMssaOptCap = SetLicmMssaOptCap;
   LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
@@ -1931,6 +1932,9 @@
   MPM.addPass(AlwaysInlinerPass(
       /*InsertLifetimeIntrinsics=*/PTO.Coroutines));
 
+  if (PTO.AlwaysMem2Reg)
+    MPM.addPass(createModuleToFunctionPassAdaptor(PromotePass()));
+
   if (PTO.MergeFunctions)
     MPM.addPass(MergeFunctionsPass());
 
Index: llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
===================================================================
--- llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
+++ llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
@@ -156,6 +156,7 @@
 
   bool DisableTailCalls;
   bool DisableUnrollLoops;
+  bool AlwaysMem2Reg;
   bool CallGraphProfile;
   bool SLPVectorize;
   bool LoopVectorize;
Index: llvm/include/llvm/Passes/PassBuilder.h
===================================================================
--- llvm/include/llvm/Passes/PassBuilder.h
+++ llvm/include/llvm/Passes/PassBuilder.h
@@ -107,6 +107,10 @@
   /// is that of the flag: `-forget-scev-loop-unroll`.
   bool ForgetAllSCEVInLoopUnroll;
 
+  /// Tuning option to always run mem2reg regardless of the optimisation level.
+  /// Its default value is false.
+  bool AlwaysMem2Reg;
+
   /// Tuning option to enable/disable coroutine intrinsic lowering. Its default
   /// value is false. Frontends such as Clang may enable this conditionally. For
   /// example, Clang enables this option if the flags `-std=c++2a` or above, or
Index: clang/test/CodeGen/falways-mem2reg.c
===================================================================
--- /dev/null
+++ clang/test/CodeGen/falways-mem2reg.c
@@ -0,0 +1,33 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 %s \
+// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 %s \
+// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 -fno-always-mem2reg %s \
+// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 -fno-always-mem2reg %s \
+// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 -falways-mem2reg %s \
+// RUN:   | FileCheck --check-prefix=O0-MEM2REG %s
+// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 -falways-mem2reg %s \
+// RUN:   | FileCheck --check-prefix=O0-MEM2REG %s
+
+// O0-NO-MEM2REG-LABEL: @add(
+// O0-NO-MEM2REG-NEXT:  entry:
+// O0-NO-MEM2REG-NEXT:    [[A_ADDR:%.*]] = alloca i32, align 4
+// O0-NO-MEM2REG-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
+// O0-NO-MEM2REG-NEXT:    store i32 [[A:%.*]], i32* [[A_ADDR]], align 4
+// O0-NO-MEM2REG-NEXT:    store i32 [[B:%.*]], i32* [[B_ADDR]], align 4
+// O0-NO-MEM2REG-NEXT:    [[TMP0:%.*]] = load i32, i32* [[A_ADDR]], align 4
+// O0-NO-MEM2REG-NEXT:    [[TMP1:%.*]] = load i32, i32* [[B_ADDR]], align 4
+// O0-NO-MEM2REG-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP0]], [[TMP1]]
+// O0-NO-MEM2REG-NEXT:    ret i32 [[ADD]]
+//
+// O0-MEM2REG-LABEL: @add(
+// O0-MEM2REG-NEXT:  entry:
+// O0-MEM2REG-NEXT:    [[ADD:%.*]] = add nsw i32 [[A:%.*]], [[B:%.*]]
+// O0-MEM2REG-NEXT:    ret i32 [[ADD]]
+//
+int add(int a, int b) {
+  return a + b;
+}
Index: clang/lib/Driver/ToolChains/Clang.cpp
===================================================================
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -5835,6 +5835,8 @@
   Args.AddLastArg(CmdArgs, options::OPT_fwritable_strings);
   Args.AddLastArg(CmdArgs, options::OPT_funroll_loops,
                   options::OPT_fno_unroll_loops);
+  Args.AddLastArg(CmdArgs, options::OPT_falways_mem2reg,
+                  options::OPT_fno_always_mem2reg);
 
   Args.AddLastArg(CmdArgs, options::OPT_pthread);
 
Index: clang/lib/CodeGen/CodeGenModule.cpp
===================================================================
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -1764,6 +1764,8 @@
   // starting with the default for this optimization level.
   bool ShouldAddOptNone =
       !CodeGenOpts.DisableO0ImplyOptNone && CodeGenOpts.OptimizationLevel == 0;
+  // -falways-mem2reg implies at least a minimal amount of optimisation.
+  ShouldAddOptNone &= !CodeGenOpts.AlwaysMem2Reg;
   // We can't add optnone in the following cases, it won't pass the verifier.
   ShouldAddOptNone &= !D->hasAttr<MinSizeAttr>();
   ShouldAddOptNone &= !D->hasAttr<AlwaysInlineAttr>();
Index: clang/lib/CodeGen/BackendUtil.cpp
===================================================================
--- clang/lib/CodeGen/BackendUtil.cpp
+++ clang/lib/CodeGen/BackendUtil.cpp
@@ -682,6 +682,7 @@
   PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
 
   PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops;
+  PMBuilder.AlwaysMem2Reg = CodeGenOpts.AlwaysMem2Reg;
   // Loop interleaving in the loop vectorizer has historically been set to be
   // enabled when loop unrolling is enabled.
   PMBuilder.LoopsInterleaved = CodeGenOpts.UnrollLoops;
@@ -1256,6 +1257,7 @@
   PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
   PTO.LoopVectorization = CodeGenOpts.VectorizeLoop;
   PTO.SLPVectorization = CodeGenOpts.VectorizeSLP;
+  PTO.AlwaysMem2Reg = CodeGenOpts.AlwaysMem2Reg;
   PTO.MergeFunctions = CodeGenOpts.MergeFunctions;
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
Index: clang/include/clang/Driver/Options.td
===================================================================
--- clang/include/clang/Driver/Options.td
+++ clang/include/clang/Driver/Options.td
@@ -2628,6 +2628,11 @@
   HelpText<"Assume all loops are finite.">, Flags<[CC1Option]>;
 def fno_finite_loops: Flag<["-"], "fno-finite-loops">, Group<f_Group>,
   HelpText<"Do not assume that any loop is finite.">, Flags<[CC1Option]>;
+defm always_mem2reg : BoolFOption<"always-mem2reg",
+  CodeGenOpts<"AlwaysMem2Reg">, DefaultFalse,
+  PosFlag<SetTrue, [], "Always run mem2reg regardless of optimisation level">,
+  NegFlag<SetFalse, [], "Run mem2reg based on optimisation level">,
+  BothFlags<[CC1Option]>>;
 
 def ftrigraphs : Flag<["-"], "ftrigraphs">, Group<f_Group>,
   HelpText<"Process trigraph sequences">, Flags<[CC1Option]>;
Index: clang/include/clang/Basic/CodeGenOptions.def
===================================================================
--- clang/include/clang/Basic/CodeGenOptions.def
+++ clang/include/clang/Basic/CodeGenOptions.def
@@ -270,6 +270,7 @@
 CODEGENOPT(UnrollLoops       , 1, 0) ///< Control whether loops are unrolled.
 CODEGENOPT(RerollLoops       , 1, 0) ///< Control whether loops are rerolled.
 CODEGENOPT(NoUseJumpTables   , 1, 0) ///< Set when -fno-jump-tables is enabled.
+CODEGENOPT(AlwaysMem2Reg     , 1, 0) ///< Set when -falways-mem2reg is enabled.
 CODEGENOPT(UnwindTables      , 1, 0) ///< Emit unwind tables.
 CODEGENOPT(VectorizeLoop     , 1, 0) ///< Run loop vectorizer.
 CODEGENOPT(VectorizeSLP      , 1, 0) ///< Run SLP vectorizer.

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
