[PATCH v2] Extend -falign-FOO=N to N[,M]: the second number is max padding

Denys Vlasenko Fri, 12 Aug 2016 09:29:12 -0700

-falign-functions=N is too simplistic.

Ingo Molnar ran some tests, and it seems that on the latest x86 CPUs, 64-byte
alignment runs fastest (he tried many other possibilities).


However, developers are less than thrilled by the idea of simply aligning
everything to 64 bytes. Too much waste:
        On 05/20/2015 02:47 AM, Linus Torvalds wrote:
        > At the same time, I have to admit that I abhor a 64-byte function
        > alignment, when we have a fair number of functions that are (much)
        > smaller than that.
        >
        > Is there some way to get gcc to take the size of the function into
        > account? Because aligning a 16-byte or 32-byte function on a 64-byte
        > alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.
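
To illustrate the "only if the padding is small" rule, here is a minimal C
sketch of how a directive such as ".p2align 6,,7" behaves, assuming the
assembler follows the usual GNU as semantics (align to the boundary, or emit
no padding at all if more than max_skip bytes would be needed). The helper
name padding_for is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.
   Models ".p2align log2_align,,max_skip" at address ADDR: return the number
   of padding bytes needed to reach the next 2^log2_align boundary, or 0 if
   that would require more than MAX_SKIP bytes (alignment is then skipped). */
static unsigned
padding_for (uintptr_t addr, unsigned log2_align, unsigned max_skip)
{
  assert (log2_align < 8 * sizeof (uintptr_t));

  uintptr_t align = (uintptr_t) 1 << log2_align;
  /* Distance to the next boundary; 0 when ADDR is already aligned.  */
  unsigned needed = (unsigned) ((align - (addr & (align - 1))) & (align - 1));

  return needed <= max_skip ? needed : 0;
}
```

With -falign-functions=64,8 (log2_align 6, max_skip 7), a function that would
start 7 bytes short of a 64-byte boundary gets padded, while one that would
start 16 bytes short is left where it is.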

Testing:
Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.
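
For reference, the mapping from N,M to the emitted directive can be modelled
as below. This is a simplified standalone sketch that mirrors the logic of
parse_N_M in this patch without the string parsing; struct align_directive,
floor_log2_u and directive_for are hypothetical names used only for this
illustration.

```c
#include <assert.h>

/* Hypothetical result type: the two operands of ".p2align LOG,,MAX_SKIP".  */
struct align_directive
{
  int log2_align;
  int max_skip;
};

/* Floor of log2 (X) for X > 0; returns -1 for X == 0.  */
static int
floor_log2_u (unsigned x)
{
  int l = -1;
  while (x)
    {
      x >>= 1;
      l++;
    }
  return l;
}

/* Model of -falign-FOO=N,M.  Pass M == N to model the single-number form.  */
static struct align_directive
directive_for (unsigned n, unsigned m)
{
  struct align_directive d;
  unsigned max_skip = m > 0 ? m - 1 : 0;  /* M means M-1 bytes of padding.  */

  assert (n <= (1u << 30));               /* Keep n * 2 - 1 from overflowing.  */
  if (n == 0)
    n = 1;
  if (max_skip > n)
    max_skip = n - 1;

  d.log2_align = floor_log2_u (n * 2 - 1); /* Round N up to a power of two.  */
  d.max_skip = (int) max_skip;
  return d;
}
```

Under this model, 64,8 yields ".p2align 6,,7" (the directive the new testcase
scans for), and 64,64 yields the same operands as plain 64.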

2016-08-12  Denys Vlasenko  <dvlas...@redhat.com>

    * common.opt (-falign-functions): Accept a string instead of an integer.
    (-falign-jumps): Likewise.
    (-falign-labels): Likewise.
    (-falign-loops): Likewise.
    * flags.h (struct target_flag_state): Add x_align_functions_max_skip
    member.
    * toplev.c (parse_N_M): New function.
    (init_alignments): Set align_FOO_log, align_FOO, align_FOO_max_skip
    from the specified -falign-FOO=N{,M} option.
    * varasm.c (assemble_start_function): Use align_functions_max_skip
    instead of align_functions - 1.
    * doc/invoke.texi: Update option documentation.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt      (revision 239390)
+++ gcc/common.opt      (working copy)
@@ -900,7 +900,7 @@ Common Report Var(align_functions,0) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(flag_align_functions)
 
 falign-jumps
 Common Report Var(align_jumps,0) Optimization UInteger
@@ -907,7 +907,7 @@ Common Report Var(align_jumps,0) Optimization UInt
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(flag_align_jumps)
 
 falign-labels
 Common Report Var(align_labels,0) Optimization UInteger
@@ -914,7 +914,7 @@ Common Report Var(align_labels,0) Optimization UIn
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(flag_align_labels)
 
 falign-loops
 Common Report Var(align_loops,0) Optimization UInteger
@@ -921,7 +921,7 @@ Common Report Var(align_loops,0) Optimization UInt
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(flag_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 239390)
+++ gcc/doc/invoke.texi (working copy)
@@ -337,9 +337,10 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}] -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}]] @gol
+-falign-jumps[=@var{n}[,@var{m}]] @gol
+-falign-labels[=@var{n}[,@var{m}]] -falign-loops[=@var{n}[,@var{m}]] @gol
 -fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec -fbranch-probabilities @gol
 -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol
@@ -7931,9 +7932,11 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
 @opindex falign-functions
+If @var{m} is not specified, it defaults to @var{n}.
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
+@var{n}, skipping up to @var{m}-1 bytes.  For instance,
 @option{-falign-functions=32} aligns functions to the next 32-byte
 boundary, but @option{-falign-functions=24} aligns to the next
 32-byte boundary only if this can be done by skipping 23 bytes or less.
@@ -7950,9 +7953,11 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
 @opindex falign-labels
+If @var{m} is not specified, it defaults to @var{n}.
 Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
+@var{m}-1 bytes like @option{-falign-functions}.  This option can easily
 make code slower, because it must insert dummy operations for when the
 branch target is reached in the usual flow of the code.
 
@@ -7969,8 +7974,10 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
+If @var{m} is not specified, it defaults to @var{n}.
+Align loops to a power-of-two boundary, skipping up to @var{m}-1 bytes
 like @option{-falign-functions}.  If the loops are
 executed many times, this makes up for any execution of the dummy
 operations.
@@ -7984,9 +7991,11 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
 @opindex falign-jumps
+If @var{m} is not specified, it defaults to @var{n}.
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
+where the targets can only be reached by jumping, skipping up to @var{m}-1
 bytes like @option{-falign-functions}.  In this case, no dummy operations
 need be executed.
 
Index: gcc/flags.h
===================================================================
--- gcc/flags.h (revision 239390)
+++ gcc/flags.h (working copy)
@@ -55,6 +55,7 @@ struct target_flag_state {
   int x_align_labels_log;
   int x_align_labels_max_skip;
   int x_align_functions_log;
+  int x_align_functions_max_skip;
 
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
@@ -81,6 +82,8 @@ extern struct target_flag_state *this_target_flag_
   (this_target_flag_state->x_align_labels_max_skip)
 #define align_functions_log \
   (this_target_flag_state->x_align_functions_log)
+#define align_functions_max_skip \
+  (this_target_flag_state->x_align_functions_max_skip)
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c    (nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c    (working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c        (revision 239390)
+++ gcc/toplev.c        (working copy)
@@ -1177,29 +1177,58 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
+static int
+parse_N_M (int *align, int *maxskip, const char *flag, const char *name)
+{
+  int _align = *align;
+  int _maxskip = *maxskip;
+
+  if (flag)
+    {
+      unsigned int n, m;
+      if (strchr (flag, ','))
+       {
+         if (sscanf (flag, "%u,%u", &n, &m) != 2) goto bad;
+         _maxskip = m;
+       }
+      else
+       {
+          if (sscanf (flag, "%u", &n) != 1) goto bad;
+         _maxskip = n;
+       }
+      _align = n;
+      if (_maxskip > 0)
+       _maxskip--; /* -falign-xyz=N,M means M-1 max bytes of padding, not M */
+    }
+
+normalize:
+  if (_align <= 0)
+    _align = 1;
+  if ((unsigned)_maxskip > (unsigned)_align)
+    _maxskip = _align - 1;
+
+  *align = _align;
+  *maxskip = _maxskip;
+  return floor_log2 (_align * 2 - 1);
+
+bad:
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter '%s' is bad", name, flag);
+  goto normalize;
+}
+
 /* Default the align_* variables to 1 if they're still unset, and
    set up the align_*_log variables.  */
 static void
 init_alignments (void)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  align_loops_log = parse_N_M (&align_loops, &align_loops_max_skip,
+                              flag_align_loops, "loops");
+  align_jumps_log = parse_N_M (&align_jumps, &align_jumps_max_skip,
+                              flag_align_jumps, "jumps");
+  align_labels_log = parse_N_M (&align_labels, &align_labels_max_skip,
+                               flag_align_labels, "labels");
+  align_functions_log = parse_N_M (&align_functions, &align_functions_max_skip,
+                                  flag_align_functions, "functions");
 }
 
 /* Process the options that have been parsed.  */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c        (revision 239390)
+++ gcc/varasm.c        (working copy)
@@ -1790,7 +1790,7 @@ assemble_start_function (tree decl, const char *fn
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
-                                align_functions_log, align_functions - 1);
+                                align_functions_log, align_functions_max_skip);
 #else
       ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
 #endif
