-falign-functions=N is too simplistic. Ingo Molnar ran some tests and it seems that on the latest x86 CPUs, 64-byte alignment runs fastest (he tried many other possibilities).
However, developers are less than thrilled by the idea of blindly aligning
everything to 64 bytes. Too much waste:

On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
>
> Is there some way to get gcc to take the size of the function into
> account? Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Testing: tested that with -falign-functions=N (tried 8, 15, 16, 17...)
the alignment directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

2016-08-12  Denys Vlasenko  <dvlas...@redhat.com>

    * common.opt (-falign-functions): Accept a string instead of an integer.
    (-falign-jumps): Likewise.
    (-falign-labels): Likewise.
    (-falign-loops): Likewise.
    * flags.h (struct target_flag_state): Add x_align_functions_max_skip member.
    * toplev.c (parse_N_M): New function.
    (init_alignments): Set align_FOO_log, align_FOO, align_FOO_max_skip
    from specified -falign-FOO=N{,M} option.
    * varasm.c (assemble_start_function): Use align_functions_max_skip
    instead of align_functions - 1.
    * doc/invoke.texi: Update option documentation.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 239390)
+++ gcc/common.opt (working copy)
@@ -900,7 +900,7 @@ Common Report Var(align_functions,0) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(flag_align_functions)
 
 falign-jumps
 Common Report Var(align_jumps,0) Optimization UInteger
@@ -907,7 +907,7 @@ Common Report Var(align_jumps,0) Optimization UInt
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(flag_align_jumps)
 
 falign-labels
 Common Report Var(align_labels,0) Optimization UInteger
@@ -914,7 +914,7 @@ Common Report Var(align_labels,0) Optimization UIn
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(flag_align_labels)
 
 falign-loops
 Common Report Var(align_loops,0) Optimization UInteger
@@ -921,7 +921,7 @@ Common Report Var(align_loops,0) Optimization UInt
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(flag_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 239390)
+++ gcc/doc/invoke.texi (working copy)
@@ -337,9 +337,10 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}] -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}]] @gol
+-falign-jumps[=@var{n}[,@var{m}]] @gol
+-falign-labels[=@var{n}[,@var{m}]] -falign-loops[=@var{n}[,@var{m}]] @gol
 -fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec -fbranch-probabilities @gol
 -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol
@@ -7931,9 +7932,11 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
 @opindex falign-functions
+If @var{m} is not specified, it defaults to @var{n}.
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes. For instance,
+@var{n}, skipping up to @var{m}-1 bytes. For instance,
 @option{-falign-functions=32} aligns functions to the next 32-byte
 boundary, but @option{-falign-functions=24} aligns to the next
 32-byte boundary only if this can be done by skipping 23 bytes or less.
@@ -7950,9 +7953,11 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
 @opindex falign-labels
+If @var{m} is not specified, it defaults to @var{n}.
 Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}. This option can easily
+@var{m}-1 bytes like @option{-falign-functions}. This option can easily
 make code slower, because it must insert dummy operations for when the
 branch target is reached in the usual flow of the code.
 
@@ -7969,8 +7974,10 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
+If @var{m} is not specified, it defaults to @var{n}.
+Align loops to a power-of-two boundary, skipping up to @var{m}-1 bytes
 like @option{-falign-functions}. If the loops are
 executed many times, this makes up for any execution of the dummy
 operations.
@@ -7984,9 +7991,11 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
 @opindex falign-jumps
+If @var{m} is not specified, it defaults to @var{n}.
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
+where the targets can only be reached by jumping, skipping up to @var{m}-1
 bytes like @option{-falign-functions}. In this case, no dummy operations
 need be executed.
 
Index: gcc/flags.h
===================================================================
--- gcc/flags.h (revision 239390)
+++ gcc/flags.h (working copy)
@@ -55,6 +55,7 @@ struct target_flag_state {
   int x_align_labels_log;
   int x_align_labels_max_skip;
   int x_align_functions_log;
+  int x_align_functions_max_skip;
 
   /* The excess precision currently in effect. */
   enum excess_precision x_flag_excess_precision;
@@ -81,6 +82,8 @@ extern struct target_flag_state *this_target_flag_
   (this_target_flag_state->x_align_labels_max_skip)
 #define align_functions_log \
   (this_target_flag_state->x_align_functions_log)
+#define align_functions_max_skip \
+  (this_target_flag_state->x_align_functions_max_skip)
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c (working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c (revision 239390)
+++ gcc/toplev.c (working copy)
@@ -1177,29 +1177,58 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
+static int
+parse_N_M (int *align, int *maxskip, const char *flag, const char *name)
+{
+  int _align = *align;
+  int _maxskip = *maxskip;
+
+  if (flag)
+    {
+      unsigned int n, m;
+      if (strchr (flag, ','))
+        {
+          if (sscanf (flag, "%u,%u", &n, &m) != 2) goto bad;
+          _maxskip = m;
+        }
+      else
+        {
+          if (sscanf (flag, "%u", &n) != 1) goto bad;
+          _maxskip = n;
+        }
+      _align = n;
+      if (_maxskip > 0)
+        _maxskip--; /* -falign-xyz=N,M means M-1 max bytes of padding, not M */
+    }
+
+normalize:
+  if (_align <= 0)
+    _align = 1;
+  if ((unsigned)_maxskip > (unsigned)_align)
+    _maxskip = _align - 1;
+
+  *align = _align;
+  *maxskip = _maxskip;
+  return floor_log2 (_align * 2 - 1);
+
+bad:
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter '%s' is bad", name, flag);
+  goto normalize;
+}
+
 /* Default the align_* variables to 1 if they're still unset, and
    set up the align_*_log variables. */
 static void
 init_alignments (void)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  align_loops_log = parse_N_M (&align_loops, &align_loops_max_skip,
+                               flag_align_loops, "loops");
+  align_jumps_log = parse_N_M (&align_jumps, &align_jumps_max_skip,
+                               flag_align_jumps, "jumps");
+  align_labels_log = parse_N_M (&align_labels, &align_labels_max_skip,
+                                flag_align_labels, "labels");
+  align_functions_log = parse_N_M (&align_functions, &align_functions_max_skip,
+                                   flag_align_functions, "functions");
 }
 
 /* Process the options that have been parsed. */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c (revision 239390)
+++ gcc/varasm.c (working copy)
@@ -1790,7 +1790,7 @@ assemble_start_function (tree decl, const char *fn
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
-                                 align_functions_log, align_functions - 1);
+                                 align_functions_log, align_functions_max_skip);
 #else
       ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
 #endif
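
For anyone who wants to play with the N,M arithmetic outside of GCC, here is a
minimal standalone sketch. It is not part of the patch. It only mirrors what
parse_N_M computes, assuming the semantics shown in the diff above (M defaults
to N, M means at most M-1 bytes of padding, N is rounded up to a power of two).
The names sketch_floor_log2, sketch_parse_align and struct p2align_args are
hypothetical and exist only for this illustration; sketch_floor_log2 is a
simple stand-in for GCC's floor_log2, and errors are reported by return value
rather than through error_at.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for GCC's floor_log2: index of the highest set
   bit of X, or -1 if X is zero.  */
static int
sketch_floor_log2 (unsigned int x)
{
  int log = -1;
  while (x)
    {
      x >>= 1;
      log++;
    }
  return log;
}

/* The two operands that would end up in ".p2align LOG,,MAX_SKIP".  */
struct p2align_args
{
  int log;
  int max_skip;
};

/* Parse "N" or "N,M" the way the patch's parse_N_M does.
   Returns 0 on success, -1 if ARG cannot be parsed.  */
static int
sketch_parse_align (const char *arg, struct p2align_args *out)
{
  unsigned int n, m;

  assert (arg != NULL && out != NULL);

  if (strchr (arg, ','))
    {
      if (sscanf (arg, "%u,%u", &n, &m) != 2)
        return -1;
    }
  else
    {
      if (sscanf (arg, "%u", &n) != 1)
        return -1;
      m = n; /* M defaults to N.  */
    }

  if (m > 0)
    m--; /* N,M means at most M-1 bytes of padding, not M.  */
  if (n == 0)
    n = 1;
  if (m > n)
    m = n - 1; /* Never skip more than a full alignment unit.  */

  /* Round N up to the next power of two and take its log2.  */
  out->log = sketch_floor_log2 (n * 2 - 1);
  out->max_skip = (int) m;
  return 0;
}
```

With this sketch, sketch_parse_align ("64,8", &a) gives log 6 and max_skip 7,
which corresponds to the ".p2align 6,,7" directive the new testcase scans for,
and "24" gives log 5 and max_skip 23, matching the -falign-functions=24
example in invoke.texi.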