On Thu, Aug 11, 2016 at 5:19 PM, Denys Vlasenko <dvlas...@redhat.com> wrote: > > > On 08/11/2016 10:59 PM, Andrew Pinski wrote: >> >> On Thu, Aug 11, 2016 at 1:49 PM, Denys Vlasenko <dvlas...@redhat.com> >> wrote: >>> >>> falign-functions=N is too simplistic. >>> >>> Ingo Molnar ran some tests and it looks on latest x86 CPUs, 64-byte >>> alignment >>> runs fastest (he tried many other possibilites). >>> >>> However, developers are less than thrilled by the idea of a slam-dunk >>> 64-byte >>> aligning everything. Too much waste: >>> On 05/20/2015 02:47 AM, Linus Torvalds wrote: >>> > At the same time, I have to admit that I abhor a 64-byte >>> function >>> > alignment, when we have a fair number of functions that are >>> (much) >>> > smaller than that. >>> > >>> > Is there some way to get gcc to take the size of the function >>> into >>> > account? Because aligning a 16-byte or 32-byte function on a >>> 64-byte >>> > alignment is just criminally nasty and wasteful. >>> >>> This change makes it possible to align function to 64-byte boundaries >>> *IF* >>> this does not introduce huge amount of padding. >>> >>> Patch drops forced alignment to 8 if requested alignment is higher than >>> 8: >>> before the patch, -falign-functions=9 was generating >>> >>> .p2align 4,,8 >>> .p2align 3 >>> >>> which means: "align to 16 if the skip is 8 bytes or less; else align to >>> 8". >>> After this change, ".p2align 3" is not emitted. >>> >>> It is dropped because I ultimately want to do something >>> like -falign-functions=64,8 - IOW, I want to align functions to 64 bytes, >>> but only if that generates padding of less than 8 bytes - otherwise I >>> want >>> *no alignment at all*. The forced ".p2align 3" interferes with that >>> intention. >>> >>> Testing: >>> tested that with -falign-functions=N (tried 8, 15, 16, 17...) the >>> alignment >>> directives are the same before and after the patch. >>> Tested that -falign-functions=N,N (two equal paramenters) works exactly >>> like -falign-functions=N. >> >> >> Two things I noticed you missed: >> 1) ChangeLog entries (this is required as per the GNU coding style) > > > Is this form okay? -
This is getting there but needs a little improvement. > > ... > Testing: > tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment > directives are the same before and after the patch. > Tested that -falign-functions=N,N (two equal paramenters) works exactly > like -falign-functions=N. Oh also maybe adding testcases would be useful, just scanning the outputted assembly should be good enough. > > 2016-08-11 Denys Vlasenko <dvlas...@redhat.com> > > * common.opt: Change definitions so that -falign_FOO= accepts a string. I think this should be: * common.opt (-falign-functions): Accept a string instead of an integer. (falign-loops): Likewise. .... > * config/i386/freebsd.h: Remove "If N is large, do at least 8 byte > alignment" code. This should be something more like: * config/i386/freebsd.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Remove "If N is large, do at least 8 byte alignment" code. > * config/i386/gnu-user.h: Likewise. * config/i386/gnu-user.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. > * config/i386/iamcu.h: Likewise. > * config/i386/openbsdelf.h: Likewise. > * config/i386/x86-64.h: Likewise. > * flags.h (struct target_flag_state): Add x_align_functions_max_skip > member. > * toplev.c (parse_N_M): New function. > (init_alignments): Set align_FOO_log, align_FOO, align_FOO_max_skip > from specified --align_FOO=N{,M} option > * varasm.c (assemble_start_function): Use align_functions_max_skip > instead of align_functions - 1. > > Index: gcc/common.opt > =================================================================== > .... > > >> 2) Documentation changes. You did not add this or you doing this in a separate patch now? The current documentation is done in doc/invoke.texi: @item -falign-functions @itemx -falign-functions=@var{n} @opindex falign-functions Align the start of functions to the next power-of-two greater than @var{n}, skipping up to @var{n} bytes. For instance, @option{-falign-functions=32} aligns functions to the next 32-byte boundary, but @option{-falign-functions=24} aligns to the next 32-byte boundary only if this can be done by skipping 23 bytes or less. @option{-fno-align-functions} and @option{-falign-functions=1} are equivalent and mean that functions are not aligned. Some assemblers only support this flag when @var{n} is a power of two; in that case, it is rounded up. If @var{n} is not specified or is zero, use a machine-dependent default. ---- CUT ---- You should add your documentation there (and the rest of the -falign-* options. >> >>> >>> Index: gcc/common.opt >>> =================================================================== > > ... > >>> #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP) \ >>> do { \ >>> if ((LOG) != 0) { \ >>> - if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG)); \ >>> - else { \ >>> + if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))) \ >>> + fprintf ((FILE), "\t.p2align %d\n", (LOG)); \ >>> + else \ >>> fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \ >>> - /* Make sure that we have at least 8 byte alignment if > 8 byte \ >>> - alignment is preferred. */ \ >>> - if ((LOG) > 3 \ >>> - && (1 << (LOG)) > ((MAX_SKIP) + 1) \ >>> - && (MAX_SKIP) >= 7) \ >>> - fputs ("\t.p2align 3\n", (FILE)); \ >>> - } >>> \ >>> } \ >>> } while (0) >>> #undef ASM_OUTPUT_MAX_SKIP_PAD >> >> >> These looks like a bug fix, it most likely should be sent separately; >> maybe even first. > > > ok > >> Also can you make sure the generic ASM_OUTPUT_MAX_SKIP_ALIGN is done >> correctly? > > > If by generic you mean "non-arch-specific", the situation is as follows: > if arch does not define ASM_OUTPUT_MAX_SKIP_ALIGN(file, align, max_skip), > then > ASM_OUTPUT_ALIGN_WITH_NOP(file, align) or > ASM_OUTPUT_ALIGN (file, align) macros are used. > As you se from their arguments, those macros don't take "max_skip" argument. > IOW: even in existing code, they ignore max_skip, and always pad to the > specified alignment. > Whether this is considered "correct" or not, is up to you. The patch > changes nothing for those arches. > > I checked all arches which do have ASM_OUTPUT_MAX_SKIP_ALIGN. > > visium has the same "align at least to 8" logic like many x86 flavors, > do you think it should be removed? > > rs6000, aarch64, arm use ".p2align %d,,%d" logic with no special cases, > look good to me. > > rx uses ".balign %d,3,%d" and special-cases TARGET_AS100_SYNTAX, > this arch is probably fine too.