On Tue, 31 Mar 2015, Richard Earnshaw wrote: > On 31/03/15 11:20, Richard Biener wrote: > > On Tue, 31 Mar 2015, Richard Biener wrote: > > > >> On Tue, 31 Mar 2015, Richard Earnshaw wrote: > >> > >>> On 31/03/15 08:50, Richard Biener wrote: > >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguent...@suse.de> > >>>> wrote: > >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence > >>>>> <alan.lawre...@arm.com> wrote: > >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes > >>>>>> it. > >>>>>> > >>>>>> The problem appears to be in laying out arguments, specifically > >>>>>> varargs. From > >>>>>> the "good" -fdump-rtl-expand: > >>>>>> > >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 > >>>>>> S4 A32]) > >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) > >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) > >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>>>>> (reg:SI 118)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (call_insn 22 21 23 2 (parallel [ > >>>>>> (set (reg:SI 0 r0) > >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 > >>>>>> A32]) > >>>>>> > >>>>>> The struct members are > >>>>>> reg:SI 113 => int a; > >>>>>> reg:DF 112 => double b; > >>>>>> reg:SI 111 => int c; > >>>>>> > >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is > >>>>>> pushed > >>>>>> into virtual-outgoing-args. In contrast, post-change to > >>>>>> build_ref_of_offset, we get: > >>>>>> > >>>>>> (insn 17 16 18 2 (set (reg:SI 118) > >>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 > >>>>>> *.LC1>)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 > >>>>>> virtual-outgoing-args) > >>>>>> (const_int 8 [0x8])) [0 S4 A64]) > >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 > >>>>>> S8 A64]) > >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) > >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>>>>> (reg:SI 118)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (call_insn 22 21 23 2 (parallel [ > >>>>>> (set (reg:SI 0 r0) > >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 > >>>>>> A32]) > >>>>>> > >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >>>>>> > >>>>>> double+following int are all pushed into virtual-outgoing-args. This is > >>>>>> because > >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second > >>>>>> argument (the > >>>>>> type constructed by build_ref_for_offset); it then executes > >>>>>> (aapcs_layout_arg, > >>>>>> arm.c line ~~5914) > >>>>>> > >>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the > >>>>>> next even number. */ > >>>>>> ncrn = pcum->aapcs_ncrn; > >>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > >>>>>> ncrn++; > >>>>>> > >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the > >>>>>> testcase > >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a > >>>>>> 32-bit-aligned int instead, which works as previously. > >>>>>> > >>>>>> Passing the same members of that struct in a non-vargs call, works ok - > >>>>>> I think > >>>>>> because these use the type of the declared parameters, rather than the > >>>>>> provided > >>>>>> arguments, and the former do not have the increased alignment from > >>>>>> build_ref_for_offset. > >>>>> > >>>>> It doesn't make sense to use the alignment of passed values. That > >>>>> looks like bs. > >>>>> > >>>>> This means that > >>>>> > >>>>> Int I __aligned__(8); > >>>>> > >>>>> Is passed differently than int. > >>>>> > >>>>> Arm_function_arg needs to be fixed. > >>>> > >>>> That is, > >>>> > >>>> typedef int myint __attribute__((aligned(8))); > >>>> > >>>> int main() > >>>> { > >>>> myint i = 1; > >>>> int j = 2; > >>>> __builtin_printf("%d %d\n", i, j); > >>>> } > >>>> > >>>> or > >>>> > >>>> myint i; > >>>> int j; > >>>> myint *p = &i; > >>>> int *q = &j; > >>>> > >>>> int main() > >>>> { > >>>> __builtin_printf("%d %d", *p, *q); > >>>> } > >>>> > >>>> should behave the same. There isn't a printf modifier for an "aligned > >>>> int" > >>>> because that sort of thing doesn't make sense. Special-casing aligned > >>>> vs. > >>>> non-aligned values only makes sense for things passed by value on the > >>>> stack. > >>>> And then obviously only dependent on the functuion type signature, not > >>>> on the type of the passed value. > >>>> > >>> > >>> I think the testcase is ill-formed. Just because printf doesn't have > >>> such a modifier, doesn't mean that another variadic function might not > >>> have a means to detect when an object in the variadic list needs to be > >>> over-aligned. As such, the test should really be written as: > >> > >> A value doesn't have "alignment". A function may have alignment > >> requirements on its arguments, clearly printf doesn't. > > > > Btw, it looks like function_arg is supposed to pass whether the argument > > is to the variadic part of the function or not, but it gets passed > > false as in > > > > args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, > > argpos < n_named_args); > > > > n_named_args is 4. That is because ARM asks to do this via > > > > if (type_arg_types != 0 > > && targetm.calls.strict_argument_naming (args_so_far)) > > ; > > else if (type_arg_types != 0 > > && ! targetm.calls.pretend_outgoing_varargs_named > > (args_so_far)) > > /* Don't include the last named arg. */ > > --n_named_args; > > else > > /* Treat all args as named. */ > > n_named_args = num_actuals; > > > > thus you lose that info (you could have that readily available). > > > > From the calling side of a function it shouldn't matter. They only > thing the caller has to know is that the function is variadic (and > therefore that the base-standard rules from the AAPCS apply -- no use of > FP registers for parameters). The behaviour after that is *as if* all > the arguments were pushed onto the stack in the correct order and > finally the lowest 4 words are popped off the stack again into r0-r3. > Hence any alignment that would apply to arguments on the stack has to be > reflected in their allocation into r0-r3 (since the push/pop of those > registers never happens).
But how do you compute the alignment of, say a myint '1'? Of course __attribute__((aligned())) is a C extension thus AAPCS likely doesn't consider it. But yes, as given even on x86_64 we might spill a 8-byte aligned int register to a 8-byte aligned stack slot so the proposed patch makes sense in that regard. I wonder how many other passes can get confused by this (PRE, I suppose, inserting an explicitely aligned load as well as all passes after the vectorizer which also creates loads/store with explicit (usually lower) alignment). Richard. > R. > > >>> typedef int myint __attribute__((aligned(8))); > >>> > >>> int main() > >>> { > >>> myint i = 1; > >>> int j = 2; > >>> __builtin_printf("%d %d\n", (int) i, j); > >>> } > >>> > >>> Variadic functions take the types of their arguments from the types of > >>> the actuals passed. The compiler should either be applying appropriate > >>> promotion rules to make the types conformant by the language > >>> specification or respecting the types exactly. However, that should be > >>> done in the mid-end not the back-end. If incorrect alignment > >>> information is passed to the back-end it can't help but make the wrong > >>> choice. Examining the mode here wouldn't help either. Consider: > >>> > >>> int foo (int a, int b, int c, int d, int e, int f > >>> __attribute__((aligned(8)))); > >>> > >>> On ARM that should pass f in a 64-bit aligned location (since the > >>> parameter will be on the stack). > >>> > >>> R. > >>> > >>> > >>>> Richard. > >>>> > >>>>> Richard. > >>>>> > >>>>>> > >>>>>> FWIW, I also tried: > >>>>>> > >>>>>> __attribute__((__aligned__((16)))) int x; > >>>>>> int main (void) > >>>>>> { > >>>>>> __builtin_printf("%d\n", x); > >>>>>> } > >>>>>> > >>>>>> but in that case, the arm_function_arg is still fed a type with > >>>>>> alignment 32 > >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which > >>>>>> has > >>>>>> alignment 128. > >>>>>> > >>>>>> --Alan > >>>>>> > >>>>>> Richard Biener wrote: > >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: > >>>>>>> > >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: > >>>>>>>> > >>>>>>>>> ...actually attach the testcase... > >>>>>>>> What compile options? > >>>>>>> > >>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > >>>>>>> I can't see anything not guaranteeing that: > >>>>>>> > >>>>>>> .section .rodata > >>>>>>> .align 3 > >>>>>>> .LANCHOR0 = . + 0 > >>>>>>> .LC1: > >>>>>>> .ascii "%d %g %d\012\000" > >>>>>>> .space 6 > >>>>>>> .LC0: > >>>>>>> .word 7 > >>>>>>> .space 4 > >>>>>>> .word 0 > >>>>>>> .word 1075838976 > >>>>>>> .word 9 > >>>>>>> .space 4 > >>>>>>> > >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate > >>>>>>> copy? That is, the patch tells the backend that the loads and > >>>>>>> stores to the 'int' vars (which have padding followed) is aligned > >>>>>>> to 8 bytes. > >>>>>>> > >>>>>>> I don't see what is wrong in the final assembler, but maybe > >>>>>>> some endian issue exists? The code looks quite ugly though ;) > >>>>>>> > >>>>>>> Richard. > >>>>>>> > >>>>>>>>> Alan Lawrence wrote: > >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi* > >>>>>> testsuite on ARM > >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), > >>>>>> following this > >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached > >>>>>> (including > >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints > >>>>>> the > >>>>>>>>>> expected > >>>>>>>>>> 7 8 9 > >>>>>>>>>> whereas with gcc r221348, instead it prints > >>>>>>>>>> 0 8 0 > >>>>>>>>>> > >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and > >>>>>> a > >>>>>>>>>> var_decl of b1, from 32 to 64; both have type > >>>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte > >>>>>> sizes-gimplified type_0 > >>>>>>>>>> BLK > >>>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> > >>>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> > >>>>>>>>>> align 64 symtab 0 alias set 1 canonical type > >>>>>> 0x2b9b8d428d20 > >>>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type > >>>>>>>>>> 0x2b9b8d092690 int> > >>>>>>>>>> SI file reduced.c line 12 col 7 > >>>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> > >>>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> > >>>>>>>>>> align 32 offset_align 64 > >>>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> > >>>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> > >>>>>> context > >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 > >>>>>> D.6070> > >>>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain > >>>>>> <type_decl > >>>>>>>>>> 0x2b9b8d42b000 D.6044>> > >>>>>>>>>> > >>>>>>>>>> The tree-optimized output is the same with both compilers (as this > >>>>>> does not > >>>>>>>>>> mention alignment); the expand output differs. > >>>>>>>>>> > >>>>>>>>>> Still investigating... > >>>>>>>>>> > >>>>>>>>>> --Alan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Richard Biener wrote: > >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA > >>>>>>>>>>> drops alignment info unnecessarily. > >>>>>>>>>>> > >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > >>>>>>>>>>> > >>>>>>>>>>> Richard. > >>>>>>>>>>> > >>>>>>>>>>> 2015-03-11 Richard Biener <rguent...@suse.de> > >>>>>>>>>>> > >>>>>>>>>>> PR tree-optimization/65310 > >>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger > >>>>>>>>>>> alignment. > >>>>>>>>>>> > >>>>>>>>>>> Index: gcc/tree-sra.c > >>>>>>>>>>> > >>>>>> =================================================================== > >>>>>>>>>>> --- gcc/tree-sra.c (revision 221324) > >>>>>>>>>>> +++ gcc/tree-sra.c (working copy) > >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > >>>>>>>>>>> misalign = (misalign + offset) & (align - 1); > >>>>>>>>>>> if (misalign != 0) > >>>>>>>>>>> align = (misalign & -misalign); > >>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) > >>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) > >>>>>>>>>>> exp_type = build_aligned_type (exp_type, align); > >>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, > >>>>>> off); > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > >> > >> > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)