On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawre...@arm.com> wrote: >-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >it. > >The problem appears to be in laying out arguments, specifically >varargs. From >the "good" -fdump-rtl-expand: > >(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >S4 A32]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (reg:DF 2 r2) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 1 r1) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] ><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >A32]) > >The struct members are >reg:SI 113 => int a; >reg:DF 112 => double b; >reg:SI 111 => int c; > >r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >pushed >into virtual-outgoing-args. In contrast, post-change to >build_ref_of_offset, we get: > >(insn 17 16 18 2 (set (reg:SI 118) > (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >*.LC1>)) reduced.c:14 -1 > (nil)) >(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >virtual-outgoing-args) > (const_int 8 [0x8])) [0 S4 A64]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >S8 A64]) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 2 r2) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] ><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >A32]) > >r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >double+following int are all pushed into virtual-outgoing-args. This is >because >arm_function_arg is fed a 64-bit-aligned int as type of the second >argument (the >type constructed by build_ref_for_offset); it then executes >(aapcs_layout_arg, >arm.c line ~~5914) > > /* C3 - For double-word aligned arguments, round the NCRN up to the > next even number. */ > ncrn = pcum->aapcs_ncrn; > if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > ncrn++; > >Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >testcase >"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >32-bit-aligned int instead, which works as previously. > >Passing the same members of that struct in a non-vargs call, works ok - >I think >because these use the type of the declared parameters, rather than the >provided >arguments, and the former do not have the increased alignment from >build_ref_for_offset.
It doesn't make sense to use the alignment of passed values. That looks like bs. This means that Int I __aligned__(8); Is passed differently than int. Arm_function_arg needs to be fixed. Richard. > >FWIW, I also tried: > >__attribute__((__aligned__((16)))) int x; >int main (void) >{ > __builtin_printf("%d\n", x); >} > >but in that case, the arm_function_arg is still fed a type with >alignment 32 >(bits), i.e. distinct from the type of the field 'x' in memory, which >has >alignment 128. > >--Alan > >Richard Biener wrote: >> On Mon, 30 Mar 2015, Richard Biener wrote: >> >>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>> >>>> ...actually attach the testcase... >>> What compile options? >> >> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >> I can't see anything not guaranteeing that: >> >> .section .rodata >> .align 3 >> .LANCHOR0 = . + 0 >> .LC1: >> .ascii "%d %g %d\012\000" >> .space 6 >> .LC0: >> .word 7 >> .space 4 >> .word 0 >> .word 1075838976 >> .word 9 >> .space 4 >> >> maybe there is some more generic code-gen bug for aligned aggregate >> copy? That is, the patch tells the backend that the loads and >> stores to the 'int' vars (which have padding followed) is aligned >> to 8 bytes. >> >> I don't see what is wrong in the final assembler, but maybe >> some endian issue exists? The code looks quite ugly though ;) >> >> Richard. >> >>>> Alan Lawrence wrote: >>>>> We've been seeing a bunch of new failures in the *libffi* >testsuite on ARM >>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >following this >>>>> one-liner fix. I've reduced the testcase down to the attached >(including >>>>> removing any dependency on libffi); with gcc r221347, this prints >the >>>>> expected >>>>> 7 8 9 >>>>> whereas with gcc r221348, instead it prints >>>>> 0 8 0 >>>>> >>>>> The action of r221348 is to change the alignment of a mem_ref, and >a >>>>> var_decl of b1, from 32 to 64; both have type >>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >sizes-gimplified type_0 >>>>> BLK >>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>> align 64 symtab 0 alias set 1 canonical type >0x2b9b8d428d20 >>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>> 0x2b9b8d092690 int> >>>>> SI file reduced.c line 12 col 7 >>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>> align 32 offset_align 64 >>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >context >>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >D.6070> >>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain ><type_decl >>>>> 0x2b9b8d42b000 D.6044>> >>>>> >>>>> The tree-optimized output is the same with both compilers (as this >does not >>>>> mention alignment); the expand output differs. >>>>> >>>>> Still investigating... >>>>> >>>>> --Alan >>>>> >>>>> >>>>> Richard Biener wrote: >>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>> drops alignment info unnecessarily. >>>>>> >>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>> >>>>>> Richard. >>>>>> >>>>>> 2015-03-11 Richard Biener <rguent...@suse.de> >>>>>> >>>>>> PR tree-optimization/65310 >>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>> alignment. >>>>>> >>>>>> Index: gcc/tree-sra.c >>>>>> >=================================================================== >>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>> +++ gcc/tree-sra.c (working copy) >>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>> misalign = (misalign + offset) & (align - 1); >>>>>> if (misalign != 0) >>>>>> align = (misalign & -misalign); >>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >off); >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>