Re: New regression on ARM Linux

Richard Biener Tue, 31 Mar 2015 03:45:54 -0700

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:20, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Biener wrote:
> > 
> >> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>
> >>> On 31/03/15 08:50, Richard Biener wrote:
> >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <[email protected]> 
> >>>> wrote:
> >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
> >>>>> <[email protected]> wrote:
> >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>>> it.
> >>>>>>
> >>>>>> The problem appears to be in laying out arguments, specifically
> >>>>>> varargs. From
> >>>>>> the "good" -fdump-rtl-expand:
> >>>>>>
> >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S4 A32])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> The struct members are
> >>>>>> reg:SI 113 => int a;
> >>>>>> reg:DF 112 => double b;
> >>>>>> reg:SI 111 => int c;
> >>>>>>
> >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>>> pushed
> >>>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>>> build_ref_of_offset, we get:
> >>>>>>
> >>>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>>> *.LC1>)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>>> virtual-outgoing-args)
> >>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S8 A64])
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>>
> >>>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>>> because
> >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>>> argument (the
> >>>>>> type constructed by build_ref_for_offset); it then executes
> >>>>>> (aapcs_layout_arg,
> >>>>>> arm.c line ~~5914)
> >>>>>>
> >>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>>      next even number.  */
> >>>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>>     ncrn++;
> >>>>>>
> >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>>> testcase
> >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>>
> >>>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>>> I think
> >>>>>> because these use the type of the declared parameters, rather than the
> >>>>>> provided
> >>>>>> arguments, and the former do not have the increased alignment from
> >>>>>> build_ref_for_offset.
> >>>>>
> >>>>> It doesn't make sense to use the alignment of passed values.  That 
> >>>>> looks like bs.
> >>>>>
> >>>>> This means that
> >>>>>
> >>>>> Int I __aligned__(8);
> >>>>>
> >>>>> Is passed differently than int.
> >>>>>
> >>>>> Arm_function_arg needs to be fixed.
> >>>>
> >>>> That is,
> >>>>
> >>>> typedef int myint __attribute__((aligned(8)));
> >>>>
> >>>> int main()
> >>>> {
> >>>>   myint i = 1;
> >>>>   int j = 2;
> >>>>   __builtin_printf("%d %d\n", i, j);
> >>>> }
> >>>>
> >>>> or
> >>>>
> >>>> myint i;
> >>>> int j;
> >>>> myint *p = &i;
> >>>> int *q = &j;
> >>>>
> >>>> int main()
> >>>> {
> >>>>   __builtin_printf("%d %d", *p, *q);
> >>>> }
> >>>>
> >>>> should behave the same.  There isn't a printf modifier for an "aligned 
> >>>> int"
> >>>> because that sort of thing doesn't make sense.  Special-casing aligned 
> >>>> vs.
> >>>> non-aligned values only makes sense for things passed by value on the 
> >>>> stack.
> >>>> And then obviously only dependent on the functuion type signature, not
> >>>> on the type of the passed value.
> >>>>
> >>>
> >>> I think the testcase is ill-formed.  Just because printf doesn't have
> >>> such a modifier, doesn't mean that another variadic function might not
> >>> have a means to detect when an object in the variadic list needs to be
> >>> over-aligned.  As such, the test should really be written as:
> >>
> >> A value doesn't have "alignment".  A function may have alignment
> >> requirements on its arguments, clearly printf doesn't.
> > 
> > Btw, it looks like function_arg is supposed to pass whether the argument
> > is to the variadic part of the function or not, but it gets passed
> > false as in
> > 
> >       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
> >                                                 argpos < n_named_args);
> > 
> > n_named_args is 4.  That is because ARM asks to do this via
> > 
> >   if (type_arg_types != 0
> >       && targetm.calls.strict_argument_naming (args_so_far))
> >     ;
> >   else if (type_arg_types != 0
> >            && ! targetm.calls.pretend_outgoing_varargs_named 
> > (args_so_far))
> >     /* Don't include the last named arg.  */
> >     --n_named_args;
> >   else
> >     /* Treat all args as named.  */
> >     n_named_args = num_actuals;
> > 
> > thus you lose that info (you could have that readily available).
> > 
> 
> From the calling side of a function it shouldn't matter.  They only
> thing the caller has to know is that the function is variadic (and
> therefore that the base-standard rules from the AAPCS apply -- no use of
> FP registers for parameters).  The behaviour after that is *as if* all
> the arguments were pushed onto the stack in the correct order and
> finally the lowest 4 words are popped off the stack again into r0-r3.
> Hence any alignment that would apply to arguments on the stack has to be
> reflected in their allocation into r0-r3 (since the push/pop of those
> registers never happens).


But how do you compute the alignment of, say a myint '1'?  Of course
__attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
consider it.

But yes, as given even on x86_64 we might spill a 8-byte aligned
int register to a 8-byte aligned stack slot so the proposed patch
makes sense in that regard.  I wonder how many other passes can
get confused by this (PRE, I suppose, inserting an explicitely
aligned load as well as all passes after the vectorizer which
also creates loads/store with explicit (usually lower) alignment).

Richard.

> R.
> 
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", (int) i, j);
> >>> }
> >>>
> >>> Variadic functions take the types of their arguments from the types of
> >>> the actuals passed.  The compiler should either be applying appropriate
> >>> promotion rules to make the types conformant by the language
> >>> specification or respecting the types exactly.  However, that should be
> >>> done in the mid-end not the back-end.  If incorrect alignment
> >>> information is passed to the back-end it can't help but make the wrong
> >>> choice.  Examining the mode here wouldn't help either.  Consider:
> >>>
> >>> int foo (int a, int b, int c, int d, int e, int f
> >>> __attribute__((aligned(8))));
> >>>
> >>> On ARM that should pass f in a 64-bit aligned location (since the
> >>> parameter will be on the stack).
> >>>
> >>> R.
> >>>
> >>>
> >>>> Richard.
> >>>>
> >>>>> Richard.
> >>>>>
> >>>>>>
> >>>>>> FWIW, I also tried:
> >>>>>>
> >>>>>> __attribute__((__aligned__((16)))) int x;
> >>>>>> int main (void)
> >>>>>> {
> >>>>>>   __builtin_printf("%d\n", x);
> >>>>>> }
> >>>>>>
> >>>>>> but in that case, the arm_function_arg is still fed a type with
> >>>>>> alignment 32
> >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> >>>>>> has
> >>>>>> alignment 128.
> >>>>>>
> >>>>>> --Alan
> >>>>>>
> >>>>>> Richard Biener wrote:
> >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> >>>>>>>
> >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> >>>>>>>>
> >>>>>>>>> ...actually attach the testcase...
> >>>>>>>> What compile options?
> >>>>>>>
> >>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> >>>>>>> I can't see anything not guaranteeing that:
> >>>>>>>
> >>>>>>>         .section        .rodata
> >>>>>>>         .align  3
> >>>>>>> .LANCHOR0 = . + 0
> >>>>>>> .LC1:
> >>>>>>>         .ascii  "%d %g %d\012\000"
> >>>>>>>         .space  6
> >>>>>>> .LC0:
> >>>>>>>         .word   7
> >>>>>>>         .space  4
> >>>>>>>         .word   0
> >>>>>>>         .word   1075838976
> >>>>>>>         .word   9
> >>>>>>>         .space  4
> >>>>>>>
> >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
> >>>>>>> copy?  That is, the patch tells the backend that the loads and
> >>>>>>> stores to the 'int' vars (which have padding followed) is aligned
> >>>>>>> to 8 bytes.
> >>>>>>>
> >>>>>>> I don't see what is wrong in the final assembler, but maybe
> >>>>>>> some endian issue exists?  The code looks quite ugly though ;)
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>>> Alan Lawrence wrote:
> >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
> >>>>>> testsuite on ARM
> >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> >>>>>> following this
> >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
> >>>>>> (including
> >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> >>>>>> the
> >>>>>>>>>> expected
> >>>>>>>>>> 7 8 9
> >>>>>>>>>> whereas with gcc r221348, instead it prints
> >>>>>>>>>> 0 8 0
> >>>>>>>>>>
> >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> >>>>>> a
> >>>>>>>>>> var_decl of b1, from 32 to 64; both have type
> >>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> >>>>>> sizes-gimplified type_0
> >>>>>>>>>> BLK
> >>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
> >>>>>> 0x2b9b8d428d20
> >>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> >>>>>>>>>> 0x2b9b8d092690 int>
> >>>>>>>>>>              SI file reduced.c line 12 col 7
> >>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >>>>>>>>>>              align 32 offset_align 64
> >>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> >>>>>> context
> >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> >>>>>> D.6070>
> >>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> >>>>>> <type_decl
> >>>>>>>>>> 0x2b9b8d42b000 D.6044>>
> >>>>>>>>>>
> >>>>>>>>>> The tree-optimized output is the same with both compilers (as this
> >>>>>> does not
> >>>>>>>>>> mention alignment); the expand output differs.
> >>>>>>>>>>
> >>>>>>>>>> Still investigating...
> >>>>>>>>>>
> >>>>>>>>>> --Alan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Richard Biener wrote:
> >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> >>>>>>>>>>> drops alignment info unnecessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>>>>>>>>>>
> >>>>>>>>>>> Richard.
> >>>>>>>>>>>
> >>>>>>>>>>> 2015-03-11  Richard Biener  <[email protected]>
> >>>>>>>>>>>
> >>>>>>>>>>>  PR tree-optimization/65310
> >>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> >>>>>>>>>>>  alignment.
> >>>>>>>>>>>
> >>>>>>>>>>> Index: gcc/tree-sra.c
> >>>>>>>>>>>
> >>>>>> ===================================================================
> >>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> >>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
> >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> >>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
> >>>>>>>>>>>    if (misalign != 0)
> >>>>>>>>>>>      align = (misalign & -misalign);
> >>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> >>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> >>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> >>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> >>>>>> off);
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > 
> 
> 

-- 
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

Re: New regression on ARM Linux

Reply via email to