Re: New regression on ARM Linux

Richard Biener Mon, 30 Mar 2015 13:13:54 -0700

On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <[email protected]> 
wrote:
>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>it.
>
>The problem appears to be in laying out arguments, specifically
>varargs. From 
>the "good" -fdump-rtl-expand:
>
>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 
>S4 A32])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (reg:DF 2 r2)
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 1 r1)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>The struct members are
>reg:SI 113 => int a;
>reg:DF 112 => double b;
>reg:SI 111 => int c;
>
>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>pushed 
>into virtual-outgoing-args. In contrast, post-change to
>build_ref_of_offset, we get:
>
>(insn 17 16 18 2 (set (reg:SI 118)
>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750 
>*.LC1>)) reduced.c:14 -1
>      (nil))
>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>virtual-outgoing-args)
>                 (const_int 8 [0x8])) [0  S4 A64])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 
>S8 A64])
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 2 r2)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>
>double+following int are all pushed into virtual-outgoing-args. This is
>because 
>arm_function_arg is fed a 64-bit-aligned int as type of the second
>argument (the 
>type constructed by build_ref_for_offset); it then executes
>(aapcs_layout_arg, 
>arm.c line ~~5914)
>
>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>      next even number.  */
>   ncrn = pcum->aapcs_ncrn;
>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>     ncrn++;
>
>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>testcase 
>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 
>32-bit-aligned int instead, which works as previously.
>
>Passing the same members of that struct in a non-vargs call, works ok -
>I think 
>because these use the type of the declared parameters, rather than the
>provided 
>arguments, and the former do not have the increased alignment from 
>build_ref_for_offset.


It doesn't make sense to use the alignment of passed values.  That looks like 
bs.

This means that

Int I __aligned__(8);

Is passed differently than int.

Arm_function_arg needs to be fixed.

Richard.

>
>FWIW, I also tried:
>
>__attribute__((__aligned__((16)))) int x;
>int main (void)
>{
>   __builtin_printf("%d\n", x);
>}
>
>but in that case, the arm_function_arg is still fed a type with
>alignment 32 
>(bits), i.e. distinct from the type of the field 'x' in memory, which
>has 
>alignment 128.
>
>--Alan
>
>Richard Biener wrote:
>> On Mon, 30 Mar 2015, Richard Biener wrote:
>> 
>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>
>>>> ...actually attach the testcase...
>>> What compile options?
>> 
>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>> I can't see anything not guaranteeing that:
>> 
>>         .section        .rodata
>>         .align  3
>> .LANCHOR0 = . + 0
>> .LC1:
>>         .ascii  "%d %g %d\012\000"
>>         .space  6
>> .LC0:
>>         .word   7
>>         .space  4
>>         .word   0
>>         .word   1075838976
>>         .word   9
>>         .space  4
>> 
>> maybe there is some more generic code-gen bug for aligned aggregate
>> copy?  That is, the patch tells the backend that the loads and
>> stores to the 'int' vars (which have padding followed) is aligned
>> to 8 bytes.
>> 
>> I don't see what is wrong in the final assembler, but maybe
>> some endian issue exists?  The code looks quite ugly though ;)
>> 
>> Richard.
>> 
>>>> Alan Lawrence wrote:
>>>>> We've been seeing a bunch of new failures in the *libffi*
>testsuite on ARM
>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>following this
>>>>> one-liner fix. I've reduced the testcase down to the attached
>(including
>>>>> removing any dependency on libffi); with gcc r221347, this prints
>the
>>>>> expected
>>>>> 7 8 9
>>>>> whereas with gcc r221348, instead it prints
>>>>> 0 8 0
>>>>>
>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>a
>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>sizes-gimplified type_0
>>>>> BLK
>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>          align 64 symtab 0 alias set 1 canonical type
>0x2b9b8d428d20
>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>> 0x2b9b8d092690 int>
>>>>>              SI file reduced.c line 12 col 7
>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>              align 32 offset_align 64
>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>context
>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>D.6070>
>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
><type_decl
>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>
>>>>> The tree-optimized output is the same with both compilers (as this
>does not
>>>>> mention alignment); the expand output differs.
>>>>>
>>>>> Still investigating...
>>>>>
>>>>> --Alan
>>>>>
>>>>>
>>>>> Richard Biener wrote:
>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>> drops alignment info unnecessarily.
>>>>>>
>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>> 2015-03-11  Richard Biener  <[email protected]>
>>>>>>
>>>>>>  PR tree-optimization/65310
>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>  alignment.
>>>>>>
>>>>>> Index: gcc/tree-sra.c
>>>>>>
>===================================================================
>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>    if (misalign != 0)
>>>>>>      align = (misalign & -misalign);
>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>off);
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>

Re: New regression on ARM Linux

Reply via email to