Re: New regression on ARM Linux

Alan Lawrence Mon, 30 Mar 2015 09:46:12 -0700

-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes it.

The problem appears to be in laying out arguments, specifically varargs. From the "good" -fdump-rtl-expand:


(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0  S4 A32])
        (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
     (nil))
(insn 19 18 20 2 (set (reg:DF 2 r2)
        (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
     (nil))
(insn 20 19 21 2 (set (reg:SI 1 r1)
        (reg:SI 113 [ b1 ])) reduced.c:14 -1
     (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
        (reg:SI 118)) reduced.c:14 -1
     (nil))
(call_insn 22 21 23 2 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 A32])


The struct members are
reg:SI 113 => int a;
reg:DF 112 => double b;
reg:SI 111 => int c;

r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is pushed into virtual-outgoing-args. In contrast, post-change to build_ref_for_offset, we get:


(insn 17 16 18 2 (set (reg:SI 118)
        (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 *.LC1>)) reduced.c:14 -1
     (nil))
(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 virtual-outgoing-args)
                (const_int 8 [0x8])) [0  S4 A64])
        (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
     (nil))
(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0  S8 A64])
        (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
     (nil))
(insn 20 19 21 2 (set (reg:SI 2 r2)
        (reg:SI 113 [ b1 ])) reduced.c:14 -1
     (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
        (reg:SI 118)) reduced.c:14 -1
     (nil))
(call_insn 22 21 23 2 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 A32])

r0 still gets the format string, but 'int b1.a' now goes in r2, and the double+following int are all pushed into virtual-outgoing-args. This is because arm_function_arg is fed a 64-bit-aligned int as type of the second argument (the type constructed by build_ref_for_offset); it then executes (aapcs_layout_arg, arm.c line ~5914)


  /* C3 - For double-word aligned arguments, round the NCRN up to the
     next even number.  */
  ncrn = pcum->aapcs_ncrn;
  if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
    ncrn++;

Which changes r1 to r2. Passing -fno-tree-sra, or removing from the testcase "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 32-bit-aligned int instead, which works as previously.
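
To make the effect of rule C3 concrete, here is a small stand-alone model of the core-register allocation. It is only a sketch, not GCC's aapcs_layout_arg: the names arg_desc, arg_slot and aapcs_layout_sketch are made up, it only handles 4- and 8-byte arguments, and it does not model splitting an argument between registers and stack. I am also assuming that both 'a' and 'c' get the 64-bit-aligned type in the "bad" case.

```c
#include <stddef.h>

/* Hypothetical model, not GCC code.  Sizes and alignments are in bytes. */
typedef struct { unsigned size; unsigned align; } arg_desc;
typedef struct { int reg; unsigned stack_off; } arg_slot; /* reg == -1: stack */

void aapcs_layout_sketch(const arg_desc *args, size_t n, arg_slot *out)
{
  unsigned ncrn = 0;  /* next core register number, r0..r3 */
  unsigned nsaa = 0;  /* next stacked argument offset from outgoing-args */

  for (size_t i = 0; i < n; i++) {
    unsigned words = (args[i].size + 3) / 4;

    /* C3: a doubleword-aligned argument must start in an even register. */
    if ((ncrn & 1) && args[i].align >= 8)
      ncrn++;

    if (ncrn + words <= 4) {
      /* Fits entirely in r0..r3. */
      out[i].reg = (int) ncrn;
      out[i].stack_off = 0;
      ncrn += words;
    } else {
      /* Does not fit: no more core registers are used after this point. */
      ncrn = 4;
      nsaa = (nsaa + args[i].align - 1) & ~(args[i].align - 1);
      out[i].reg = -1;
      out[i].stack_off = nsaa;
      nsaa += words * 4;
    }
  }
}
```

Feeding it {format pointer, int a, double b, int c} with a 4-byte-aligned int gives r0, r1, r2(+r3), stack+0, as in the "good" dump. Giving 'a' and 'c' 8-byte alignment instead yields r0, r2, stack+0, stack+8, which matches the second dump.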

Passing the same members of that struct in a non-varargs call works OK - I think because these use the type of the declared parameters, rather than the provided arguments, and the former do not have the increased alignment from build_ref_for_offset.


FWIW, I also tried:

__attribute__((__aligned__((16)))) int x;
int main (void)
{
  __builtin_printf("%d\n", x);
}

but in that case, arm_function_arg is still fed a type with alignment 32 (bits), i.e. distinct from the type of the variable 'x' in memory, which has alignment 128.


--Alan

Richard Biener wrote:

On Mon, 30 Mar 2015, Richard Biener wrote:

On Mon, 30 Mar 2015, Alan Lawrence wrote:

...actually attach the testcase...

What compile options?


Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
I can't see anything not guaranteeing that:

        .section        .rodata
        .align  3
.LANCHOR0 = . + 0
.LC1:
        .ascii  "%d %g %d\012\000"
        .space  6
.LC0:
        .word   7
        .space  4
        .word   0
        .word   1075838976
        .word   9
        .space  4

maybe there is some more generic code-gen bug for aligned aggregate
copy?  That is, the patch tells the backend that the loads and
stores to the 'int' vars (which are followed by padding) are aligned
to 8 bytes.

I don't see what is wrong in the final assembler, but maybe
some endian issue exists?  The code looks quite ugly though ;)

Richard.

Alan Lawrence wrote:

We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
one-liner fix. I've reduced the testcase down to the attached (including
removing any dependency on libffi); with gcc r221347, this prints the
expected
7 8 9
whereas with gcc r221348, instead it prints
0 8 0
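
For readers without the attachment, the following is a guess at the shape of the reduced testcase, pieced together from the dumps elsewhere in this thread (the struct name and field order, the "%d %g %d\n" format string, the constants 7 / 8.0 / 9 in .LC0, and the "*(cls_struct_16byte *)resp = b1" store). It is not the attached file; the function name run_reduced and its signature are made up.

```c
/* Hypothetical reconstruction, not the attached reduced.c. */
typedef struct {
  int a;
  double b;
  int c;
} cls_struct_16byte;

/* Prints the three members through a varargs call, then copies the
   struct out through resp.  Returns printf's result.  */
__attribute__((noinline))
int run_reduced(void *resp)
{
  cls_struct_16byte b1 = { 7, 8.0, 9 };

  int n = __builtin_printf("%d %g %d\n", b1.a, b1.b, b1.c);

  /* The aggregate copy that, per the analysis in this thread, is needed
     for the problem to show up.  */
  *(cls_struct_16byte *) resp = b1;
  return n;
}
```

Calling run_reduced from main with the address of a cls_struct_16byte, built at -O1 or -O2, should print "7 8 9" on a correct compiler.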

The action of r221348 is to change the alignment of a mem_ref, and a
var_decl of b1, from 32 to 64; both have type
  type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
BLK
         size <integer_cst 0x2b9b8d3720a8 constant 192>
         unit size <integer_cst 0x2b9b8d372078 constant 24>
         align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
         fields <field_decl 0x2b9b8d42b098 a type <integer_type
0x2b9b8d092690 int>
             SI file reduced.c line 12 col 7
             size <integer_cst 0x2b9b8d08eeb8 constant 32>
             unit size <integer_cst 0x2b9b8d08eed0 constant 4>
             align 32 offset_align 64
             offset <integer_cst 0x2b9b8d08eee8 constant 0>
             bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
<record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
         pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
0x2b9b8d42b000 D.6044>>

The tree-optimized output is the same with both compilers (as this does not
mention alignment); the expand output differs.

Still investigating...

--Alan


Richard Biener wrote:

This fixes a vectorizer testcase regression on powerpc where SRA
drops alignment info unnecessarily.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-03-11  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/65310
        * tree-sra.c (build_ref_for_offset): Also preserve larger
        alignment.

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c      (revision 221324)
+++ gcc/tree-sra.c      (working copy)
@@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
   misalign = (misalign + offset) & (align - 1);
   if (misalign != 0)
     align = (misalign & -misalign);
-  if (align < TYPE_ALIGN (exp_type))
+  if (align != TYPE_ALIGN (exp_type))
     exp_type = build_aligned_type (exp_type, align);
    mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
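
As an illustration of what the one-line change does to the alignment decision, here is a stand-alone model of the arithmetic in the hunk above. It is a sketch only: the function names are made up, plain unsigned values stand in for GCC's trees, and I am assuming all quantities are in bits, as with TYPE_ALIGN.

```c
/* Hypothetical model of the alignment computation, not GCC code. */

/* Alignment (in bits) known for a reference at offset_bits into a base
   whose known alignment is base_align with misalignment misalign.  */
unsigned ref_align_sketch(unsigned base_align, unsigned misalign,
                          unsigned offset_bits)
{
  unsigned align = base_align;

  misalign = (misalign + offset_bits) & (align - 1);
  if (misalign != 0)
    align = misalign & -misalign;  /* lowest set bit of the misalignment */
  return align;
}

/* Before the patch: only build a new type when alignment is smaller. */
int builds_aligned_type_old(unsigned align, unsigned type_align)
{
  return align < type_align;
}

/* After the patch: also build one when alignment is larger. */
int builds_aligned_type_new(unsigned align, unsigned type_align)
{
  return align != type_align;
}
```

For an int (TYPE_ALIGN 32) at bit offset 0 or 128 of a 64-bit-aligned struct, ref_align_sketch returns 64. The old test leaves the plain int type alone, while the new test builds a 64-bit-aligned variant of int.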
