-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes it.
The problem appears to be in laying out arguments, specifically varargs. From
the "good" -fdump-rtl-expand:
(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 S4 A32])
(reg:SI 111 [ b1$16 ])) reduced.c:14 -1
(nil))
(insn 19 18 20 2 (set (reg:DF 2 r2)
(reg:DF 112 [ b1$8 ])) reduced.c:14 -1
(nil))
(insn 20 19 21 2 (set (reg:SI 1 r1)
(reg:SI 113 [ b1 ])) reduced.c:14 -1
(nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
(reg:SI 118)) reduced.c:14 -1
(nil))
(call_insn 22 21 23 2 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
<function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 A32])
The struct members are
reg:SI 113 => int a;
reg:DF 112 => double b;
reg:SI 111 => int c;
r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is pushed
into virtual-outgoing-args. In contrast, post-change to build_ref_of_offset, we get:
(insn 17 16 18 2 (set (reg:SI 118)
(symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750
*.LC1>)) reduced.c:14 -1
(nil))
(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 virtual-outgoing-args)
(const_int 8 [0x8])) [0 S4 A64])
(reg:SI 111 [ b1$16 ])) reduced.c:14 -1
(nil))
(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 S8 A64])
(reg:DF 112 [ b1$8 ])) reduced.c:14 -1
(nil))
(insn 20 19 21 2 (set (reg:SI 2 r2)
(reg:SI 113 [ b1 ])) reduced.c:14 -1
(nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
(reg:SI 118)) reduced.c:14 -1
(nil))
(call_insn 22 21 23 2 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
<function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 A32])
r0 still gets the format string, but 'int b1.a' now goes in r2, and the
double+following int are all pushed into virtual-outgoing-args. This is because
arm_function_arg is fed a 64-bit-aligned int as type of the second argument (the
type constructed by build_ref_for_offset); it then executes (aapcs_layout_arg,
arm.c line ~~5914)
/* C3 - For double-word aligned arguments, round the NCRN up to the
next even number. */
ncrn = pcum->aapcs_ncrn;
if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
ncrn++;
Which changes r1 to r2. Passing -fno-tree-sra, or removing from the testcase
"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
32-bit-aligned int instead, which works as previously.
Passing the same members of that struct in a non-vargs call, works ok - I think
because these use the type of the declared parameters, rather than the provided
arguments, and the former do not have the increased alignment from
build_ref_for_offset.
FWIW, I also tried:
__attribute__((__aligned__((16)))) int x;
int main (void)
{
__builtin_printf("%d\n", x);
}
but in that case, the arm_function_arg is still fed a type with alignment 32
(bits), i.e. distinct from the type of the field 'x' in memory, which has
alignment 128.
--Alan
Richard Biener wrote:
On Mon, 30 Mar 2015, Richard Biener wrote:
On Mon, 30 Mar 2015, Alan Lawrence wrote:
...actually attach the testcase...
What compile options?
Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but
I can't see anything not guaranteeing that:
.section .rodata
.align 3
.LANCHOR0 = . + 0
.LC1:
.ascii "%d %g %d\012\000"
.space 6
.LC0:
.word 7
.space 4
.word 0
.word 1075838976
.word 9
.space 4
maybe there is some more generic code-gen bug for aligned aggregate
copy? That is, the patch tells the backend that the loads and
stores to the 'int' vars (which have padding followed) is aligned
to 8 bytes.
I don't see what is wrong in the final assembler, but maybe
some endian issue exists? The code looks quite ugly though ;)
Richard.
Alan Lawrence wrote:
We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
one-liner fix. I've reduced the testcase down to the attached (including
removing any dependency on libffi); with gcc r221347, this prints the
expected
7 8 9
whereas with gcc r221348, instead it prints
0 8 0
The action of r221348 is to change the alignment of a mem_ref, and a
var_decl of b1, from 32 to 64; both have type
type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
BLK
size <integer_cst 0x2b9b8d3720a8 constant 192>
unit size <integer_cst 0x2b9b8d372078 constant 24>
align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
fields <field_decl 0x2b9b8d42b098 a type <integer_type
0x2b9b8d092690 int>
SI file reduced.c line 12 col 7
size <integer_cst 0x2b9b8d08eeb8 constant 32>
unit size <integer_cst 0x2b9b8d08eed0 constant 4>
align 32 offset_align 64
offset <integer_cst 0x2b9b8d08eee8 constant 0>
bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
<record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
0x2b9b8d42b000 D.6044>>
The tree-optimized output is the same with both compilers (as this does not
mention alignment); the expand output differs.
Still investigating...
--Alan
Richard Biener wrote:
This fixes a vectorizer testcase regression on powerpc where SRA
drops alignment info unnecessarily.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
Richard.
2015-03-11 Richard Biener <rguent...@suse.de>
PR tree-optimization/65310
* tree-sra.c (build_ref_for_offset): Also preserve larger
alignment.
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c (revision 221324)
+++ gcc/tree-sra.c (working copy)
@@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
misalign = (misalign + offset) & (align - 1);
if (misalign != 0)
align = (misalign & -misalign);
- if (align < TYPE_ALIGN (exp_type))
+ if (align != TYPE_ALIGN (exp_type))
exp_type = build_aligned_type (exp_type, align);
mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);