https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62662
Bug ID: 62662
Summary: [4.9/5 Regression] Miscompilation of Qt on s390x
Product: gcc
Version: 4.9.1
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
CC: krebbel at gcc dot gnu.org, uweigand at gcc dot gnu.org
Target: s390x-linux
struct A
{
  int a;
  void b ();
};
int c;
struct B
{
  struct C { A d; };
  static C e;
};
struct D
{
  B::C *d;
  D () : d (&B::e) { d->d.b (); }
};
struct F
{
  F (int *);
  int *f;
  D g;
};
void A::b ()
{
  volatile int x;
  __asm__("" : "=&d" (c), "=&d" (x), "=m" (*&a) : "a" (&a), "dm" (0));
}
F::F (int *p1) : f (p1) {}
When compiled with -m64 -O2 -fvisibility=hidden -fPIC -march=z9-109 -mtune=z10,
this results in wrong code:
stg %r15,120(%r15)
stg %r3,0(%r2)
.LCFI3:
lay %r15,-168(%r15)
.LCFI4:
larl %r4,c
larl %r1,_ZN1B1eE@GOTENT
lhi %r5,0
lg %r1,0(%r1)
stg %r1,8(%r2)
st %r2,0(%r4)
st %r3,164(%r15)
lg %r4,280(%r15)
lg %r15,288(%r15)
.LCFI5:
br %r4
The prologue never saves the %r14 return register to its stack slot (only %r15
is stored, at 120(%r15)), but the epilogue loads the return address from that
slot (280(%r15) once the frame is allocated) into %r4 and branches to it.
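The slot offsets in the listing can be cross-checked with a little arithmetic.
The sketch below assumes the usual s390x ELF ABI register save area, where GPR n
lives at offset 8 * n from the stack pointer value on function entry; the helper
names are made up for illustration and are not GCC functions.

```cpp
// Illustrative helpers only (hypothetical names, not part of GCC).
// Assumption: s390x ELF ABI register save area, GPR n at 8 * n from the
// stack pointer value on function entry.
constexpr int slot_at_entry (int regno)
{
  return 8 * regno;
}

// Offset of the same slot relative to %r15 after the frame was allocated
// with "lay %r15,-frame_size(%r15)".
constexpr int slot_after_alloc (int regno, int frame_size)
{
  return frame_size + slot_at_entry (regno);
}

// "stg %r15,120(%r15)" in the prologue stores into the %r15 slot.
static_assert (slot_at_entry (15) == 120, "r15 slot at entry");
// "lg %r4,280(%r15)" in the epilogue reads the %r14 slot ...
static_assert (slot_after_alloc (14, 168) == 280, "r14 slot after lay");
// ... and "lg %r15,288(%r15)" reads the %r15 slot.
static_assert (slot_after_alloc (15, 168) == 288, "r15 slot after lay");
```

So 280(%r15) is the %r14 slot, i.e. 112(%r15) at entry, and nothing in the
prologue above writes to it.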
From a quick look, this seems to be caused by a bad interaction between the
/* Fetch return address from stack before load multiple,
this will do good for scheduling. */
code in s390_emit_epilogue and s390_optimize_prologue. If
cfun_frame_layout.save_return_addr_p is false, and the first register needing a
save during s390_emit_epilogue is below 13 and the last is above 14, then
s390_emit_epilogue tries to optimize by loading the return address into a
scratch register (typically %r4) before the load multiple insn.
s390_optimize_prologue doesn't take this into account: if at that point only
%r15 needs saving, it removes the store multiple insn in the prologue and also
replaces the load multiple, but it keeps the load from the return address stack
slot into %r4 and the return through %r4.
So, either s390_optimize_prologue also needs to adjust the return insn (and
remove the load of the return address from the stack slot if DSE can't handle
it?), or we need to arrange that in this case s390_optimize_prologue doesn't
optimize the return address store away.
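To make the interaction and the second remedy concrete, here is a toy model in
C++. It is not GCC code: every type and function name is hypothetical, the
prefetch condition is simply the one stated above (first register below 13,
last above 14), and the model only tracks which GPR range the prologue stores
and whether the epilogue reads the %r14 slot.

```cpp
// Toy model of the two passes; all names are hypothetical, not GCC's.
constexpr int RETURN_REGNUM = 14;

struct Layout { int first; int last; };  // inclusive GPR range

struct Function
{
  Layout save;         // GPRs the prologue actually stores
  bool ra_prefetched;  // epilogue loads the %r14 slot into a scratch reg
};

constexpr bool stores_ra (Layout l)
{
  return l.first <= RETURN_REGNUM && RETURN_REGNUM <= l.last;
}

// Models the epilogue decision as described in the report.
constexpr Function emit (Layout initial)
{
  return Function{initial, initial.first < 13 && initial.last > 14};
}

// Models the problematic behaviour: the save range is narrowed to what is
// still needed, without looking at the prefetch.
constexpr Function optimize_buggy (Function f, Layout needed)
{
  f.save = needed;
  return f;
}

// Models the second remedy: keep the %r14 store whenever the epilogue
// still reads its slot.
constexpr Function optimize_keep_ra (Function f, Layout needed)
{
  if (f.ra_prefetched && !stores_ra (needed))
    {
      needed.first = needed.first < RETURN_REGNUM ? needed.first : RETURN_REGNUM;
      needed.last = needed.last > RETURN_REGNUM ? needed.last : RETURN_REGNUM;
    }
  f.save = needed;
  return f;
}

// True when the function returns through a slot that was never written.
constexpr bool returns_through_unsaved_slot (Function f)
{
  return f.ra_prefetched && !stores_ra (f.save);
}
```

With an initial range of %r6..%r15 that later shrinks to just %r15, the first
variant leaves the return reading an unwritten slot, while the second keeps
%r14..%r15 stored. The first remedy (rewriting the return insn to use %r14
directly and dropping the load) is not modelled here.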