On 11 December 2013 19:25, Vladimir Makarov <[email protected]> wrote:
> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>
>> Hi Vladimir,
>>
>> I've some regressions on ARM after this SP elimination patch, and they
>> are execution failures. Here is the list:
>>
>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus
>> gcc.c-torture/execute/va-arg-22.c -O2
>> gcc.dg/atomic/c11-atomic-exec-5.c -O0
>> gfortran.dg/direct_io_12.f90 -O[23]
>> gfortran.dg/elemental_dependency_1.f90 -O2
>> gfortran.dg/matmul_2.f90 -O2
>> gfortran.dg/matmul_6.f90 -O2
>> gfortran.dg/mvbits_7.f90 -O3
>> gfortran.dg/unlimited_polymorphic_1.f03 -O3
>>
>> I reduced and looked at va-arg-22.c, and the issue is that in
>> lra_eliminate_regs_1 (called by get_equiv_with_elimination) we
>> transformed sfp + 0x4c into sp + 0xfc because of a bad sp offset. What
>> we try to do here is to change the pseudo 195 of the insn 118 below:
>>
>> (insn 118 114 112 8 (set (reg:DI 195)
>> (unspec:DI [
>> (mem:DI (plus:SI (reg/f:SI 215)
>> (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>> + 64B]+8 S8 A8])
>> ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>> (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>> (const_int 8 [0x8])) [7 a35+8 S8 A32])
>> (nil)))
>>
>> with its equivalent (x arg of lra_eliminate_regs_1):
>>
>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>> (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>
>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>> clear to me what it means),
>
>
> It means we use the full offset between the regs; otherwise we use the change
> in the full offset since the previous iteration (it can change, as we reserve
> stack memory for spilled pseudos and the reservation can be done several
> times). As the equiv value is stored as it was before any elimination, we
> always need to use the full offset to do the elimination.
OK, thanks, that's clearer.
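To check my understanding of the full versus incremental offset, here is a minimal sketch. The struct and the function adjust_disp are hypothetical stand-ins that only model the two modes described above; they are not the actual LRA code, and the field names are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical model of one elimination entry (not the real LRA struct). */
struct elim_model {
  long offset;          /* current full offset between the two registers */
  long previous_offset; /* full offset as of the previous iteration */
};

/* Rewrite a displacement taken off the eliminated register.
   full_p != 0: DISP is still in its pre-elimination form, so the whole
                offset is applied (the case for a stored equiv value).
   full_p == 0: DISP was already adjusted in the previous iteration, so
                only the change in the offset is applied. */
static long adjust_disp(long disp, const struct elim_model *ep, int full_p)
{
  long delta = full_p ? ep->offset : ep->offset - ep->previous_offset;
  return disp + delta;
}
```

With offset = 24 and previous_offset = 16, an untouched displacement of 8 becomes 32 in full mode, and a displacement already rewritten to 24 in the previous iteration also becomes 32 in incremental mode, so both paths agree as long as the right mode is picked for the form of the input.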
>> but in the PLUS switch case, we have offset
>> = 0xb (given by ep->offset) and as lra_get_insn_recog_data
>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>> 0x4c offset.
>>
>
> A value of 0 is suspicious because it is the default. We might not have set
> it up from the neighboring insns.
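As a sanity check on the numbers quoted above, here is a small sketch. The constants are copied from this message and implied_offset is a hypothetical helper, so nothing here comes from GCC itself. Taken literally, 0x4c + 0xb gives 0x57, while going from sfp + 0x4c to sp + 0xfc implies an adjustment of 0xb0; whether the quoted 0xb is simply missing a digit is an assumption on my side.

```c
#include <stdio.h>

/* Values copied from the message (assumed to be exact as written). */
enum {
  SFP_DISP = 0x4c,      /* displacement off sfp in the equiv */
  SP_DISP = 0xfc,       /* displacement off sp after elimination */
  QUOTED_OFFSET = 0xb   /* offset quoted for ep->offset */
};

/* Offset that must have been added to turn BEFORE into AFTER. */
static long implied_offset(long before, long after)
{
  return after - before;
}

/* Displacement obtained by adding OFFSET to BEFORE. */
static long resulting_disp(long before, long offset)
{
  return before + offset;
}
```

Calling implied_offset (SFP_DISP, SP_DISP) and resulting_disp (SFP_DISP, QUOTED_OFFSET) and printing both with %lx is enough to compare the two values.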
>
>
>
>> So, here I don't get whether it is the sp_offset value of the
>> lra_insn_recog_data element which is not updated correctly, or whether
>> lra_eliminate_regs_1 has to be called with update_p and not full_p (which
>> fixed the value in that particular case). Is it more obvious to you?
>>
>
> Yvan, could you send me the reduced preprocessed case and the options for
> cc1 to reproduce it?
Here is the cc1 command line:
cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
-mfpu=neon -mthumb v2.c -O2
I use a native build on a Chromebook, but it's reproducible with a
cross compiler.
With the attached test case, the issue shows up when processing insn 118.
Thanks,
Yvan
typedef __builtin_va_list __gnuc_va_list;
typedef __gnuc_va_list va_list;
extern void abort (void);
extern void exit (int);
void bar (int n, int c)
{
  static int lastn = -1, lastc = -1;
  if (lastn != n) {
    if (lastc != lastn)
      abort ();
    lastc = 0;
    lastn = n;
  }
  if (c != (char) (lastc ^ (n << 3)))
    abort ();
  lastc++;
}
typedef struct { char x[31]; } A31;
typedef struct { char x[32]; } A32;
typedef struct { char x[35]; } A35;
typedef struct { char x[72]; } A72;
void foo (int size, ...)
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  va_list ap;
  int i;
  if (size != 21) abort ();
  __builtin_va_start(ap,size);
  a31 = __builtin_va_arg(ap,typeof (a31));
  for (i = 0; i < 31; i++) bar (31, a31.x[i]);
  a32 = __builtin_va_arg(ap,typeof (a32));
  for (i = 0; i < 32; i++) bar (32, a32.x[i]);
  a35 = __builtin_va_arg(ap,typeof (a35));
  for (i = 0; i < 35; i++) bar (35, a35.x[i]);
  a72 = __builtin_va_arg(ap,typeof (a72));
  for (i = 0; i < 72; i++) bar (72, a72.x[i]);
  __builtin_va_end(ap);
}
int main (void)
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  int i;
  for (i = 0; i < 31; i++) a31.x[i] = i ^ (31 << 3);
  for (i = 0; i < 32; i++) a32.x[i] = i ^ (32 << 3);
  for (i = 0; i < 35; i++) a35.x[i] = i ^ (35 << 3);
  for (i = 0; i < 72; i++) a72.x[i] = i ^ (72 << 3);
  foo (21, a31, a32, a35, a72);
  exit (0);
}