Hi Sterling,
Attached is the testcase for the spill issue. Please try it out with the
patch :-)
>
> On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) <[email protected]> wrote:
> > Hi Sterling,
> >
> > Since the patch has been delayed for a long time, I'm pushing it a bit.
> > Sorry for that.
> > Yes, you are right. There is a performance issue here: with this patch,
> > GCC may use one more general register in some cases.
> > Take the following arraysum testcase as an example. During the doloop
> > optimization, GCC determines that the number of iterations is 1024 and
> > creates a new pseudo, 79, as the new trip count register.
> > Pseudo 79 is live throughout the loop, which raises the register
> > pressure inside the loop, and reload may spill this new pseudo when the
> > register pressure is very high.
> > I know that the Xtensa loop instruction copies the trip count register
> > into the LCOUNT special register, and that we need to describe this
> > hardware feature to GCC in order to free the trip count register.
> > But I find this difficult to do. Do you have any suggestions?
>
> There are two issues related to the trip count: one I would like you to
> solve now, and one later.
>
> 1. Later: once the loop instruction executes, the trip count doesn't need
> to be updated at all inside these loops. The code below relates to this
> case.
>
> 2. Now: you should be able to use a loop instruction regardless of whether
> the trip count is spilled. If you have an example where that wouldn't
> work, I would love to see it.
>
void
foo (unsigned f, long v, unsigned *w, unsigned a, unsigned b, unsigned e,
     unsigned c, unsigned d)
{
  unsigned h = v / 4, x[16];
  while (f < h)
    {
      unsigned i;
      f++;
      a |= (a >> 30);
      d = (d << 30) | ((unsigned) d >> 30);
      c = (c << 30) | ((unsigned) c >> 30);
      b = 30 | ((unsigned) b >> 30);
      d += a = (a << 30) | ((unsigned) a >> 2);
      c += ((d << 5) | (d >> 27)) + ((e & (a ^ b))) + 0x5a827999 + x[12];
      a += (c & e);
      c = 30 | ((unsigned) c);
      i = x[5] ^ x[7] ^ x[8] ^ x[3];
      x[5] = (i << 1) | ((unsigned) i >> 31);
      i = x[6] ^ x[2] ^ x[14] ^ x[13];
      x[6] = (i << 1) | (i >> 31);
      b += (c | (c >> 5)) + (d ^ e) + 0x6ed9eba1
           + (x[7] = (i << 1) | ((unsigned) i >> 31));
      x[8] = i | 1;
      e += (a | 5) + b + (i = x[9] ^ x[6], x[10] = (i << (unsigned) i));
      e = 30 | ((unsigned) e >> 30);
      i = x[12] ^ x[14] ^ x[12] ^ x[12], (x[12] = 1 | ((unsigned) i));
      i = x[13] ^ x[5] ^ x[10], (x[13] = (i << (unsigned) 1));
      i = x[2] ^ x[7] ^ x[12], (x[15] = i | ((unsigned) i >> 1));
      i = x[2] ^ x[0] ^ x[13], (x[0] = (i << 1) | 31);
      e = (e << 30) | 2;
      i = x[14] ^ x[2] ^ x[15], (x[2] = i | 1);
      x[3] = i | ((unsigned) i);
      i = x[14] ^ x[12] ^ x[4], (x[4] = 1 | ((unsigned) i >> 1));
      x[5] = i | 1;
      e = (e << 30) | 30;
      b += (5 | ((unsigned) e >> 5)) + 0x8f1bbcdc
           + (x[9] = (i | ((unsigned) i >> 1)));
      i = x[2] ^ (x[10] = ((i << 1) | (i >> 1)));
      x[13] = (i | ((unsigned) i >> 1));
      (i = x[14] ^ x[0] ^ x[14], (x[14] = ((i << 1) | 31)));
      a = *w += a;
    }
}
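For reference, here is a minimal C model of the zero-overhead loop behaviour both points rely on. It is only a sketch under stated assumptions: `loop_regs`, `run_hw_loop` and the `trip_count_slot` parameter are hypothetical names, only LCOUNT is modelled (LBEG/LEND are implied by the C control flow), and the LOOPNEZ form is assumed so that a zero count skips the body. It shows that once a spilled trip count is reloaded and copied into LCOUNT, the general register that held it is dead for the rest of the loop, so in this model the body may clobber it without changing the iteration count.

```c
#include <stdint.h>

/* Hypothetical model of the Xtensa loop special registers.
   Only LCOUNT affects the iteration count; LBEG/LEND are implied
   by the C control flow in run_hw_loop.  */
typedef struct
{
  uint32_t lcount;
} loop_regs;

/* Models "LOOPNEZ as, label" (assumed form): if as == 0 the body is
   skipped; otherwise LCOUNT <- as - 1, and at the end of the body the
   hardware branches back while LCOUNT != 0, decrementing LCOUNT.
   TRIP_COUNT_SLOT stands in for the stack slot reload spilled the
   trip count to.  Returns the number of times the body ran.  */
static uint32_t
run_hw_loop (const uint32_t *trip_count_slot)
{
  loop_regs sr;
  uint32_t body_runs = 0;

  /* Reload the spilled trip count into a general register
     (call it a3) just before the loop instruction.  */
  uint32_t a3 = *trip_count_slot;
  if (a3 == 0)
    return 0;                   /* LOOPNEZ skips the body.  */
  sr.lcount = a3 - 1;           /* The loop instruction sets LCOUNT.  */

  for (;;)
    {
      /* Loop body: a3 is dead here, so the body may reuse it
         without affecting how many iterations run.  */
      a3 = 0xdeadbeefu;
      (void) a3;
      body_runs++;

      /* End of body (PC == LEND): the hardware loop-back test.  */
      if (sr.lcount == 0)
        break;
      sr.lcount--;
    }
  return body_runs;
}
```

With a trip count of 1024 (as in the arraysum case), `run_hw_loop` runs the body 1024 times even though the body overwrites `a3` on every iteration; the count lives only in LCOUNT after the loop instruction.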