LRA issue: integers spilled to floating-point registers

vmakarov at gcc dot gnu.org Thu, 07 May 2015 11:55:59 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862


--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Robert Suchanek from comment #5)
> Sorry for late reply, I was on vacation.
> 
> > The costs are equal if cost of moving general regs to/from fp regs or
> > memory are equal.  So it looks ok to me.
> > 
> > r218 spilled in IRA is reassigned to a fp reg in *LRA*.  
> 
> > But I could try to use preferred class in LRA (after checking how it
> > affects x86/x86-64 performance), if such solution is ok for you.
> 
> Indeed, the above test case only shows the problem in LRA. If the preferred
> class would be the winner then why not. However, there are still some issues
> with IRA and I have another testcase to show it.
> 
> > I am not sure, that the result code is better as we access memory 3
> > times instead of access to $f20.
> 
> On one hand, yes, it seems good but it's not always desirable to use FP regs
> until absolutely necessary. For instance, compiling the dynamic linker that
> uses FP regs does not seem to be right.
> 
> I had another thought about spilling into registers and how we could
> guarantee spilling into the desirable class. In the majority of cases
> where integers end up in floating-point registers, I see the following
> in the dumps:
> ...
>       Reassigning non-reload pseudos
>                Assign 52 to r217 (freq=46)
> ...
> 

If we use the preferred class instead of the allocno class, the problem goes
away, provided of course that IRA does not assign ALL_REGS as the preferred
class.
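
To make the distinction concrete, here is a toy model of the choice described
above.  It is not GCC source: the enum, the struct, and both functions are
hypothetical names invented for illustration, and real LRA tracks much more
state.  It only shows why consulting the preferred class stops an integer
pseudo from being reassigned to an FP register, and why that stops helping
once the preferred class is itself ALL_REGS.

```c
/* Toy model only; none of these names exist in GCC.  */
enum reg_class_model { GR_REGS_M, FP_REGS_M, ALL_REGS_M };

/* The two classes IRA records for a pseudo (simplified assumption).  */
struct pseudo_model {
  enum reg_class_model preferred_class;
  enum reg_class_model allocno_class;
};

/* Return 1 if a pseudo restricted to CLS may live in a register of
   class HARD.  ALL_REGS_M allows every class.  */
static int class_allows(enum reg_class_model cls, enum reg_class_model hard)
{
  return cls == ALL_REGS_M || cls == hard;
}

/* Decide whether a spilled pseudo may be reassigned to an FP register.
   USE_PREFERRED selects which of the two recorded classes is consulted.  */
static int may_reassign_to_fp(const struct pseudo_model *p, int use_preferred)
{
  enum reg_class_model cls =
    use_preferred ? p->preferred_class : p->allocno_class;
  return class_allows(cls, FP_REGS_M);
}
```

With a pseudo whose preferred class is general regs and whose allocno class is
ALL_REGS, the model allows the FP reassignment only when the allocno class is
consulted.  If both classes are ALL_REGS, the switch makes no difference.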

> 
> In the remaining 5% cases, IRA assigns FP regs with LRA blindly following
> IRA's decisions like in the following reduced case:
> 

I guess the same would happen with the reload pass too, as it reassigns
pseudos spilled in reload when IRA assigns ALL_REGS as the preferred class.

> int a, b, d, e, j, k, n, o;
> unsigned c, h, i, l, m, p;
> int *f;
> int *g;
> int fn1(int p1) { return p1 - a; }
> 
> int fn2() {
>   b = b + 1 - a;
>   e = 1 + o + 1518500249;
>   d = d + n;
>   c = (int)c + g[0];
>   b = b + m + 1;
>   d = d + p + 1518500249;
>   d = d + k - 1;
>   c = fn1(c + j + 1518500249);
>   e = fn1(e + i + 1);
>   d = d + h + 1859775393 - a;
>   c = fn1(c + (d ^ 1 ^ b) + g[1] + 1);
>   b = fn1(b + m + 3);
>   d = fn1(d + l + 1);
>   b = b + (c ^ 1) + p + 1;
>   e = fn1(e + (b ^ c ^ d) + n + 1);
>   d = o;
>   b = 0;
>   e = e + k + 1859775393;
>   f[0] = e;
> }
> 
> I'm not sure how this could be fixed in LRA and again this is related to
> ALL_REGS for allocnos. Perhaps changing the class for reloads to the spill
> class in LRA would do the trick but it may have other problems.
> My last attempt was to increase the cost of FP_REGS in IRA for integral
> modes (similar effect to increasing the costs of moving FP<>GR in the
> backend) but the cost pass looks complicated and I'm not entirely sure where
> to tweak it. Any suggestions/ideas?
> 

Currently, I see the solution in introducing a target hook which can narrow
the allocno class for a given pseudo in IRA.  For a pseudo of non-fp mode, it
should narrow ALL_REGS to general regs in the MIPS case.
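
As a rough illustration of what such a hook might do for MIPS, here is a
self-contained sketch.  It does not use GCC's internal types: the enums and
the function name are hypothetical stand-ins, and the real hook's signature
may well differ.  It only encodes the rule stated above, namely that a
non-fp-mode pseudo whose allocno class is ALL_REGS gets narrowed to the
general registers.

```c
/* Stand-in types for illustration only; not GCC's machine modes or
   register classes.  */
enum mode_kind { MODE_INTEGRAL, MODE_FLOAT };
enum rclass { CLASS_GR_REGS, CLASS_FP_REGS, CLASS_ALL_REGS };

/* Hypothetical hook body: given the mode of a pseudo and the allocno
   class IRA computed for it, return the class IRA should actually use.
   Only the ALL_REGS case for non-fp modes is changed.  */
static enum rclass narrow_allocno_class(enum mode_kind mode,
                                        enum rclass allocno_class)
{
  if (mode != MODE_FLOAT && allocno_class == CLASS_ALL_REGS)
    return CLASS_GR_REGS;
  return allocno_class;
}
```

In this sketch fp-mode pseudos and pseudos that already have a narrower class
are passed through unchanged, so the hook can only narrow a choice, never
widen it.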

Actually, ARM people have already asked me for such a hook, but their case was
not convincing.


> > I tried reverting the ALL_REGS patch and I don't see any regressions - in
> > fact allocations are slightly better (fewer registers with ALL_REGS
> > preference which is what we need - a strong decision to allocate to either
> > FP or int regs). So what was the motivation for it?
> 
> AFAICS, the aim was to fix the code generation regression for x86. x86
> doesn't seem to be as much affected as others. I did not notice code size
> differences with -O2 and default arch for x86_64-unknown-linux-gnu triplet
> and CSiBE benchmark, -Os showed some minor improvements/regression with the
> largest difference in mpeg2dec-0.3.1 yielding ~0.3% improvement. I haven't
> evaluated performance changes.
> 
> For MIPS, I also saw allocation improvements, more erratic than x86 with
> improvement about 0.5% on average. Reverting the patch does bring the old
> issue back but I wonder what is the impact of it and whether it is a
> justifiable fix to the extent it outweighs the disadvantages. Or maybe the
> original problem could be fixed differently?

I'll try to investigate this more.  But first, I'd like to make a patch for the
new hook so that you can evaluate its usefulness for MIPS.  I hope to finish it
and send it to you tomorrow.

[Bug rtl-optimization/65862] [MIPS] IRA/LRA issue: integers spilled to floating-point registers
