https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862
--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Robert Suchanek from comment #5)
> Sorry for late reply, I was on vacation.
>
> > The costs are equal if cost of moving general regs to/from fp regs or
> > memory are equal.  So it looks ok to me.
> >
> > r218 spilled in IRA is reassigned to a fp reg in *LRA*.
>
> > But I could try to use preferred class in LRA (after checking how it
> > affects x86/x86-64 performance), if such solution is ok for you.
>
> Indeed, the above test case only shows the problem in LRA. If the preferred
> class would be the winner then why not. However, there are still some issues
> with IRA and I have another testcase to show it.
>
> > I am not sure, that the result code is better as we access memory 3
> > times instead of access to $f20.
>
> On one hand, yes, it seems good but it's not always desirable to use FP regs
> until absolutely necessary. For instance, compiling the dynamic linker that
> uses FP regs does not seem to be right.
>
> I had another thought about spilling into registers and how we could
> guarantee spilling into the desirable class. In the majority of cases where
> integers end up in floating-point registers, I see the following in the
> dumps:
> ...
>   Reassigning non-reload pseudos
>     Assign 52 to r217 (freq=46)
> ...
>

If we use the preferred class instead of the allocno one, the problem goes
away.  That holds, of course, only if IRA does not assign ALL_REGS as the
preferred class.

> In the remaining 5% cases, IRA assigns FP regs with LRA blindly following
> IRA's decisions like in the following reduced case:
>

I guess the same would happen with the reload pass too, as it reassigns
pseudos spilled in reload when IRA assigns ALL_REGS as the preferred class.

> int a, b, d, e, j, k, n, o;
> unsigned c, h, i, l, m, p;
> int *f;
> int *g;
> int fn1(int p1) { return p1 - a; }
>
> int fn2() {
>   b = b + 1 - a;
>   e = 1 + o + 1518500249;
>   d = d + n;
>   c = (int)c + g[0];
>   b = b + m + 1;
>   d = d + p + 1518500249;
>   d = d + k - 1;
>   c = fn1(c + j + 1518500249);
>   e = fn1(e + i + 1);
>   d = d + h + 1859775393 - a;
>   c = fn1(c + (d ^ 1 ^ b) + g[1] + 1);
>   b = fn1(b + m + 3);
>   d = fn1(d + l + 1);
>   b = b + (c ^ 1) + p + 1;
>   e = fn1(e + (b ^ c ^ d) + n + 1);
>   d = o;
>   b = 0;
>   e = e + k + 1859775393;
>   f[0] = e;
> }
>
> I'm not sure how this could be fixed in LRA and again this is related to
> ALL_REGS for allocnos. Perhaps changing the class for reloads to the spill
> class in LRA would do the trick but it may have other problems.
> My last attempt was to increase the cost of FP_REGS in IRA for integral
> modes (similar effect to increasing the costs of moving FP<>GR in the
> backend) but the cost pass looks complicated and I'm not entirely sure where
> to tweak it. Any suggestions/ideas?
>

Currently, I see the solution in introducing a target hook which can narrow
the allocno class for a given pseudo in IRA.  For a pseudo of non-fp mode, it
should narrow ALL_REGS to general regs in the MIPS case.  Actually, ARM people
already asked me for such a hook, but their case was not convincing.

> > I tried reverting the ALL_REGS patch and I don't see any regressions - in
> > fact allocations are slightly better (fewer registers with ALL_REGS
> > preference which is what we need - a strong decision to allocate to either
> > FP or int regs). So what was the motivation for it?
>
> AFAICS, the aim was to fix the code generation regression for x86. x86
> doesn't seem to be as much affected as others. I did not notice code size
> differences with -O2 and default arch for x86_64-unknown-linux-gnu triplet
> and CSiBE benchmark, -Os showed some minor improvements/regression with the
> largest difference in mpeg2dec-0.3.1 yielding ~0.3% improvement.
> I haven't evaluated performance changes.
>
> For MIPS, I also saw allocation improvements, more erratic than x86 with
> improvement about 0.5% on average. Reverting the patch does bring the old
> issue back but I wonder what is the impact of it and whether it is a
> justifiable fix to the extent it outweights the disadvantages. Or maybe the
> original problem could be fixed differently?

I'll try to investigate this more.  But first, I'd like to make a patch for
the new hook so that you can evaluate its usefulness for MIPS.  I hope to
make it and send it to you tomorrow.
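
For illustration only, the narrowing that such a hook would do for MIPS
might look roughly like the sketch below.  It is standalone C, not GCC
code: the enums and the function name narrow_allocno_class are hypothetical
stand-ins, and the real hook's name and signature are not settled in this
comment.

```c
/* Hypothetical, simplified stand-ins for register classes and machine
   modes.  These are NOT the real GCC definitions.  */
enum reg_class { NO_REGS, GR_REGS, FP_REGS, ALL_REGS };
enum mode_kind { MODE_INT, MODE_FLOAT };

/* Sketch of the proposed narrowing: given the mode of a pseudo and the
   allocno class IRA picked for it, return a possibly narrower class.
   A non-fp pseudo whose class is ALL_REGS is narrowed to general regs;
   every other combination is returned unchanged.  */
static enum reg_class
narrow_allocno_class (enum mode_kind mode, enum reg_class allocno_class)
{
  if (mode != MODE_FLOAT && allocno_class == ALL_REGS)
    return GR_REGS;
  return allocno_class;
}
```

With this sketch, narrow_allocno_class (MODE_INT, ALL_REGS) gives GR_REGS,
while a floating-point pseudo keeps ALL_REGS and any pseudo already in
FP_REGS or GR_REGS is left alone.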