On 02/14/2013 03:28 AM, Vladimir Makarov wrote:
On 13-02-13 6:36 PM, Michael Eager wrote:
[snip]
I thought about register pressure causing this, but I think that
should cause spilling of one of the registers which were not used in
this long sequence, rather than causing a large number of additional
loads.
Longer-living pseudos have a higher probability of being spilled, as
they usually conflict with more pseudos during their lives, and
spilling them frees a hard reg for many conflicting pseudos. That is
how the RA heuristics work (in the old RA, log(live range span) was
used: the bigger the value, the higher the probability of spilling).
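For illustration only (this is not IRA's actual cost model), the kind
of spill-choice heuristic being described looks roughly like this:

    /* Prefer to spill pseudos whose spill cost is small relative to how
       long they live: a long-lived pseudo conflicts with more pseudos,
       so spilling it frees a hard register for all of them.  Simplified
       sketch, not GCC code.  */
    struct pseudo
    {
      int spill_cost;         /* estimated cost of the loads/stores added
                                 if this pseudo goes to memory */
      int live_range_length;  /* insns (or log of insns) spanned */
    };

    static double
    spill_priority (const struct pseudo *p)
    {
      /* Smaller value => better candidate for spilling.  */
      return (double) p->spill_cost / (double) (p->live_range_length + 1);
    }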
Perhaps the cost analysis has a problem.
I've checked it and it looks ok to me.
RA is focused on generating faster code. Looking at the fragment you
provided, it is hard to say much about it. I tried -Os with gcc 4.8
and it generates the desired code for the fragment in question (by
the way, the peak register pressure decreased to 66 in this case).
It's both larger and slower, since the additional loads take much
longer. I'll take a look at -Os.
It looks like the values of p++ are being pre-calculated and stored
on the stack. This results in a load, rather than an increment of a
register.
If so, it might be another optimization which moves the p++
calculations higher. IRA does not do that (more correctly, a new IRA
feature implemented by Bernd Schmidt in gcc 4.7 can move insns
downwards in the CFG to decrease reg pressure).
I checked all the RTL passes; these calculations are not created by
any RTL pass, so it is probably some tree-ssa optimization.
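For what it's worth, here is a tiny made-up example (not the original
test case) of the pattern being described:

    /* Illustrative only.  Ideally each statement is a load through p
       plus an increment of the register holding p.  */
    int
    sum_four (const int *p)
    {
      int sum = 0;
      sum += *p++;
      sum += *p++;
      sum += *p++;
      sum += *p++;
      return sum;
    }

If an earlier pass turns the increments into independent addresses
p, p+1, p+2, p+3 and the surrounding code already has high register
pressure, those addresses can end up precomputed and spilled, so each
use reloads an address from a stack slot instead of bumping a
register, which gives the extra loads discussed above.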
Any industrial RA uses heuristic algorithms, and in some cases better
heuristics can work worse than poorer ones. So you should probably
check whether there is any progress, from a performance point of
view, in moving from gcc 4.1 to gcc 4.6 on a variety of benchmarks.
Introducing IRA improved x86 code by 4% on SPEC2000. Subsequent
improvements (like using dynamic register classes) gave further
performance gains.
My impression is that the performance is worse. Reports I've seen
are that the code is substantially larger, which means more
instructions.
I'm skeptical about comparisons between x86 and RISC processors.
What works well for one may not work well for the other.
IRA improved code for many RISC processors, although a better RA has
a smaller effect on these processors as they have more registers.
Looking at the test code, I can draw some conclusions for myself:
o We need a common pass for decreasing register pressure (I have
already expressed this in the past) as optimizations become more
aggressive. Some progress was made to make a few optimizations aware
of RA (register-pressure scheduling, loop-invariant motion, and code
hoisting), but there are too many passes and it is wrong, and
impossible, to make them all aware of RA. Some
register-pressure-decreasing heuristics are difficult to implement in
RA (like insn rearrangement or complex rematerialization; a small
rematerialization example is sketched below) and this pass could
focus on them.
That might be useful.
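As a made-up illustration of the rematerialization case mentioned
above: when a value is cheap to recompute, recomputing it at the
point of use is usually better than spilling it to a stack slot and
reloading it.

    extern void use_lots_of_registers (void);

    void
    store_flag (char *base, char x)
    {
      char *t = base + 0x1000;   /* cheap to recompute from base */
      use_lots_of_registers ();  /* high-pressure region: t would be
                                    spilled here ...  */
      *t = x;                    /* ... and reloaded here; rematerializing
                                    base + 0x1000 avoids both the stack
                                    slot and the reload */
    }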
o Implement RA live-range splitting in regions other than loops or
BBs (currently IRA does splitting only on loop bounds and LRA does it
within a BB; the old RA had no live-range splitting at all). See the
made-up loop example below.
Each of the blocks of code is in its own BB. I haven't checked, but
I'd guess that most of the registers are in use on entry and still
live on exit, so the block has no registers to allocate.
Splitting in BB scope is not profitable in this case.
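To show what regional live-range splitting buys in general (a made-up
example, unrelated to the test case): here v is used before and after
the loop but not inside it, so splitting at the loop boundary lets RA
keep v in a register for the uses around the loop and in memory only
across the loop, instead of spilling it for its whole live range or
holding a hard register through the loop.

    extern int f (int);

    int
    live_across_loop (int v, int n)
    {
      int i, acc = v * 2;        /* v is wanted in a register here ... */
      for (i = 0; i < n; i++)
        acc += f (i);            /* ... but not referenced in this
                                    high-pressure region ... */
      return acc + v;            /* ... and is used again here */
    }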
I'd also recommend trying the following options concerning RA:
-fira-loop-pressure, -fsched-pressure, -fira-algorithm=CB|priority,
-fira-region=one|all|mixed. Actually, -fira-algorithm=priority plus
-fira-region=one is analogous to what the old RA did.
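For example (the file name is just a placeholder, and the exact
effect of each combination varies by target):

    gcc -O2 -fira-loop-pressure -fsched-pressure -S test.c
    gcc -O2 -fira-algorithm=priority -fira-region=one -S test.c

where the second line approximates the behaviour of the old
allocator.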
I am reading this thread and getting more and more puzzled. The RA
stuff is very complicated, having many constraints and many
dependencies on other passes. Taking this into account, it seems that
no heuristic algorithm can even get close to an optimal register
allocation. A heuristic algorithm can't take all effects and
side-effects into account simultaneously.
Considering all that, why does GCC not use generic optimization
algorithms for RA? A generic optimizer can take all issues into
account simultaneously. I am talking about ILP/MIP (Integer Linear
Programming / Mixed Integer Programming), SAT and CSP [1] (Constraint
Satisfaction Problem). There has been a lot of progress in these
areas, and the solvers are much faster (orders of magnitude) than
they were 10 years ago.
So why aren't ILP/SAT/CSP used in RA? I don't think they would be
much slower than what RA does today (somewhat slower, but not much).
I do believe there is a good chance of getting much better results
from RA with these technologies.
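To make the idea concrete, a minimal sketch of the core assignment
problem as a 0-1 ILP (deliberately ignoring live-range splitting,
coalescing, register classes and everything else a real allocator has
to handle):

    For each pseudo p and hard register r, let x[p,r] = 1 if p is
    assigned to r, and let s[p] = 1 if p is spilled.

      minimize    sum over p of  spill_cost(p) * s[p]
      subject to  sum over r of  x[p,r] + s[p] = 1   for every pseudo p
                  x[p,r] + x[q,r] <= 1               for every conflicting
                                                     pair (p,q) and every r
                  x[p,r], s[p] in {0,1}

Constraints for register classes, splitting, rematerialization,
coalescing and so on would have to be layered on top of this, which
is where the model gets large.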
I'd like to invest some time into a feasibility check, if someone is
willing to work with me on modeling the RA problem (or at least
several examples, such as the above).
Michael
[1] Even though CSP and SAT solving algorithms (systematic or
stochastic) are not strictly optimization algorithms, they can still
be used where optimization is needed. It is very difficult and
time-consuming to get an optimal solution with them, yet it is
possible to get a good-enough solution in reasonable time.