Shobaki, Ghassan wrote:
Thanks for everyone who has responded to this email thread!
That was quite informative for me although I am not yet familiar with
all the phases in GCC. I have worked on instruction scheduling and
register allocation in another compiler and am currently trying to
understand how things are done in GCC.
That is a hard part because there is no good documentation about reload
and haifa scheduler and this part of compiler (may be with combiner) is
the most complicated one, imho.
Can you guys kindly give me some
references or brief descriptions of the following:
1. Selective scheduling. I am not familiar with the term. Is this a
trace scheduler of some sort?
Originally described in
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.352
GCC implementation is described in 2007 and 2006 GCC summit proceedings
(see authors Belevantsev or Kuvyrkov).
2. Reload. Is this another name for the register allocation phase or
something else?
Please, see http://gcc.gnu.org/wiki/reload
3. Register allocation in GCC. Is it based on a graph-coloring algorithm
or a linear-scan algorithm or something else?
A new register allocator was recently included into GCC. The old one
was a combination of BB local allocation and priority-coloring based
RA. The new one is a regional Chaitin-Briggs RA.
The new register allocator (IRA) is described in 2007 GCC summit.
Comparison with Callahan-Koblentz algorithm can be found in
http://vmakarov.fedorapeople.org/vmakarov-submission-cgo2008.pdf
It looks that the answer to my question about fixing pre-pass scheduling
by adding register pressure awareness is that Vladimir is currently
working on it. Vladimir, can you please give us some brief description
of the approach that you are taking to solve this problem? If you can
refer me to some document that you have written up on this, that would
be great. Balancing the tradeoff between hiding latencies and minimizing
register pressure (shortening live ranges) is a very complex problem.
Ideally, a precise solution requires some kind of backtracking to undo
bad scheduling decisions. However, I think it may be possible to develop
a one-pass greedy algorithm that gives a good approximation in most
cases. My thinking is that an such an approximate greedy algorithm
(i.e., without backtracking) will probably do something like this:
The scheduler keeps track of the register pressure and then for each
instruction in the ready list, it computes what would the register
pressure be if this instruction was scheduled next. Based on that, it
selects the instruction that leads to the minimum register pressure.
Computing register pressure after the scheduling of an instruction
requires keeping track of the number of uses of each live range. If an
instruction has the last use of a live range, that should give it
priority over other instructions. This is very much what the existing
heuristic tries to do, but the existing heuristic is static. It does not
dynamically keep track of register pressure and the defs and uses of
each live range.
How does this relate to the solution that you are currently working on,
Vladimir?
I am trying several different algorithms. Originally, I was more
interesting in just usage of scheduler infrastructure to shrink live
ranges of pseudos for a better RA (which is the next pass). The second
one is a standard heuristic which can be found in Muchnic's book:
1. when register pressure is high, prefer insns which decreases the
pressure.
2. when register pressure is low, use standard insn scheduling heuristic.
Another algorithm for live-range shrinkage I am trying to restore
expression DAG and reorder insns in Sethi-Ulman enumeration style.
Do you know when you will have something working? Do you have any
initial performance results?
No and No. Compiler is still not bootstrapping successfully (because of
standard failure in reload). I hope to have something working in a month.