On 12/02/10 15:17, Vladimir Makarov wrote:
On 12/01/2010 02:14 PM, Paul Koning wrote:
On Nov 29, 2010, at 9:51 PM, Vladimir Makarov wrote:
On 11/29/2010 08:52 PM, Paul Koning wrote:
I'm doing some experiments to get to know GCC better, and something
is puzzling me.
I have defined an md file with DFA and costs describing the fact
that loads take a while (as do stores). Also, there is no memory to
memory move, only memory to/from register.
Test program is basically a=b; c=d; e=f; g=h;
Sched1, as expected, turns this into four loads followed by four
stores, exploiting the pipeline.
Then IRA kicks in. It shuffles the insns back into load/store,
load/store pairs, essentially the source code order. It looks like
it's doing that to reduce the number of registers used. Fair
enough, but this makes the code less efficient. I don't see a way
to tell IRA not to do this.
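For concreteness, here is a minimal sketch of the kind of test described above. Only the four copies come from the description; the int type, the function names, and the hand-written "batched" variant are assumptions made for illustration, and this is not the attached test file or actual compiler output.

```c
/* Hypothetical reconstruction of the test; the int type and the
   function names are assumptions, only the four copies come from
   the description above.  */
int a, b, c, d, e, f, g, h;

/* Source order: each load is immediately followed by its store.
   This is roughly the insn order seen after IRA.  */
void copy_interleaved(void)
{
    a = b;
    c = d;
    e = f;
    g = h;
}

/* The order sched1 is described as producing, written out by hand
   with explicit temporaries: four loads first, then four stores.  */
void copy_batched(void)
{
    int t0 = b, t1 = d, t2 = f, t3 = h;   /* loads */
    a = t0;                               /* stores */
    c = t1;
    e = t2;
    g = t3;
}
```

Both functions compute the same result. They differ only in the order of loads and stores, and the batched form needs four live temporaries where the interleaved form needs one, which is the register-pressure trade-off at issue here. An optimizing compiler is free to reorder either one.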
Most probably that happens because of ira.c::update_equiv_regs.
This function was inherited from the old register allocator. The
major goal of the function is to find equivalent
memory/constants/invariants for pseudos which can be used by the
reload pass. Pseudo equivalence also affects live range splitting
decisions in IRA.
Update_equiv_regs can also move insns initiating pseudo equivalences
close to the pseudo usage. You could try to prevent this and see
what happens. IMO preventing such insn moving will hurt performance
on SPEC benchmarks for x86/x86-64 processors.
As it happens, there's a secondary reload involved: the loads are
into one set of registers but the stores from another, so a
register to register move is added in by reload. Does that explain
the behavior? I tried changing the cover_classes, but that doesn't
make a difference.
It is hard to say without the dump file. If everything is correctly
defined, it should not happen.
I extended the test code a little, and fed it to a mips64el-elf
targeted gcc. It showed the same pattern in one of the two functions
but not the other. The test code is test8.c (attached).
What I see in the assembly output (test8.s, also attached) is that
foo() has a load then store then load then store pattern, which
contradicts what sched1 constructed and doesn't take advantage of the
pipeline. However, bar() does use the pipeline. I don't know what's
different between these two.
Do you want some dump files (which ones)? Or you could just reproduce
this with the current gcc; it's a standard target build. The compile
was -O2 -mtune=mips64r2 -mabi=n32.
As I guessed, the problem is in the update_equiv_regs transformation,
which tries to move an initialization insn close to its single use to
decrease the register pressure. A lot of people have already
complained about this function undoing scheduling.
The following patch solves the problem when you use
-fsched-pressure. I would not like to do that for regular (not
register pressure-sensitive) insn scheduling for obvious reasons.
I think most RISC targets (including MIPS ones) should enable
-fsched-pressure by default.
2010-12-02 Vladimir Makarov <vmaka...@redhat.com>
* ira.c (update_equiv_regs): Prohibit move insns if
pressure-sensitive scheduling was done.
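The patch itself is not reproduced in this message. As a rough sketch of the check the ChangeLog entry describes, the decision could be modeled as below. The struct, its fields, and the function name are hypothetical stand-ins, not GCC's actual declarations, and the exact condition used in ira.c is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the compiler's option state; these are
   not GCC's actual declarations.  */
struct sched_opts {
    bool schedule_insns;   /* first scheduling pass (sched1) enabled */
    bool sched_pressure;   /* -fsched-pressure given */
};

/* Return true if an equivalence-initializing insn may be moved next
   to its single use.  Moving is prohibited only when
   pressure-sensitive scheduling was done, since that pass has
   already taken register pressure into account.  */
bool may_move_equiv_insn(const struct sched_opts *opts)
{
    return !(opts->schedule_insns && opts->sched_pressure);
}
```

With regular scheduling (no -fsched-pressure) the function still returns true, which matches the intent stated above of leaving that case unchanged.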
Jeff, sorry for bothering you. Is it ok to commit the patch to the
trunk?
It seems fairly reasonable to me, at least in the short term.
ISTM that longer term we'd want to do these transformations only when
we're unable to allocate the affected pseudos to hard regs, i.e., leave
the schedule alone unless it results in an inability to get a
reasonable allocation.
jeff