On 12/01/2010 02:14 PM, Paul Koning wrote:
On Nov 29, 2010, at 9:51 PM, Vladimir Makarov wrote:
On 11/29/2010 08:52 PM, Paul Koning wrote:
I'm doing some experiments to get to know GCC better, and something is puzzling
me.
I have defined an md file with DFA and costs describing the fact that loads
take a while (as do stores). Also, there is no memory to memory move, only
memory to/from register.
Test program is basically a=b; c=d; e=f; g=h;
Sched1, as expected, turns this into four loads followed by four stores,
exploiting the pipeline.
Then IRA kicks in. It shuffles the insns back into load/store, load/store
pairs, essentially the source code order. It looks like it's doing that to
reduce the number of registers used. Fair enough, but this makes the code less
efficient. I don't see a way to tell IRA not to do this.
Most probably this happens because of ira.c::update_equiv_regs. This function
was inherited from the old register allocator. Its major goal is to find
equivalent memory/constants/invariants for pseudos, which the reload pass can
then use. Pseudo equivalence also affects live-range splitting decisions in
IRA.
Update_equiv_regs can also move insns that initiate pseudo equivalences close
to the pseudo's use. You could try to prevent this and see what happens. IMO
preventing such insn movement unconditionally would do more harm than good for
performance on SPEC benchmarks on x86/x86-64 processors.
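(As an illustrative sketch — MIPS-like pseudocode with hypothetical register
names, not the actual RTL — the movement under discussion turns the pipelined
schedule back into paired loads and stores:)

```
; after sched1: all loads issued first, hiding load latency
  r1 <- [b] ; r2 <- [d] ; r3 <- [f] ; r4 <- [h]
  [a] <- r1 ; [c] <- r2 ; [e] <- r3 ; [g] <- r4

; after update_equiv_regs moves each equivalence-initializing load
; down to its single use: source order restored, pipeline unused
  r1 <- [b] ; [a] <- r1
  r2 <- [d] ; [c] <- r2
  ...
```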
As it happens, there's a secondary reload involved: the loads are into one set
of registers but the stores from another, so a register to register move is
added in by reload. Does that explain the behavior? I tried changing the
cover_classes, but that doesn't make a difference.
It is hard to say without the dump file. If everything is correctly defined,
it should not happen.
I extended the test code a little, and fed it to a mips64el-elf targeted gcc.
It showed the same pattern in one of the two functions but not the other. The
test code is test8.c (attached).
What I see in the assembly output (test8.s, also attached) is that foo() has a
load then store then load then store pattern, which contradicts what sched1
constructed and doesn't take advantage of the pipeline. However, bar() does
use the pipeline. I don't know what's different between these two.
Do you want some dump files (and if so, which ones)? Or you could just
reproduce this with current gcc; it's a standard target build. The compile was
-O2 -mtune=mips64r2 -mabi=n32.
As I guessed, the problem is in the update_equiv_regs transformation,
which tries to move an initialization insn close to its single use to
decrease register pressure. A lot of people have already complained
about this function undoing scheduling.
The following patch solves the problem when you use
-fsched-pressure. I would not like to do that for regular (not
register pressure-sensitive) insn scheduling for obvious reasons.
I think most RISC targets (including the MIPS ones) should enable
-fsched-pressure by default.
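(For reference, a command line combining the options quoted earlier in the
thread with the flag the patch keys on would look something like this; test8.c
is the attached test file.)

```
mips64el-elf-gcc -O2 -mtune=mips64r2 -mabi=n32 -fsched-pressure -S test8.c
```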
2010-12-02 Vladimir Makarov <vmaka...@redhat.com>
* ira.c (update_equiv_regs): Prohibit move insns if
pressure-sensitive scheduling was done.
Jeff, sorry for bothering you. Is it ok to commit the patch to the
trunk?
Index: ira.c
===================================================================
--- ira.c (revision 167373)
+++ ira.c (working copy)
@@ -2585,7 +2585,13 @@ update_equiv_regs (void)
rtx equiv_insn;
if (! reg_equiv[regno].replace
- || reg_equiv[regno].loop_depth < loop_depth)
+ || reg_equiv[regno].loop_depth < loop_depth
+	     /* There is no sense in moving insns if
+		register pressure-sensitive scheduling
+		was done, because it will not improve
+		the allocation but will most probably
+		worsen the insn schedule.  */
+ || (flag_sched_pressure && flag_schedule_insns))
continue;
/* reg_equiv[REGNO].replace gets set only when