On Nov 29, 2010, at 9:51 PM, Vladimir Makarov wrote:

> On 11/29/2010 08:52 PM, Paul Koning wrote:
>> I'm doing some experiments to get to know GCC better, and something is
>> puzzling me.
>>
>> I have defined an md file with DFA and costs describing the fact that loads
>> take a while (as do stores). Also, there is no memory to memory move, only
>> memory to/from register.
>>
>> Test program is basically a=b; c=d; e=f; g=h;
>>
>> Sched1, as expected, turns this into four loads followed by four stores,
>> exploiting the pipeline.
>>
>> Then IRA kicks in. It shuffles the insns back into load/store, load/store
>> pairs, essentially the source code order. It looks like it's doing that to
>> reduce the number of registers used. Fair enough, but this makes the code
>> less efficient. I don't see a way to tell IRA not to do this.
>>
> Most probably that happens because of ira.c::update_equiv_regs. This
> function was inherited from the old register allocator. The major goal of
> the function is to find equivalent memory/constants/invariants for pseudos
> which can be used by reload pass. Pseudo equivalence also affects live range
> splitting decision in IRA.
>
> Update_equiv_regs can also move insns initiating pseudo equivalences close to
> the pseudo usage. You could try to prevent this and to see what happens.
> IMO preventing such insn moving will do more harm on performance on SPEC
> benchmarks for x86/x86-64 processors.
>> As it happens, there's a secondary reload involved: the loads are into one
>> set of registers but the stores from another, so a register to register move
>> is added in by reload. Does that explain the behavior? I tried changing
>> the cover_classes, but that doesn't make a difference.
>>
> It is hard to say without the dump file. If everything is correctly defined,
> it should not happen.
>
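For readers without the attachments, the pattern quoted above (a=b; c=d; e=f; g=h;) can be sketched roughly as follows. This is a hypothetical reconstruction, not the contents of test8.c: the function name copy4, the use of int globals, and the initial values are all made up for illustration.

```c
/* Hypothetical stand-in for the quoted test pattern; NOT the attached test8.c.
   Globals are used so that each assignment needs a real load and a real store. */
int a, c, e, g;                  /* destinations (assumed names) */
int b = 1, d = 2, f = 3, h = 4;  /* sources, arbitrary illustrative values */

/* Four independent memory-to-memory copies. On a target with no
   memory-to-memory move, each one becomes a load followed by a store.
   Since the copies do not depend on each other, a scheduler is free to
   issue the four loads first and the four stores afterwards. */
void copy4(void)
{
    a = b;
    c = d;
    e = f;
    g = h;
}
```

Compiling something like this to assembly with the options mentioned in this thread (-O2 -mtune=mips64r2 -mabi=n32) and reading the output should show whether the loads end up grouped ahead of the stores or interleaved with them.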
I extended the test code a little and fed it to a gcc targeted at mips64el-elf. It showed the same pattern in one of the two functions but not the other.

The test code is test8.c (attached). What I see in the assembly output (test8.s, also attached) is that foo() has a load, store, load, store pattern, which contradicts what sched1 constructed and doesn't take advantage of the pipeline. However, bar() does use the pipeline. I don't know what's different between the two.

Do you want some dump files (and if so, which ones)? Or you could just reproduce this with the current gcc; it's a standard target build. The compile options were -O2 -mtune=mips64r2 -mabi=n32.

	paul
test8.c
test8.s