On Nov 29, 2010, at 9:51 PM, Vladimir Makarov wrote:

> On 11/29/2010 08:52 PM, Paul Koning wrote:
>> I'm doing some experiments to get to know GCC better, and something is 
>> puzzling me.
>> 
>> I have defined an md file with DFA and costs describing the fact that loads 
>> take a while (as do stores). Also, there is no memory to memory move, only 
>> memory to/from register.
>> 
>> Test program is basically a=b; c=d; e=f; g=h;
>> 
>> Sched1, as expected, turns this into four loads followed by four stores, 
>> exploiting the pipeline.
>> 
>> Then IRA kicks in.  It shuffles the insns back into load/store, load/store 
>> pairs, essentially the source code order.  It looks like it's doing that to 
>> reduce the number of registers used.  Fair enough, but this makes the code 
>> less efficient.  I don't see a way to tell IRA not to do this.
>> 
> Most probably that happens because of ira.c::update_equiv_regs.   This 
> function was inherited from the old register allocator.  The major goal of 
> the function is to find equivalent memory/constants/invariants for pseudos 
> which can be used by the reload pass.  Pseudo equivalence also affects 
> live range splitting decisions in IRA.
> 
> update_equiv_regs can also move the insns that initialize pseudo 
> equivalences close to the pseudo's use.  You could try to prevent this and 
> see what happens.  IMO, preventing such insn movement will hurt performance 
> on SPEC benchmarks for x86/x86-64 processors.
> 
>> As it happens, there's a secondary reload involved: the loads are into one 
>> set of registers but the stores from another, so a register to register move 
>> is added in by reload.  Does that explain the behavior?  I tried changing 
>> the cover_classes, but that doesn't make a difference.
>> 
> It is hard to say without the dump file.  If everything is correctly defined, 
> it should not happen.
> 

I extended the test code a little, and fed it to a mips64el-elf targeted gcc.  
It showed the same pattern in one of the two functions but not the other.  The 
test code is test8.c (attached).

What I see in the assembly output (test8.s, also attached) is that foo() has a 
load then store then load then store pattern, which contradicts what sched1 
constructed and doesn't take advantage of the pipeline.  However, bar() does 
use the pipeline.  I don't know what's different between these two.

Do you want some dump files (which ones)?  Or you could just reproduce this 
with the current gcc; it's a standard target build.  I compiled with -O2 
-mtune=mips64r2 -mabi=n32.
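
For anyone who wants to try this without the attachments, a minimal sketch of 
the kind of test described at the start of the thread (a=b; c=d; e=f; g=h;) 
might look like the following.  It is not the attached test8.c; the function 
name and the choice of int globals are assumptions made for illustration.

```c
/* Hypothetical reproduction sketch, NOT the attached test8.c.
   All names here are made up for illustration. */

/* Globals, so that each assignment needs a real memory load and store. */
int a, b, c, d, e, f, g, h;

/* Four independent memory-to-memory copies.  On a target with no
   memory-to-memory move, each copy becomes a load followed by a store,
   and the scheduler is free to group the four loads ahead of the stores. */
void copy_four(void)
{
    a = b;
    c = d;
    e = f;
    g = h;
}
```

Compiling something like this with -O2 -mtune=mips64r2 -mabi=n32 -S and 
reading the assembly should show whether the loads stay grouped.  Adding 
-fdump-rtl-sched1 and -fdump-rtl-ira should produce RTL dumps for comparing 
the insn order before and after IRA.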

        paul

Attachment: test8.c

Attachment: test8.s
