Hi Vladimir,

I've been working on code size improvements for mips16 and have been pleased to 
see some improvement when switching to use LRA instead of classic reload. At 
the same time though I have also seen some differences between reload and LRA 
in terms of how efficiently reload registers are reused.

The trigger for LRA to underperform compared with classic reload is when IRA
allocates inappropriate registers and thus puts a lot of stress on reloading.
MIPS16 shows this because it can only access a small subset of the MIPS
registers for general instructions. The remaining MIPS registers are still
available, as they can be accessed by some special instructions and used via
move instructions as temporaries. In the current mips16 backend, register move
costings lead IRA to determine that, although the preferred class for most
pseudos is M16_REGS, the allocno class ends up as GR_REGS. IRA then resorts to
allocating registers outside of M16_REGS more and more as register pressure
increases, even though most instructions cannot use those registers directly
and such allocations generally need reloading.

When using classic reload, the inappropriate register allocations are
effectively reverted, as the reload pseudos that get invented tend to all
converge on the same hard register, completely removing the original pseudo.
With LRA the reloads tend to diverge: different hard registers are assigned to
the reload pseudos, leaving us with two new pseudos and the original. That
costs two extra move instructions and two extra hard registers. While I'm not
saying it is LRA's fault for not fixing this situation perfectly, it does seem
that classic reload is better at it.

I have found a potential solution to the original IRA register allocation 
problem but I think there may still be something to address in LRA to improve 
this scenario anyway. My proposed solution to the IRA problem for mips16 is to 
adjust register move costings such that the total of moving between M16_REGS 
and GR_REGS and back is more expensive than memory, but moving from GR_REGS to 
GR_REGS is cheaper than memory (even though this is a bit weird as you have to 
go through an M16_REG to move from one GR_REG to another GR_REG).

GR_REGS to GR_REGS has to be cheaper than memory, as GR_REGS needs to be a
candidate pressure class, but the additional cost for M16->GR->M16 means that
IRA does not use GR_REGS as an alternative class and the allocno class is just
M16_REGS, as desired. This feels a bit like a hack but may be the best
solution. The hard register costs used when allocating registers from an
allocno class just don't seem to be strong enough to prevent poor register
allocation in this case. I don't know if the hard register costs are supposed
to resolve this issue or if they are just about fine tuning.
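
To make the cost relationship concrete, here is a minimal sketch in C. It is
not the actual patch: the enum, the function names and every number are
hypothetical, chosen only so that the two constraints described above hold
(M16->GR->M16 costs more than memory, GR->GR costs less than memory). A single
"memory cost" constant is also a simplifying assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two register classes discussed above.  */
enum sk_reg_class { SK_M16_REGS, SK_GR_REGS };

/* Assumed cost of going through memory.  Illustrative only, not a real
   backend value.  */
#define SK_MEMORY_COST 4

/* Illustrative register move costs; not the values from the patch.  */
int
sk_register_move_cost (enum sk_reg_class from, enum sk_reg_class to)
{
  if (from == SK_M16_REGS && to == SK_M16_REGS)
    return 2;   /* move within M16_REGS */
  if (from == SK_GR_REGS && to == SK_GR_REGS)
    return 3;   /* kept below memory so GR_REGS can be a pressure class */
  return 3;     /* M16 <-> GR: one leg of the round trip */
}

/* Total cost of moving a value M16 -> GR -> M16.  */
int
sk_round_trip_cost (void)
{
  return sk_register_move_cost (SK_M16_REGS, SK_GR_REGS)
         + sk_register_move_cost (SK_GR_REGS, SK_M16_REGS);
}

/* Check the two constraints from the text above.  */
void
sk_check_constraints (void)
{
  /* The round trip through GR_REGS must be dearer than memory.  */
  assert (sk_round_trip_cost () > SK_MEMORY_COST);
  /* GR_REGS to GR_REGS must stay cheaper than memory.  */
  assert (sk_register_move_cost (SK_GR_REGS, SK_GR_REGS) < SK_MEMORY_COST);
}
```

Calling sk_check_constraints () passes with these numbers. Any other values
that keep both inequalities true would illustrate the same idea.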

With the fix in place, LRA outperforms classic reload, which is fantastic!

I have a small(ish) test case for this and dumps for IRA, LRA and classic 
reload along with the patch to enable LRA for mips16. I can also provide the 
fix to register costing that effectively avoids/hides this problem for mips16. 
Should I post them here or put them in a bugzilla ticket?

Any advice on which area needs fixing would be welcome, and I am quite happy
to work on this given some direction. I suspect these issues are relevant for
any architecture that is not 100% orthogonal, which is pretty much all of
them, and are particularly important for compressed instruction sets.

Regards,
Matthew

--
Matthew Fortune
Leading Software Design Engineer, MIPS processor IP
Imagination Technologies Limited
t: +44 (0)113 242 9814
www.imgtec.com

