------- Comment #5 from hutchinsonandy at gcc dot gnu dot org 2008-04-13 00:33 -------

This bug has to do with reload and additional register conflicts introduced by register lowering.
In the smaller case, the register for 'a' is a call-used register (often r22..r25). The avr backend code optimizes the long OR down to a single byte OR.

The worse code occurs because 'a' gets assigned to a call-saved register. This 'long' register (r10..r13) has to be pushed/popped by the function. This register also cannot be used for an immediate OR, so the code grows to load another register with the long constant. The backend does not have any optimization for this.

The difference in register allocation occurs as a side effect of splitting wide types. With -fno-split-wide-types, a single SI (long) RTL move is used to place the result 'a' (pseudo register p48) into r22..r25. With -fsplit-wide-types, this is split into 4 individual QI (byte) moves of subregs p48[0]..p48[3] into r22, r23, r24 and r25.

When the global register allocator is trying to figure out which cpu register it should use for 'a', it looks at the preferred type and conflicts. Without splitting, it can use the preferred r22..r25 with no conflict, so it does, and you get small code. With split wide types, it wants to use r22..r25, but it can see that the use of 'a' OVERLAPS the use of r22, r23 and r24 across 3 instructions. Next it tries r25, which is not big enough or valid. The next available register of that size is r10.

The conflict, or overlapped access, is technically incorrect. Reload is looking at p48 as a single entity rather than as its subregs, and is unable to spot that on a subreg basis there is no conflict, i.e. r22 does not conflict with p48[0], r23 does not conflict with p48[1], etc.

OK, that's what happens. How do we fix it? I have no idea (yet) how to deal with it directly in reload or subreg lowering. Those would be the best places, as the problem is not confined to this testcase. ALL SUGGESTIONS WELCOME!

While looking at this problem, I noted several issues with the AVR target that do not help.

1) The above example has enough free registers - the problem is that none of them are contiguous enough to hold the long value of 'a'. This is due to the fragmentation of the register set that occurs with the current allocation order. Changing the order can alleviate this.

2) Splitting logical operations would definitely remove the long OR with 1. I am not sure it would free any registers to remove the conflict.

3) Alternatively, optimisation of a single byte OR on the SI pattern could be done. The current *iorsi3_clobber is intended to do this but is impotent - it will not be matched by combine or used as a peephole - it needs fixing. Again, this may not help with the conflict.

4) The local register allocation was favoring LD_REGS for 't' when any GENERAL_REG could be used. This is because the *movqi pattern does not have constraint 'L' to allow a GENERAL_REG 'r' to be loaded with zero. Same problem for movhi, but movsi is correct! (Alas, it was not enough to free a register.)

Solving 1..3 would help but not cure this issue.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35860
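
As a rough illustration of the conflict described above, here is a toy model in C. It is not GCC's reload or global allocator code. It only assumes the four split moves are "r(22+k) = p48[k]" for k = 0..3, that a hard register conflicts with whatever is still live after the instruction that writes it, and that the allocator tries base register r22 and then r10. The names byte_live_after, pseudo_live_after, conflicts and pick_base are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

#define NBYTES     4   /* an SI (long) value is 4 QI bytes           */
#define FIRST_DEST 22  /* the split moves write r22, r23, r24, r25   */

/* Instruction k (k = 0..3) is "r(22+k) = p48[k]".
   Byte j of p48 is still needed after instruction k iff j > k. */
bool byte_live_after(int byte, int insn)
{
    return byte > insn;
}

/* Whole-pseudo view: p48 is live after insn k if ANY of its bytes is. */
bool pseudo_live_after(int insn)
{
    for (int j = 0; j < NBYTES; j++)
        if (byte_live_after(j, insn))
            return true;
    return false;
}

/* Would putting byte `byte` of p48 into hard register `hard` conflict?
   per_subreg = false models treating p48 as a single entity;
   per_subreg = true models tracking each subreg separately. */
bool conflicts(int hard, int byte, bool per_subreg)
{
    int insn = hard - FIRST_DEST;      /* insn that writes this hard reg */
    if (insn < 0 || insn >= NBYTES)
        return false;                  /* assumed free in this toy model */
    return per_subreg ? byte_live_after(byte, insn)
                      : pseudo_live_after(insn);
}

/* Return the first candidate base whose 4 consecutive registers are
   conflict-free, or -1 if none is.  The candidate order is an assumption. */
int pick_base(bool per_subreg)
{
    static const int candidates[] = { 22, 10 };
    for (int c = 0; c < 2; c++) {
        bool ok = true;
        for (int b = 0; b < NBYTES; b++)
            if (conflicts(candidates[c] + b, b, per_subreg))
                ok = false;
        if (ok)
            return candidates[c];
    }
    return -1;
}
```

In this model pick_base(false) rejects r22 because p48 as a whole is still live after r22, r23 and r24 are written, and falls back to r10. pick_base(true) keeps r22, because byte k of p48 is dead by the time r(22+k) is written.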