Opening caveats: This comes from a day of digging into why the Arduino environment wasn't working correctly on my Fedora 13 system with avr-gcc-4.5. I'm not a GCC hacker and not intending to become one, so forgive sloppy terminology and if the right fix is one of the larger things I mention below, someone else is going to have to do it. I didn't turn up any previous work on this other than people on the Arduino forums recommending downgrading to gcc-4.3, but I also didn't actively ask around, so I'm 3/4's expecting a fix for this to already exist somewhere on gcc-patches or on someone's hard drive. If so, at least I learned a bit more about GCC internals. :-)
OK, with that out of the way, the following code is miscompiling on AVR (-O1 or -O2) with gcc-4.5.1: === unsigned char func1(void); void funct2(int l); void dummy(void) { int l = 256 * func1() + func1(); if (l > 256) { return; } func1(); funct2(l); } === Producing the assembly for the body of the function: === rcall func1 rcall func1 ldi r18,lo8(0) mov r28,r18 mov r29,r19 add r28,r24 adc r29,__zero_reg__ ldi r19,hi8(257) cpi r28,lo8(257) cpc r29,r19 brge .L1 rcall func1 mov r24,r28 mov r25,r29 rcall funct2 === The obvious place where things go wrong is in the IRA pass. THe insns: === (insn 35 10 33 2 foo.c:5 (set (reg:HI 50 [ D.1955 ]) (const_int 0 [0x0])) 10 {*movhi} (nil)) (insn 33 35 34 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 1) (reg:QI 41 [ D.1955 ])) 4 {*movqi} (expr_list:REG_DEAD (reg:QI 41 [ D.1955 ]) (nil))) (insn 34 33 13 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 0) (const_int 0 [0x0])) 4 {*movqi} (nil)) === Get rewritten after register assignment (r41 => r17, r50 => r28) to: === (insn 35 10 37 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50]) (const_int 0 [0x0])) 10 {*movhi} (expr_list:REG_EQUAL (const_int 0 [0x0]) (nil))) (insn 37 35 33 2 foo.c:5 (set (reg:HI 14 r14) (reg:HI 28 r28 [orig:50 D.1955 ] [50])) 10 {*movhi} (nil)) (insn 33 37 39 2 foo.c:5 (set (reg:QI 18 r18) (reg:QI 17 r17 [orig:41 D.1955 ] [41])) 4 {*movqi} (nil)) (insn 39 33 38 2 foo.c:5 (set (reg:QI 15 r15 [+1 ]) (reg:QI 18 r18)) 4 {*movqi} (nil)) (insn 38 39 34 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50]) (reg:HI 14 r14)) 10 {*movhi} (nil)) (insn 34 38 40 2 foo.c:5 (set (reg:QI 18 r18) (const_int 0 [0x0])) 4 {*movqi} (nil)) (insn 40 34 13 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50]) (reg:HI 18 r18)) 10 {*movhi} (nil)) === Note the the way that insn 34 is transformed into insn 34 and 40, which at first glance aren't doing the same thing at all and overwrite the earlier computation of r29. There are basically two problems here: Problem 1: there are severe restrictions on the use of r28/r29 (which is the frame pointer) from checks in avr_hard_regno_mode_ok() and in simplify_gen_subreg. These checks prevent subregister access to it by splitting it into two pieces from being allowed. See: http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00834.html for some background information. But the register allocator happily puts stuff into r28/r29 and reload has to bend over backwards to try and sort things out. Problem 2: supposedly, according to the description of (set...) in rtl.texi [...], if @var{lval} is a @code{subreg} whose machine mode is narrower than the mode of the register, the rest of the register can be changed in an undefined way. So actually the transform from insn 34 to insn 34/40 is correct. It's the generation of insn 40 by lower-subreg.c:resolve_shift_zext() that was wrong; for dest_zero it generates a (subreg ...) expression when setting the low part after setting the high part. This may be the only place where there is a problem, but considering the enormous amount of simplify_gen_subreg(), and the very limited use of (strict-lower-part...) my guess is that there are a bunch of similar problems. The insn 34 => insn 34/40 transformation comes from code in push_reload() which turns a subreg set into a set of the full parent register if that would be valid according to the above rule and direct subreg access isn't allowed. (The slightly smaller of the two gigantic if statements in that function, with the comment "Similarly for paradoxical and problematical SUBREGs on the output." I think this is the "problematic" part of that, and is modelled by the '!HARD_REGNO_MODE_OK (subreg_regno (out), outmode)' check within that function.) So, ideas about fixing: 1) Remove r28/r29 (the framepointer) from GENERAL_REGS on AVR. Then (almost?) nothing will ever be allocated to it, and it should be at least very hard to trigger the problem. I'll attach a patch implementing this. It seems to work in basic testing; libgcc compiles, the test case and the larger code that it comes from now works correctly. Code generation may be a bit worse from having only 2/3 pointer register pairs to work with, but it will at least be correct. 2) Split apart HImode from Pmode on AVR. The current code in avr_hard_regno_mode_ok() seems to be written on the assumption that this is the case since it talks about only allowing Pmode. I'm not sure how to do this without bloating avr.md a lot. 3) Try to fix the framepointer situation so that it gets correctly accounted for and you can't accidentally write over half of it if you use half the register at the wrong time. Sounds like the right thing to do. Sounds a project I'd have no idea how to start. 4) Make the code in resolve_shift_zext() and anywhere else use (strict_lower_part ) when it doesn't want the upper part to be overwritten. Again, sounds like the right thing to do. I tried to do this, was beyond my GCC hacking skills. Also, I suspect that it might have gigantic unexpected consequences on generated code on various platforms. 5) Try to split apart 'HARD_REGNO_MODE_OK' into two concepts: - Whether it's OK to allocate registers in this mode - Whether it's OK to read and write this register as a subreg of a larger register. The second check could then be used in simplify_subreg_regno() and the push_reload() optimization. I don't think GCC really needs the extra complexity. For comparison, With the attached patch the body of the function is the much more reasonable: === rcall func1 mov r15,r24 rcall func1 mov r17,r15 ldi r16,lo8(0) add r16,r24 adc r17,__zero_reg__ ldi r24,hi8(257) cpi r16,lo8(257) cpc r17,r24 brge .L1 rcall func1 mov r24,r16 mov r25,r17 rcall funct2 === -- Summary: avr miscompilations related to frame pointer registers Product: gcc Version: 4.5.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: otaylor at redhat dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45291