[Bug target/45291] New: avr miscompilations related to frame pointer registers

otaylor at redhat dot com Sun, 15 Aug 2010 13:20:14 -0700

Opening caveats: This comes from a day of digging into why the Arduino
environment wasn't working correctly on my Fedora 13 system with avr-gcc-4.5.
I'm not a GCC hacker and not intending to become one, so forgive sloppy
terminology and if the right fix is one of the larger things I mention below,
someone else is going to have to do it. I didn't turn up any previous work on
this other than people on the Arduino forums recommending downgrading to
gcc-4.3, but I also didn't actively ask around, so I'm three-quarters expecting a fix
for this to already exist somewhere on gcc-patches or on someone's hard drive.
If so, at least I learned a bit more about GCC internals. :-)


OK, with that out of the way, the following code is miscompiled on AVR (-O1 or
-O2) with gcc-4.5.1:

===
unsigned char func1(void);
void funct2(int l);

void dummy(void) {
  int l = 256 * func1() + func1();
  if (l > 256) {
      return;
  }
  func1();
  funct2(l);
}
===

Producing the assembly for the body of the function:

===
        rcall func1
        rcall func1
        ldi r18,lo8(0)
        mov r28,r18
        mov r29,r19
        add r28,r24
        adc r29,__zero_reg__
        ldi r19,hi8(257)
        cpi r28,lo8(257)
        cpc r29,r19
        brge .L1
        rcall func1
        mov r24,r28
        mov r25,r29
        rcall funct2
===

The obvious place where things go wrong is in the IRA pass. The insns:

===
(insn 35 10 33 2 foo.c:5 (set (reg:HI 50 [ D.1955 ])
        (const_int 0 [0x0])) 10 {*movhi} (nil))

(insn 33 35 34 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 1)
        (reg:QI 41 [ D.1955 ])) 4 {*movqi} (expr_list:REG_DEAD (reg:QI 41 [ D.1955 ])
        (nil)))

(insn 34 33 13 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 0)
        (const_int 0 [0x0])) 4 {*movqi} (nil))
===

are rewritten after register assignment (r41 => r17, r50 => r28) to:

===
(insn 35 10 37 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (const_int 0 [0x0])) 10 {*movhi} (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

(insn 37 35 33 2 foo.c:5 (set (reg:HI 14 r14)
        (reg:HI 28 r28 [orig:50 D.1955 ] [50])) 10 {*movhi} (nil))

(insn 33 37 39 2 foo.c:5 (set (reg:QI 18 r18)
        (reg:QI 17 r17 [orig:41 D.1955 ] [41])) 4 {*movqi} (nil))

(insn 39 33 38 2 foo.c:5 (set (reg:QI 15 r15 [+1 ])
        (reg:QI 18 r18)) 4 {*movqi} (nil))

(insn 38 39 34 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (reg:HI 14 r14)) 10 {*movhi} (nil))

(insn 34 38 40 2 foo.c:5 (set (reg:QI 18 r18)
        (const_int 0 [0x0])) 4 {*movqi} (nil))

(insn 40 34 13 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (reg:HI 18 r18)) 10 {*movhi} (nil))
===

Note the way that insn 34 is transformed into insns 34 and 40, which
at first glance don't do the same thing at all: insn 40 copies r18/r19
into r28/r29 and so overwrites the earlier computation of r29.
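
The effect of the post-reload sequence can be replayed on a toy model of the
register file. This is only a sketch: the regfile type and the helper names
are made up for illustration, and the stale_r19 argument stands in for
whatever r19 happens to hold, since nothing in the sequence above sets it.
It assumes the usual AVR layout where an HImode value in register N keeps
its low byte in rN and its high byte in rN+1.

```c
#include <stdint.h>

/* Toy model (not GCC code): 32 eight-bit AVR registers. */
typedef struct { uint8_t r[32]; } regfile;

/* Read an HImode value: regno is the low byte, regno + 1 the high byte. */
static uint16_t get_hi(const regfile *rf, int regno)
{
    return (uint16_t)(rf->r[regno] | (rf->r[regno + 1] << 8));
}

/* Write an HImode value into the pair regno / regno + 1. */
static void set_hi(regfile *rf, int regno, uint16_t v)
{
    rf->r[regno] = (uint8_t)(v & 0xFF);
    rf->r[regno + 1] = (uint8_t)(v >> 8);
}

/* Replay the post-reload insns and return the final r28/r29 pair.
   first_result is the first func1() value (in r17 after allocation).
   If run_insns_34_40 is zero, the replay stops after insn 38. */
uint16_t replay_post_reload(uint8_t first_result, uint8_t stale_r19,
                            int run_insns_34_40)
{
    regfile rf = {{0}};
    rf.r[17] = first_result;
    rf.r[19] = stale_r19;               /* never written by the sequence */

    set_hi(&rf, 28, 0);                 /* insn 35: r28 = 0 (HI)     */
    set_hi(&rf, 14, get_hi(&rf, 28));   /* insn 37: r14 = r28 (HI)   */
    rf.r[18] = rf.r[17];                /* insn 33: r18 = r17 (QI)   */
    rf.r[15] = rf.r[18];                /* insn 39: r15 = r18 (QI)   */
    set_hi(&rf, 28, get_hi(&rf, 14));   /* insn 38: r28 = r14 (HI)   */
    if (run_insns_34_40) {
        rf.r[18] = 0;                   /* insn 34: r18 = 0 (QI)     */
        set_hi(&rf, 28, get_hi(&rf, 18)); /* insn 40: r28 = r18 (HI) */
    }
    return get_hi(&rf, 28);
}
```

With first_result = 0x12 and stale_r19 = 0xAB, the pair holds 0x1200 after
insn 38, which is the intended value, and 0xAB00 after insn 40: the high byte
now comes from r19 and the first func1() result is gone.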

There are basically two problems here:

 Problem 1: there are severe restrictions on the use of r28/r29 (which
 is the frame pointer), imposed by checks in avr_hard_regno_mode_ok() and
 in simplify_gen_subreg. These checks disallow any subregister access
 that would split the pair into its two halves.

 See:

    http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00834.html

 for some background information. But the register allocator happily
 puts stuff into r28/r29 and reload has to bend over backwards to
 try and sort things out.

 Problem 2: supposedly, according to the description of (set...) in rtl.texi

    [...], if @var{lval} is a @code{subreg} whose machine mode is
    narrower than the mode of the register, the rest of the register can
    be changed in an undefined way.

 So actually the transform from insn 34 to insn 34/40 is correct. It's the
 generation of the original insn 34 by lower-subreg.c:resolve_shift_zext()
 that is wrong; for dest_zero it generates a plain (subreg ...) set when
 setting the low part after setting the high part.

 This may be the only place where there is a problem, but considering
 the enormous number of simplify_gen_subreg() calls and the very limited
 use of (strict_low_part ...), my guess is that there are a bunch of
 similar problems.

 The insn 34 => insn 34/40 transformation comes from code in push_reload()
 which turns a subreg set into a set of the full parent register if that
 would be valid according to the above rule and direct subreg access isn't
 allowed.

 (The slightly smaller of the two gigantic if statements in that function,
 with the comment "Similarly for paradoxical and problematical SUBREGs
 on the output." I think this is the "problematic" part of that, and
 is modelled by the '!HARD_REGNO_MODE_OK (subreg_regno (out), outmode)'
 check within that function.)
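
 The difference between a plain subreg store and a strict_low_part store,
 as I read the rtl.texi text quoted above, can be sketched in C. The
 function names are hypothetical and nothing here is GCC code. The clobber
 argument is an assumption of the model: it stands for whatever an
 implementation is allowed to leave in the part of the register that the
 plain store does not define.

```c
#include <stdint.h>

/* Models (set (subreg:QI (reg:HI x) 0) v): only the low byte is defined
   afterwards. `clobber` is a modelling device for the undefined rest. */
uint16_t set_low_subreg(uint16_t reg, uint8_t v, uint8_t clobber)
{
    (void)reg;  /* the old contents need not survive */
    return (uint16_t)((clobber << 8) | v);
}

/* Models (set (strict_low_part (subreg:QI (reg:HI x) 0)) v):
   the high byte is guaranteed to be preserved. */
uint16_t set_strict_low_part(uint16_t reg, uint8_t v)
{
    return (uint16_t)((reg & 0xFF00u) | v);
}

/* The dest_zero pattern: set the high byte to src, then zero the low
   byte with a plain subreg store. */
uint16_t shift_left_8_plain(uint8_t src, uint8_t clobber)
{
    uint16_t reg = (uint16_t)(src << 8);
    return set_low_subreg(reg, 0, clobber);
}

/* Same pattern, but zeroing the low byte with strict_low_part. */
uint16_t shift_left_8_strict(uint8_t src)
{
    uint16_t reg = (uint16_t)(src << 8);
    return set_strict_low_part(reg, 0);
}
```

 In this model shift_left_8_strict(0x12) is always 0x1200, while
 shift_left_8_plain(0x12, c) is only 0x1200 if the clobber value happens
 to equal the old high byte. That is the freedom reload uses when it
 rewrites insn 34 as a full-register set.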

So, ideas about fixing:

 1) Remove r28/r29 (the framepointer) from GENERAL_REGS on AVR.
    Then (almost?) nothing will ever be allocated to it, and it should
    be at least very hard to trigger the problem. I'll attach a patch
    implementing this. It seems to work in basic testing; libgcc compiles,
    and the test case and the larger code that it comes from now work
    correctly. Code generation may be a bit worse from having only two of
    the three pointer register pairs to work with, but it will at least be
    correct.

 2) Split apart HImode from Pmode on AVR. The current code in
    avr_hard_regno_mode_ok() seems to be written on the assumption that
    this is the case since it talks about only allowing Pmode.
    I'm not sure how to do this without bloating avr.md a lot.

 3) Try to fix the framepointer situation so that it gets correctly
    accounted for and you can't accidentally write over half of it
    if you use half the register at the wrong time. Sounds like the
    right thing to do. Sounds like a project I'd have no idea how to start.

 4) Make the code in resolve_shift_zext() and anywhere else use
    (strict_low_part ...) when it doesn't want the upper part to be
    overwritten. Again, sounds like the right thing to do. I tried to do
    this, but it was beyond my GCC hacking skills. Also, I suspect that it
    might have gigantic unexpected consequences on generated code
    on various platforms.

 5) Try to split apart 'HARD_REGNO_MODE_OK' into two concepts:

    - Whether it's OK to allocate registers in this mode
    - Whether it's OK to read and write this register as a subreg
      of a larger register.

    The second check could then be used in simplify_subreg_regno()
    and the push_reload() optimization. I don't think GCC really
    needs the extra complexity.
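
 To make idea 5 concrete, here is a toy sketch of what the two separate
 questions could look like. Everything in it is hypothetical: the toy_*
 names, the signatures and the exact rules are made up for illustration
 and are not taken from avr.c. The allocation rule is a simplified guess
 at "only a 2-byte pointer may live in r28/r29, and multi-byte values
 start in an even register".

```c
#include <stdbool.h>

enum { TOY_REG_Y = 28, TOY_NUM_REGS = 32 };

/* Hypothetical question 1: may the allocator place a value of `size`
   bytes starting at hard register `regno`? */
bool toy_alloc_mode_ok(int regno, int size)
{
    if (regno < 0 || size < 1 || regno + size > TOY_NUM_REGS)
        return false;

    /* Does the value touch r28 or r29? */
    bool overlaps_y = regno <= TOY_REG_Y + 1 && regno + size > TOY_REG_Y;
    if (overlaps_y)
        return regno == TOY_REG_Y && size == 2;  /* whole pair only */

    /* Assumed rule: multi-byte values start in an even register. */
    return size == 1 || regno % 2 == 0;
}

/* Hypothetical question 2: given a value that was legally allocated at
   outer_regno, may `inner_size` bytes at `byte_offset` inside it be
   read or written as a subreg? */
bool toy_subreg_access_ok(int outer_regno, int outer_size,
                          int byte_offset, int inner_size)
{
    if (!toy_alloc_mode_ok(outer_regno, outer_size))
        return false;
    return byte_offset >= 0 && inner_size >= 1
           && byte_offset + inner_size <= outer_size;
}
```

 In this sketch toy_alloc_mode_ok(29, 1) is false, so nothing is ever
 allocated to r29 on its own, but toy_subreg_access_ok(28, 2, 1, 1) is
 true, so the high byte of a value living in r28/r29 can still be
 addressed directly.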

For comparison, with the attached patch the body of the function is the
much more reasonable:

===
        rcall func1
        mov r15,r24
        rcall func1
        mov r17,r15
        ldi r16,lo8(0)
        add r16,r24
        adc r17,__zero_reg__
        ldi r24,hi8(257)
        cpi r16,lo8(257)
        cpc r17,r24
        brge .L1
        rcall func1
        mov r24,r16
        mov r25,r17
        rcall funct2
===


-- 
           Summary: avr miscompilations related to frame pointer registers
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: otaylor at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45291
