Hello,

Using the 3.4.3 baseline on SGI MIPS3 IRIX 6.5, I'm running into a problem where bad code is generated for a relatively trivial program when both -funit-at-a-time and -foptimize-sibling-calls are asserted. The nature of the failure is that the RTL optimizer seems to get confused about which value should be targeted to an argument register; it seems to coalesce two separate temporaries into one. Note that the RTL in question originates in some new code that I've added to support an experimental dialect of C (called UPC), so it isn't out of the question that I've introduced an aliasing or other issue. However, most tests pass, and just a few show the failure mode illustrated below. All the tests pass on i386 and IA64, FYI; they don't demonstrate this failure.
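To give a rough idea of the shape of the call involved, here is a small C sketch. It is not the failing UPC test, and I am not claiming it reproduces the bug. The names rec8, putblk3_stub, store_record and call_shape_ok are made up for illustration, and the stub's signature is only a guess based on the argument registers in the RTL further down (an 8-byte value, an address, and a length). The record type is 8 bytes only on targets where unsigned int is 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; stands in for the double-word temporaries.
   It is 8 bytes when unsigned int is 32 bits wide. */
typedef struct { unsigned int hi, lo; } rec8;

/* What the stub saw on its most recent call. */
static rec8 seen_dest;
static rec8 seen_src;
static size_t seen_len;

/* Made-up stand-in for the runtime routine.  The real signature of
   __putblk3 is not shown in this post; this one is an assumption. */
static void putblk3_stub(rec8 dest, const void *src, size_t len)
{
    assert(src != NULL);
    assert(len == sizeof seen_src);
    seen_dest = dest;              /* first argument: record passed by value */
    memcpy(&seen_src, src, len);   /* second argument: address of a temporary */
    seen_len = len;                /* third argument: byte count */
}

/* Two distinct stack temporaries, each set from an initializer, then a
   call in tail position that passes one by value and one by address. */
void store_record(void)
{
    rec8 src  = { 0x11111111u, 0x22222222u };
    rec8 dest = { 0xAAAAAAAAu, 0xBBBBBBBBu };
    putblk3_stub(dest, &src, sizeof src);
}

/* Returns 1 when each argument arrived with the value of its own
   temporary, 0 otherwise. */
int call_shape_ok(void)
{
    store_record();
    return seen_dest.hi == 0xAAAAAAAAu && seen_dest.lo == 0xBBBBBBBBu
        && seen_src.hi == 0x11111111u && seen_src.lo == 0x22222222u
        && seen_len == sizeof(rec8);
}
```

With correct code generation, call_shape_ok() returns 1 when called from a main(). If the two temporaries were wrongly merged into a single stack slot, I would expect the by-value argument to pick up the other temporary's contents and the check to return 0.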
First question: are there known problems in 3.4.3 with -funit-at-a-time and/or -foptimize-sibling-calls? (I ran a few queries of the Bugzilla database but didn't find anything.)

I confirmed which optimizations are involved by compiling the program with -O0 -funit-at-a-time -foptimize-sibling-calls; correct code is generated if either or both of those switches are removed from the command line.

I tried debugging the problem by compiling with -da and looking at the various RTL dump files:

    t.upc.00.cgraph   t.upc.07.addressof  t.upc.25.greg        t.upc.35.mach
    t.upc.01.rtl      t.upc.11.cfg        t.upc.26.postreload
    t.upc.02.sibling  t.upc.19.life       t.upc.27.flow2
    t.upc.04.jump     t.upc.24.lreg       t.upc.29.ce3

The bad code shows up in t.upc.02.sibling, so probably -dr -di would have sufficed.

The problem that I'm seeing is illustrated in the following RTL:

(insn 66 65 77 0 (set (reg:SI 225 [ <anonymous> ])
        (reg/f:SI 177 virtual-stack-vars)) -1 (nil)
    (nil))

(insn 77 66 78 0 (set (reg:DI 228)
        (const_int 0 [0x0])) -1 (nil)
    (nil))

(insn 78 77 79 0 (set (reg:DI 228)
        (mem/s:DI (reg/f:SI 177 virtual-stack-vars) [0 S8 A128])) -1 (nil)
    (nil))

(insn 79 78 80 0 (set (reg:DI 4 $4)
        (reg:DI 228)) -1 (nil)
    (nil))

(insn 80 79 81 0 (set (reg:SI 5 $5)
        (reg:SI 225 [ <anonymous> ])) -1 (nil)
    (nil))

(insn 81 80 82 0 (set (reg:SI 6 $6)
        (reg:SI 224 [ <anonymous> ])) -1 (nil)
    (nil))

(insn 82 81 83 0 (set (reg:SI 229)
        (unspec:SI [
                (reg:SI 28 $28)
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("__putblk3") [flags 0x41] <function_decl 40ced00 __putblk3>)
                        ] 107))
                (reg:SI 79 $fakec)
            ] 27)) -1 (nil)
    (nil))

(call_insn 83 82 115 0 (parallel [
            (call (mem:SI (reg:SI 229) [0 S4 A32])
                (const_int 0 [0x0]))
            (clobber (reg:SI 31 $31))
        ]) -1 (nil)
    (nil)
    (expr_list (use (reg:SI 28 $28))
        (expr_list (use (reg:SI 6 $6))
            (expr_list (use (reg:SI 5 $5))
                (expr_list (use (reg:DI 4 $4))
                    (nil))))))

(insn 115 83 116 0 (clobber (mem/s:BLK (reg/f:SI 177 virtual-stack-vars) [0 A128])) -1 (nil)

Above, the second argument (reg:SI $5) is set to
(reg:SI 225), which in turn is set to (reg/f:SI 177 virtual-stack-vars), which is simply the frame pointer. Note that the first argument (reg:DI $4) will end up being set to the contents of the location that the frame pointer points to. This is incorrect: it should be set to the contents of 16($fp), or at least to some location other than the double word beginning at $fp. It looks as if the optimizer somehow aliased the two locations, or decided that they weren't both live at the same time.

If we keep the -foptimize-sibling-calls switch but do not assert -funit-at-a-time, the following correct RTL is generated:

(insn 39 38 40 0 (set (reg:SI 205)
        (const_int 8 [0x8])) -1 (nil)
    (nil))

(insn 40 39 41 0 (set (reg:SI 206)
        (reg/f:SI 177 virtual-stack-vars)) -1 (nil)
    (nil))

(insn 41 40 42 0 (set (reg:DI 207)
        (const_int 0 [0x0])) -1 (nil)
    (nil))

(insn 42 41 43 0 (set (reg:DI 207)
        (mem/s:DI (plus:SI (reg/f:SI 177 virtual-stack-vars)
                (const_int 16 [0x10])) [0 S8 A128])) -1 (nil)
    (nil))

(insn 43 42 44 0 (set (reg:DI 4 $4)
        (reg:DI 207)) -1 (nil)
    (nil))

(insn 44 43 45 0 (set (reg:SI 5 $5)
        (reg:SI 206)) -1 (nil)
    (nil))

(insn 45 44 46 0 (set (reg:SI 6 $6)
        (reg:SI 205)) -1 (nil)
    (nil))

(call_insn 46 45 48 0 (parallel [
            (call (mem:SI (symbol_ref:SI ("__putblk3") [flags 0x41] <function_decl 40ced00 __putblk3>) [0 S4 A32])
                (const_int 0 [0x0]))
            (clobber (reg:SI 31 $31))
        ]) -1 (nil)
    (nil)
    (expr_list (use (reg:SI 28 $28))
        (expr_list (use (reg:SI 6 $6))
            (expr_list (use (reg:SI 5 $5))
                (expr_list (use (reg:DI 4 $4))
                    (nil))))))

(insn 48 46 49 0 (clobber (mem/s:BLK (plus:SI (reg/f:SI 177 virtual-stack-vars)
                (const_int 16 [0x10])) [0 A128])) -1 (nil)
    (nil))

Here the result is different: the first argument ($4) is set to the contents of 16($fp), and the second argument ($5) is set to $fp.

With -funit-at-a-time asserted, I tried looking at t.upc.01.rtl to get a picture of the RTL before it is optimized. However, the RTL is not very comprehensible and seems abbreviated.
For example, this is the call to __putblk3:

(call_insn 84 66 141 (call_placeholder 77 67 0 0 (call_insn 83 82 0 (parallel [
                (call (mem:SI (reg:SI 229) [0 S4 A32])
                    (const_int 0 [0x0]))
                (clobber (reg:SI 31 $31))
            ]) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 28 $28))
            (expr_list (use (reg:SI 6 $6))
                (expr_list (use (reg:SI 5 $5))
                    (expr_list (use (reg:DI 4 $4))
                        (nil))))))) -1 (nil)
    (nil)
    (nil))

Note that there is no mention of __putblk3 at all, presumably because it is buried inside the call placeholder somewhere. Stranger still, apart from the (use ...) list, there is no mention of the argument registers at all.

Am I wrong in assuming that the two digits in the names of the RTL dump files indicate the sequence of the RTL passes? Which dump file has the unoptimized RTL? Which passes run before the sibling call optimization?

It may be worth adding that the double-word locations involved are records whose values are set by moving a constructor to the relevant location. Although Ada probably uses constructors a lot, it wouldn't surprise me if this area isn't heavily tested.

Any tips on debugging this codegen issue would be appreciated.

thanks
- Gary