PR23774 shows a situation where powerpc-linux (and other rs6000 targets) break an ABI requirement that 0(r1) always holds a pointer to the previous stack frame, except for the topmost frame. This particular case concerns dynamic stack (eg. alloca) deallocation. The other case where this happens is __builtin_longjmp. It's easy to see why this happens. restore_stack_block and restore_stack_nonlocal both write the backchain after setting the new stack pointer.
Simply reordering the instructions doesn't cure the problem. Scheduling conspires against us, and puts them back the other way. So we need some sort of dependency or blockage to stop the scheduler reordering. I tried a number of approaches. 1) First, I tried using emit_barrier() between the backchain write and sp set. This failed on a testcase where the sp assignment was deleted, leaving the barrier as the last insn. rs6000_output_function_epilogue takes this to mean no epilogue is needed, so this idea doesn't work unless you stop the stack pointer assignment being removed. 2) Next, I defined parallels to keep things together. Like the following, with another for DImode. (define_insn_and_split "set_stack_pointer_si" [(set (mem:SI (match_operand:SI 1 "gpc_reg_operand" "b")) (match_operand:SI 2 "gpc_reg_operand" "r")) (set (match_operand:SI 0 "gpc_reg_operand" "=b") (match_dup 1))] "TARGET_32BIT" "{st|stw} %2,0(%1)\;mr %0,%1" "&& !TARGET_UPDATE" [(set (mem:SI (match_dup 1)) (match_dup 2)) (set (match_dup 0) (match_dup 1))] "" [(set_attr "type" "store") (set_attr "length" "8")]) This works, but doesn't give ideal power4/5 insn grouping, with (I think) one too many nops being emitted. Maybe that's a bug in the bundling code, but trying to make it work with parallel's like the above didn't seem a good idea. 3) Someone thought that setting the proper memory alias set for the mems in these patterns would help. I tried that, it doesn't. 4) I used an unspec to make the sp assignment depend on the backchain mem write. ;; Tie the update of sp to the write of the backchain word, so that the ;; backchain word is valid before sp is changed. If !TARGET_UPDATE ;; don't bother, since stack allocation isn't atomic. (define_insn_and_split "*sp_<mode>" [(set (match_operand:P 0 "gpc_reg_operand" "=b") (unspec:P [(match_operand:P 1 "gpc_reg_operand" "r") (match_operand:P 2 "memory_operand" "m")] UNSPEC_SP))] "" "mr %0,%1" "&& !TARGET_UPDATE" [(set (match_dup 0) (match_dup 1))] "") This also works, but give dismal results on power4/5. You get a bunch of nops inserted, because mention of the mem in the unspec is taken to mean that memory is read (duh!), and so store/load blockages need to be avoided. 5) I used gen_stack_tie() in a similar approach to using emit_barrier(). This too works, but again the MEMs in the stack_tie insn are incorrectly interpreted as real memory accesses by the bundling code, resulting in multiple unnecessary nops. I saw lwz 0,0(1) stw 0,0(10) nop nop mr 1,10 nop nop being generated for restore_stack_block. 6) Finally, I tried a trick with a fake trap_if to ensure the scheduler keeps what it thinks might be a trapping memory access, the backchain write, before the stack pointer assignment. This also works, and gives good results on power4/5. However, I'm not really that happy with this idea since it relies on gcc rtl analysis staying sufficiently ignorant. So I suspect I ought to go back to number (5), and make the bundling code aware of stack_tie. It looks like this might be just a simple hack to rs6000.c:get_next_active_insn. Comments? Oh, and BTW, why doesn't get_next_active_insn discard BARRIER and CODE_LABEL? diff -urp -xCVS -x'*~' -x'.#*' -xTAGS -xautom4te.cache -x'*.info' gcc-virgin/gcc/config/rs6000/rs6000.md gcc-current/gcc/config/rs6000/rs6000.md --- gcc-virgin/gcc/config/rs6000/rs6000.md 2005-09-01 20:15:50.000000000 +0930 +++ gcc-current/gcc/config/rs6000/rs6000.md 2005-09-09 15:54:01.000000000 +0930 @@ -9347,51 +9346,68 @@ "" "DONE;") +;; The fake trap_if here makes the scheduler write out all mem that +;; might trap. The idea is to ensure that the backchain word is written +;; before sp is set. This scheme will fail if gcc is clever enough to +;; figure out that the backchain write indeed cannot trap. +(define_insn "*set_sp_<mode>" + [(set (match_operand:P 0 "gpc_reg_operand" "=b") + (match_operand:P 1 "gpc_reg_operand" "r")) + (trap_if (const_int 2) (const_int 0))] + "" + "mr %0,%1") + (define_expand "restore_stack_block" - [(use (match_operand 0 "register_operand" "")) - (set (match_dup 2) (match_dup 3)) - (set (match_dup 0) (match_operand 1 "register_operand" "")) - (set (match_dup 3) (match_dup 2))] + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 2)) + (parallel [(set (match_operand 0 "register_operand" "") + (match_operand 1 "register_operand" "")) + (trap_if (const_int 2) (const_int 0))])] "" " { operands[2] = gen_reg_rtx (Pmode); - operands[3] = gen_rtx_MEM (Pmode, operands[0]); + operands[3] = gen_frame_mem (Pmode, operands[0]); + /* We don't want the backchain write to be recognized as non-trapping, + so don't use gen_frame_mem here. */ + operands[4] = gen_rtx_MEM (Pmode, operands[1]); }") (define_expand "save_stack_nonlocal" - [(match_operand 0 "memory_operand" "") - (match_operand 1 "register_operand" "")] + [(set (match_dup 3) (match_dup 4)) + (set (match_operand 0 "memory_operand" "") (match_dup 3)) + (set (match_dup 2) (match_operand 1 "register_operand" ""))] "" " { - rtx temp = gen_reg_rtx (Pmode); int units_per_word = (TARGET_32BIT) ? 4 : 8; /* Copy the backchain to the first word, sp to the second. */ - emit_move_insn (temp, gen_rtx_MEM (Pmode, operands[1])); - emit_move_insn (adjust_address_nv (operands[0], Pmode, 0), temp); - emit_move_insn (adjust_address_nv (operands[0], Pmode, units_per_word), - operands[1]); - DONE; + operands[0] = adjust_address_nv (operands[0], Pmode, 0); + operands[2] = adjust_address_nv (operands[0], Pmode, units_per_word); + operands[3] = gen_reg_rtx (Pmode); + operands[4] = gen_frame_mem (Pmode, operands[1]); }") (define_expand "restore_stack_nonlocal" - [(match_operand 0 "register_operand" "") - (match_operand 1 "memory_operand" "")] + [(set (match_dup 2) (match_operand 1 "memory_operand" "")) + (set (match_dup 3) (match_dup 4)) + (set (match_dup 5) (match_dup 2)) + (parallel [(set (match_operand 0 "register_operand" "") + (match_dup 3)) + (trap_if (const_int 2) (const_int 0))])] "" " { - rtx temp = gen_reg_rtx (Pmode); - int units_per_word = (TARGET_32BIT) ? 4 : 8; + int units_per_word = TARGET_32BIT ? 4 : 8; - /* Restore the backchain from the first word, sp from the second. */ - emit_move_insn (temp, - adjust_address_nv (operands[1], Pmode, 0)); - emit_move_insn (operands[0], - adjust_address_nv (operands[1], Pmode, units_per_word)); - emit_move_insn (gen_rtx_MEM (Pmode, operands[0]), temp); - DONE; + operands[2] = gen_reg_rtx (Pmode); + operands[3] = gen_reg_rtx (Pmode); + operands[1] = adjust_address_nv (operands[1], Pmode, 0); + operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word); + /* We don't want the backchain write to be recognized as non-trapping, + so don't use gen_frame_mem here. */ + operands[5] = gen_rtx_MEM (Pmode, operands[3]); }") ;; TOC register handling. -- Alan Modra IBM OzLabs - Linux Technology Centre