[PowerPC] PR23774 stack backchain broken saga

Alan Modra Fri, 09 Sep 2005 08:30:20 -0700

PR23774 shows a situation where powerpc-linux (and other rs6000 targets)
break an ABI requirement that 0(r1) always holds a pointer to the
previous stack frame, except for the topmost frame.  This particular
case concerns dynamic stack (eg. alloca) deallocation.  The other case
where this happens is __builtin_longjmp.  It's easy to see why this
happens.  restore_stack_block and restore_stack_nonlocal both write the
backchain after setting the new stack pointer.


Simply reordering the instructions doesn't cure the problem.  Scheduling
conspires against us, and puts them back the other way.  So we need some
sort of dependency or blockage to stop the scheduler reordering.  I
tried a number of approaches.

1) First, I tried using emit_barrier() between the backchain write and
sp set.  This failed on a testcase where the sp assignment was deleted,
leaving the barrier as the last insn.  rs6000_output_function_epilogue
takes this to mean no epilogue is needed, so this idea doesn't work
unless you stop the stack pointer assignment being removed.

2) Next, I defined parallels to keep things together.  Like the
following, with another for DImode.

(define_insn_and_split "set_stack_pointer_si"
  [(set (mem:SI (match_operand:SI 1 "gpc_reg_operand" "b"))
        (match_operand:SI 2 "gpc_reg_operand" "r"))
   (set (match_operand:SI 0 "gpc_reg_operand" "=b")
        (match_dup 1))]
  "TARGET_32BIT"
  "{st|stw} %2,0(%1)\;mr %0,%1"
  "&& !TARGET_UPDATE"
  [(set (mem:SI (match_dup 1)) (match_dup 2))
   (set (match_dup 0) (match_dup 1))]
  ""
  [(set_attr "type" "store")
   (set_attr "length" "8")])

This works, but doesn't give ideal power4/5 insn grouping, with (I
think) one too many nops being emitted.  Maybe that's a bug in the
bundling code, but trying to make it work with parallel's like the above
didn't seem a good idea.

3) Someone thought that setting the proper memory alias set for the mems
in these patterns would help.  I tried that, it doesn't.

4) I used an unspec to make the sp assignment depend on the backchain
mem write.

;; Tie the update of sp to the write of the backchain word, so that the
;; backchain word is valid before sp is changed.  If !TARGET_UPDATE
;; don't bother, since stack allocation isn't atomic.
(define_insn_and_split "*sp_<mode>"
  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
        (unspec:P [(match_operand:P 1 "gpc_reg_operand" "r")
                   (match_operand:P 2 "memory_operand" "m")]
                  UNSPEC_SP))]
  ""
  "mr %0,%1"
  "&& !TARGET_UPDATE"
  [(set (match_dup 0) (match_dup 1))]
  "")

This also works, but give dismal results on power4/5.  You get a bunch
of nops inserted, because mention of the mem in the unspec is taken to
mean that memory is read (duh!), and so store/load blockages need to be
avoided.

5) I used gen_stack_tie() in a similar approach to using
emit_barrier().  This too works, but again the MEMs in the stack_tie
insn are incorrectly interpreted as real memory accesses by the bundling
code, resulting in multiple unnecessary nops.  I saw

        lwz 0,0(1)
        stw 0,0(10)
        nop
        nop
        mr 1,10
        nop
        nop

being generated for restore_stack_block.

6) Finally, I tried a trick with a fake trap_if to ensure the scheduler
keeps what it thinks might be a trapping memory access, the backchain
write, before the stack pointer assignment.  This also works, and gives
good results on power4/5.  However, I'm not really that happy with this
idea since it relies on gcc rtl analysis staying sufficiently ignorant.

So I suspect I ought to go back to number (5), and make the bundling
code aware of stack_tie.  It looks like this might be just a simple hack
to rs6000.c:get_next_active_insn.  Comments?  Oh, and BTW, why doesn't
get_next_active_insn discard BARRIER and CODE_LABEL?


diff -urp -xCVS -x'*~' -x'.#*' -xTAGS -xautom4te.cache -x'*.info' 
gcc-virgin/gcc/config/rs6000/rs6000.md gcc-current/gcc/config/rs6000/rs6000.md
--- gcc-virgin/gcc/config/rs6000/rs6000.md      2005-09-01 20:15:50.000000000 
+0930
+++ gcc-current/gcc/config/rs6000/rs6000.md     2005-09-09 15:54:01.000000000 
+0930
@@ -9347,51 +9346,68 @@
   ""
   "DONE;")
 
+;; The fake trap_if here makes the scheduler write out all mem that
+;; might trap.  The idea is to ensure that the backchain word is written
+;; before sp is set.  This scheme will fail if gcc is clever enough to
+;; figure out that the backchain write indeed cannot trap.
+(define_insn "*set_sp_<mode>"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+       (match_operand:P 1 "gpc_reg_operand" "r"))
+   (trap_if (const_int 2) (const_int 0))]
+  ""
+  "mr %0,%1")
+
 (define_expand "restore_stack_block"
-  [(use (match_operand 0 "register_operand" ""))
-   (set (match_dup 2) (match_dup 3))
-   (set (match_dup 0) (match_operand 1 "register_operand" ""))
-   (set (match_dup 3) (match_dup 2))]
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 2))
+   (parallel [(set (match_operand 0 "register_operand" "")
+                  (match_operand 1 "register_operand" ""))
+             (trap_if (const_int 2) (const_int 0))])]
   ""
   "
 {
   operands[2] = gen_reg_rtx (Pmode);
-  operands[3] = gen_rtx_MEM (Pmode, operands[0]);
+  operands[3] = gen_frame_mem (Pmode, operands[0]);
+  /* We don't want the backchain write to be recognized as non-trapping,
+     so don't use gen_frame_mem here.  */
+  operands[4] = gen_rtx_MEM (Pmode, operands[1]);
 }")
 
 (define_expand "save_stack_nonlocal"
-  [(match_operand 0 "memory_operand" "")
-   (match_operand 1 "register_operand" "")]
+  [(set (match_dup 3) (match_dup 4))
+   (set (match_operand 0 "memory_operand" "") (match_dup 3))
+   (set (match_dup 2) (match_operand 1 "register_operand" ""))]
   ""
   "
 {
-  rtx temp = gen_reg_rtx (Pmode);
   int units_per_word = (TARGET_32BIT) ? 4 : 8;
 
   /* Copy the backchain to the first word, sp to the second.  */
-  emit_move_insn (temp, gen_rtx_MEM (Pmode, operands[1]));
-  emit_move_insn (adjust_address_nv (operands[0], Pmode, 0), temp);
-  emit_move_insn (adjust_address_nv (operands[0], Pmode, units_per_word),
-                 operands[1]);
-  DONE;
+  operands[0] = adjust_address_nv (operands[0], Pmode, 0);
+  operands[2] = adjust_address_nv (operands[0], Pmode, units_per_word);
+  operands[3] = gen_reg_rtx (Pmode);
+  operands[4] = gen_frame_mem (Pmode, operands[1]);
 }")
 
 (define_expand "restore_stack_nonlocal"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "memory_operand" "")]
+  [(set (match_dup 2) (match_operand 1 "memory_operand" ""))
+   (set (match_dup 3) (match_dup 4))
+   (set (match_dup 5) (match_dup 2))
+   (parallel [(set (match_operand 0 "register_operand" "")
+                  (match_dup 3))
+             (trap_if (const_int 2) (const_int 0))])]
   ""
   "
 {
-  rtx temp = gen_reg_rtx (Pmode);
-  int units_per_word = (TARGET_32BIT) ? 4 : 8;
+  int units_per_word = TARGET_32BIT ? 4 : 8;
 
-  /* Restore the backchain from the first word, sp from the second.  */
-  emit_move_insn (temp,
-                 adjust_address_nv (operands[1], Pmode, 0));
-  emit_move_insn (operands[0],
-                 adjust_address_nv (operands[1], Pmode, units_per_word));
-  emit_move_insn (gen_rtx_MEM (Pmode, operands[0]), temp);
-  DONE;
+  operands[2] = gen_reg_rtx (Pmode);
+  operands[3] = gen_reg_rtx (Pmode);
+  operands[1] = adjust_address_nv (operands[1], Pmode, 0);
+  operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word);
+  /* We don't want the backchain write to be recognized as non-trapping,
+     so don't use gen_frame_mem here.  */
+  operands[5] = gen_rtx_MEM (Pmode, operands[3]);
 }")
 
 ;; TOC register handling.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

[PowerPC] PR23774 stack backchain broken saga

Reply via email to