This was initially seen internally with a 4.1.2 gcc plus power6 patches,
but with some adjustments to haifa-sched.c (only touching the order in the
ready list) can be reproduced with trunk.  The underlying issue is, that
the stack pointer restore instruction in the rs6000 backend does not conflict
with all insns accessing stack memory, but only with those created by
gen_frame_mem().  Attached is a C++ testcase (from khtml).  Bear with me while
explaining the error, it's hard to construct a small runtime testcase.

Suppose this patch is applied to trunk:

-------------------------------------
Index: haifa-sched.c
===================================================================
--- haifa-sched.c       (revision 121710)
+++ haifa-sched.c       (working copy)
@@ -798,6 +798,8 @@ priority (rtx insn)
            }
          while (twin != prev_first);
        }
+if (INSN_UID(insn) == 235 || INSN_UID(insn)==230)
+  this_priority += 10;
       INSN_PRIORITY (insn) = this_priority;
       INSN_PRIORITY_KNOWN (insn) = 1;
     }
@@ -930,7 +932,7 @@ rank_for_schedule (const void *x, const
   /* If insns are equally good, sort by INSN_LUID (original insn order),
      so that we make the sort stable.  This minimizes instruction movement,
      thus minimizing sched's effect on debugging and cross-jumping.  */
-  return INSN_LUID (tmp) - INSN_LUID (tmp2);
+  return INSN_LUID (tmp2) - INSN_LUID (tmp);
 }

 /* Resort the array A in which only element at index N may be out of order. 
*/
@@ -2402,7 +2404,7 @@ schedule_block (basic_block *target_bb,
          if (ready.n_ready > 0)
            ready_sort (&ready);

-         if (targetm.sched.reorder2
+         if (0 && targetm.sched.reorder2
              && (ready.n_ready == 0
                  || !SCHED_GROUP_P (ready_element (&ready, 0))))
            {
-----------------------------------------

I.e. switch off target specific reorder2, reorder insns by _inverse_ uid
and special case two instructions by priority in order to force them to be
output earlier.  The compile the attached testcase with:

% ./gcc/cc1plus -O2 -quiet -fsched-verbose=6 -da bug.ii

Examine the assembler output for the docWidth method (not the thunk):

_ZNK5khtml12RenderCanvas8docWidthEv:
   ....
        addi 9,1,12                 r9 = r1+12
.L22:
        lwz 0,52(1)
        lwz 31,44(1)
        lwz 30,40(1)
        lwz 29,36(1)
        lwz 28,32(1)
        mtlr 0
        addi 1,1,48                 r1 = r1+48
        lwz 3,0(9)                  access [r9] == [r1-36]
        blr

Note that the access to [r9] is below the adjustment to r1 (stack pointer).
When a signal occurs at exactly that place this will most probably have been
overwritten.  I'm aware of the red zone on ppc64 (and in fact also on ppc
linux) but with a more complicated function the invalidly accessed offset
could be made larger.

The reason why this happens can be seen in the dumps.  The relevant insns are
(before sched2):

(insn:HI 153 152 154 18 (set (reg/v/f:SI 9 9 [orig:123 a ] [123])
        (plus:SI (reg/f:SI 1 1)
            (const_int 12 [0xc]))) 79 {*addsi3_internal1} (nil)
    (expr_list:REG_EQUAL (plus:SI (reg/f:SI 1 1)
            (const_int 12 [0xc]))
        (nil)))
...
(insn:HI 157 155 228 19 (set (reg:SI 3 3 [orig:135 D.62311 ] [135])
        (mem:SI (reg/v/f:SI 9 9 [orig:123 a ] [123]) [4 S4 A32])) 310
       {*movsi_internal1} (nil)
    (expr_list:REG_DEAD (reg/v/f:SI 9 9 [orig:123 a ] [123])
        (nil)))

(insn 228 157 229 19 (use (reg/i:SI 3 3 [ <result> ])) -1 (nil)
    (nil))
...
(insn 231 230 232 19 (set (reg:SI 28 28)
        (mem/c:SI (plus:SI (reg/f:SI 1 1)
                (const_int 32 [0x20])) [69 S4 A8])) 310 {*movsi_internal1}
        (nil)(nil))

[a couple more loads from callee saved reg from [r1+offset]

(insn 235 234 236 19 (set (reg/f:SI 1 1)
        (plus:SI (reg/f:SI 1 1)
            (const_int 48 [0x30]))) 79 {*addsi3_internal1} (nil)
    (nil))

(jump_insn 236 235 239 19 (parallel [
            (return)
            (use (reg:SI 65 lr))
        ]) 544 {*return_internal_si} (nil)
    (nil))

Note how the second last insn, the stack pointer adjustment can't cause any
dependency of any sort with insn 157, as the latter uses register 9 to access
the stack.  Note further that all accesses to the saved registers are using
alias set 69, while the read from [r9] (for the to be returned value) is using
alias set 4.  From inside the compiler it can be checked that both alias sets
do not conflict.

That is important to know, because the rs6000 backend has provisions to
generate a sort of schedule barrier instruction supposed to clobber stack
memory in order to not resort the saved register instructions with the stack
adjustment.  That's the 'stack_tie' instruction.  That one uses
gen_frame_mem() to create a MEM rtx which it can clobber, but that will have
(in this case) alias set 4 (get_frame_alias_set() ).  So, even if it would
have been emitted it wouldn't stop this particular reordering (because not
also clobbering stuff in alias set 69).

But that reorder barrier isn't emitted by the backend anyway (see
rs6000.c:rs6000_emit_epilogue) as frame_reg_rtx and sp_reg_rtx are equal for
our case.

I think the right solution here is to
1) always emit the reorder barrier
2) make the reorder barrier use alias set 0

Alternatively (but more difficult to implement) would be to not clobber
alias set 0, but another new alias set conflicting with _all_ frame related
alias sets.  That could also be reached by not using alias set 4 for the
load of the returned value in this case, but in general the alias set comes
from the middle end (the decls) already.

The proposed solution is for instance also used in the i386 backend, where in
fact the stack pointer adjusting instructions itself clobbers all memory.

The same problem also exists in the 4.1 branch (and probably 4.2, but haven't
checked).


-- 
           Summary: stack references scheduled below stack pointer
                    adjustment
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: matz at gcc dot gnu dot org
  GCC host triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30736

Reply via email to