This was initially seen internally with a 4.1.2 gcc plus power6 patches, but with some adjustments to haifa-sched.c (only touching the order in the ready list) can be reproduced with trunk. The underlying issue is, that the stack pointer restore instruction in the rs6000 backend does not conflict with all insns accessing stack memory, but only with those created by gen_frame_mem(). Attached is a C++ testcase (from khtml). Bear with me while explaining the error, it's hard to construct a small runtime testcase.
Suppose this patch is applied to trunk: ------------------------------------- Index: haifa-sched.c =================================================================== --- haifa-sched.c (revision 121710) +++ haifa-sched.c (working copy) @@ -798,6 +798,8 @@ priority (rtx insn) } while (twin != prev_first); } +if (INSN_UID(insn) == 235 || INSN_UID(insn)==230) + this_priority += 10; INSN_PRIORITY (insn) = this_priority; INSN_PRIORITY_KNOWN (insn) = 1; } @@ -930,7 +932,7 @@ rank_for_schedule (const void *x, const /* If insns are equally good, sort by INSN_LUID (original insn order), so that we make the sort stable. This minimizes instruction movement, thus minimizing sched's effect on debugging and cross-jumping. */ - return INSN_LUID (tmp) - INSN_LUID (tmp2); + return INSN_LUID (tmp2) - INSN_LUID (tmp); } /* Resort the array A in which only element at index N may be out of order. */ @@ -2402,7 +2404,7 @@ schedule_block (basic_block *target_bb, if (ready.n_ready > 0) ready_sort (&ready); - if (targetm.sched.reorder2 + if (0 && targetm.sched.reorder2 && (ready.n_ready == 0 || !SCHED_GROUP_P (ready_element (&ready, 0)))) { ----------------------------------------- I.e. switch off target specific reorder2, reorder insns by _inverse_ uid and special case two instructions by priority in order to force them to be output earlier. The compile the attached testcase with: % ./gcc/cc1plus -O2 -quiet -fsched-verbose=6 -da bug.ii Examine the assembler output for the docWidth method (not the thunk): _ZNK5khtml12RenderCanvas8docWidthEv: .... addi 9,1,12 r9 = r1+12 .L22: lwz 0,52(1) lwz 31,44(1) lwz 30,40(1) lwz 29,36(1) lwz 28,32(1) mtlr 0 addi 1,1,48 r1 = r1+48 lwz 3,0(9) access [r9] == [r1-36] blr Note that the access to [r9] is below the adjustment to r1 (stack pointer). When a signal occurs at exactly that place this will most probably have been overwritten. I'm aware of the red zone on ppc64 (and in fact also on ppc linux) but with a more complicated function the invalidly accessed offset could be made larger. The reason why this happens can be seen in the dumps. The relevant insns are (before sched2): (insn:HI 153 152 154 18 (set (reg/v/f:SI 9 9 [orig:123 a ] [123]) (plus:SI (reg/f:SI 1 1) (const_int 12 [0xc]))) 79 {*addsi3_internal1} (nil) (expr_list:REG_EQUAL (plus:SI (reg/f:SI 1 1) (const_int 12 [0xc])) (nil))) ... (insn:HI 157 155 228 19 (set (reg:SI 3 3 [orig:135 D.62311 ] [135]) (mem:SI (reg/v/f:SI 9 9 [orig:123 a ] [123]) [4 S4 A32])) 310 {*movsi_internal1} (nil) (expr_list:REG_DEAD (reg/v/f:SI 9 9 [orig:123 a ] [123]) (nil))) (insn 228 157 229 19 (use (reg/i:SI 3 3 [ <result> ])) -1 (nil) (nil)) ... (insn 231 230 232 19 (set (reg:SI 28 28) (mem/c:SI (plus:SI (reg/f:SI 1 1) (const_int 32 [0x20])) [69 S4 A8])) 310 {*movsi_internal1} (nil)(nil)) [a couple more loads from callee saved reg from [r1+offset] (insn 235 234 236 19 (set (reg/f:SI 1 1) (plus:SI (reg/f:SI 1 1) (const_int 48 [0x30]))) 79 {*addsi3_internal1} (nil) (nil)) (jump_insn 236 235 239 19 (parallel [ (return) (use (reg:SI 65 lr)) ]) 544 {*return_internal_si} (nil) (nil)) Note how the second last insn, the stack pointer adjustment can't cause any dependency of any sort with insn 157, as the latter uses register 9 to access the stack. Note further that all accesses to the saved registers are using alias set 69, while the read from [r9] (for the to be returned value) is using alias set 4. From inside the compiler it can be checked that both alias sets do not conflict. That is important to know, because the rs6000 backend has provisions to generate a sort of schedule barrier instruction supposed to clobber stack memory in order to not resort the saved register instructions with the stack adjustment. That's the 'stack_tie' instruction. That one uses gen_frame_mem() to create a MEM rtx which it can clobber, but that will have (in this case) alias set 4 (get_frame_alias_set() ). So, even if it would have been emitted it wouldn't stop this particular reordering (because not also clobbering stuff in alias set 69). But that reorder barrier isn't emitted by the backend anyway (see rs6000.c:rs6000_emit_epilogue) as frame_reg_rtx and sp_reg_rtx are equal for our case. I think the right solution here is to 1) always emit the reorder barrier 2) make the reorder barrier use alias set 0 Alternatively (but more difficult to implement) would be to not clobber alias set 0, but another new alias set conflicting with _all_ frame related alias sets. That could also be reached by not using alias set 4 for the load of the returned value in this case, but in general the alias set comes from the middle end (the decls) already. The proposed solution is for instance also used in the i386 backend, where in fact the stack pointer adjusting instructions itself clobbers all memory. The same problem also exists in the 4.1 branch (and probably 4.2, but haven't checked). -- Summary: stack references scheduled below stack pointer adjustment Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: matz at gcc dot gnu dot org GCC host triplet: powerpc-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30736