https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081
--- Comment #45 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Created attachment 40683
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40683&action=edit
reduced testcase that exhibits problem on a cross build (function crapola)

This pre-processed source is miscompiled by the stage1 compiler building the
stage2 compiler.

I have been able to abstract the problem to a function crapola() in the
attachment. This function was extracted from build_pred_graph() in
tree-ssa-structalias.c:

    void crapola()
    {
      unsigned int j;
      for (j = 1; j < (varmap).length (); j++)
        {
          if (!get_varinfo (j)->is_special_var)
            bitmap_set_bit (graph->direct_nodes, j);
        }
    }

With attachment 40673, one can build a Linux cross compiler on top of r226811
and see the difference in compiling this testcase with and without the changes
to loop-iv.c (--target=powerpc-ibm-aix7.2.0.0 and compiling the testcase with
-O2).

Now...could someone please double check my logic here?

The problem here is that the above function gets two parallel counters for 'j'
that IMO are not kept in sync:

    ._Z7crapolav:
    LFB..4689:
            lwz 9,LC..184(2)
            lwz 8,0(9)
            cmpwi 7,8,0
            beqlr 7
            lwz 5,4(8)           ;; r5 = varmap.length()
            cmplwi 7,5,1
            blelr- 7
            addi 7,5,-1          ;; r7 = varmap.length() - 1
            lwz 9,LC..185(2)
            addi 8,8,8
            mtctr 7              ;; CTR = varmap.length() - 1
            li 10,1              ;; r10 = j = 1
            lwz 3,0(9)
            li 4,1
            .align 4
    L..1471:
            lwzu 9,4(8)          ;; We read here once too many times and BOOM!
            lwz 9,4(9)
            andis. 7,9,0x4000    ;; twiddling to get is_special_var
            bne 0,L..1472        ;; jump to problematic loop if is_special_var != 0
            lwz 6,52(3)
            lwz 7,0(6)
            cmpwi 7,7,0
            bne- 7,L..1479
            rlwinm 9,10,29,3,29
            rlwinm 7,10,0,27,31
            add 9,6,9
            addi 10,10,1         ;; r10++; (j++)
            lwz 6,12(9)
            cmplw 7,5,10         ;; cr7 = compare(varmap.length(), j)
            slw 7,4,7            ;; (NOTE: This is r7 *NOT* cr7)
            or 7,6,7             ;; This is just code updating the bitmap.
            stw 7,12(9)
            bne+ 7,L..1471       ;; loop on cr7 which should compare "j != length"
                                 ;; BOO!!! we don't keep CTR in sync!!!
            blr
            .align 4
    L..1472:
            addi 10,10,1         ;; r10++; (we keep r10/j in sync here)
            bdnz L..1471         ;; loop on CTR while keeping r10/j in sync
            blr
    [snip]

We keep the iteration variable 'j' in r10, which we compare against r5. r5 is
the upper bound/length. However, we also keep a running count in PPC's counter
register (CTR). This happens in the snippet at L..1472.

Notice that every time we use the PPC counter, we also update the 'j' in r10.
However, the reverse is not true: when we increment j through the snippet at
L..1471, we never update CTR. This may leave CTR with an optimistic (too high)
value when is_special_var != 0. (That is, unless there's a magical PPC
instruction I'm unaware of before L..1472 that decrements CTR.)

All this causes one too many reads at "lwzu 9,4(8)" in the loop.

Does this make sense? Can someone take it from here?
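
For anyone who wants to check the reasoning without a PPC box, below is a rough C++ model of the control flow in the assembly above. It is only a sketch of my reading of the listing, not anything taken from GCC: the names LoopTrace and simulate_miscompiled_loop are made up for illustration, the bitmap update is left out, and the bounds check that sets out_of_bounds does not exist in the generated code (it is there only so the model can report the stray load instead of performing it).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical trace record; not part of GCC.
struct LoopTrace {
    std::size_t reads = 0;       // in-bounds element loads ("lwzu 9,4(8)")
    bool out_of_bounds = false;  // true if the loop tried to load past the end
};

// is_special[i] stands in for get_varinfo (i)->is_special_var.
// Element 0 is never read, matching the source loop starting at j = 1.
LoopTrace simulate_miscompiled_loop(const std::vector<bool>& is_special) {
    LoopTrace trace;
    const std::size_t length = is_special.size();   // r5
    if (length <= 1) return trace;                  // cmplwi 7,5,1 ; blelr-

    std::size_t ctr = length - 1;                   // mtctr 7
    std::size_t j = 1;                              // li 10,1
    std::size_t index = 1;                          // element the next lwzu loads

    for (;;) {                                      // L..1471
        if (index >= length) {                      // model-only check
            trace.out_of_bounds = true;
            return trace;
        }
        ++trace.reads;
        const bool special = is_special[index++];

        if (special) {                              // L..1472
            ++j;                                    // addi 10,10,1
            if (--ctr == 0) return trace;           // bdnz L..1471
        } else {                                    // fall-through path
            ++j;                                    // addi 10,10,1
            if (j == length) return trace;          // cmplw 7,5,10 ; bne+
            // CTR is NOT decremented on this path.
        }
    }
}
```

With this model, a three-element varmap whose element 1 is not special and whose element 2 is special goes around the loop a third time: the non-special iteration bumps j but leaves CTR at 2, so the bdnz on the last element sees CTR go to 1 rather than 0 and branches back. Inputs that are all special or all non-special exit cleanly, since only one of the two counters is ever consulted.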