https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081
--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Feb 2017, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081
>
> --- Comment #45 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> Created attachment 40683
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40683&action=edit
> reduced testcase that exhibits problem on a cross build (function crapola)
>
> This pre-processed source is miscompiled by the stage1 compiler building the
> stage2 compiler. I have been able to abstract the problem to a function
> crapola() in the attachment. This function was extracted from
> build_pred_graph() in tree-ssa-structalias.c:
>
> void crapola()
> {
>   unsigned int j;
>   for (j = 1; j < (varmap).length (); j++)
>     {
>       if (!get_varinfo (j)->is_special_var)
>         bitmap_set_bit (graph->direct_nodes, j);
>     }
> }
>
> With attachment 40673, one can build a Linux cross compiler on top of r226811
> and see the difference in compiling this testcase with and without the changes
> to loop-iv.c (--target=powerpc-ibm-aix7.2.0.0 and compiling the testcase with
> -O2).
>
> Now... could someone please double check my logic here?
>
> The problem here is that the above function gets two parallel counters for 'j'
> that IMO are not kept in sync:
>
> ._Z7crapolav:
> LFB..4689:
>         lwz 9,LC..184(2)
>         lwz 8,0(9)
>         cmpwi 7,8,0
>         beqlr 7
>         lwz 5,4(8)         ;; r5 = varmap.length()
>         cmplwi 7,5,1
>         blelr- 7
>         addi 7,5,-1        ;; r7 = varmap.length() - 1
>         lwz 9,LC..185(2)
>         addi 8,8,8
>         mtctr 7            ;; CTR = varmap.length() - 1
>         li 10,1            ;; r10 = j = 1
>         lwz 3,0(9)
>         li 4,1
>         .align 4
> L..1471:
>         lwzu 9,4(8)        ;; We read here once too many times and BOOM!
>         lwz 9,4(9)
>         andis. 7,9,0x4000  ;; twiddling to get is_special_var
>         bne 0,L..1472      ;; jump to problematic loop if is_special_var != 0
>         lwz 6,52(3)
>         lwz 7,0(6)
>         cmpwi 7,7,0
>         bne- 7,L..1479
>         rlwinm 9,10,29,3,29
>         rlwinm 7,10,0,27,31
>         add 9,6,9
>         addi 10,10,1       ;; r10++; (j++)
>         lwz 6,12(9)
>         cmplw 7,5,10       ;; cr7 = compare(varmap.length(), j)
>         slw 7,4,7          ;; (NOTE: This is r7 *NOT* cr7)
>         or 7,6,7           ;; This is just code updating the bitmap.
>         stw 7,12(9)
>         bne+ 7,L..1471     ;; loop on cr7 which should compare "j != length"
>                            ;; BOO!!! we don't keep CTR in sync!!!
>         blr
>         .align 4
> L..1472:
>         addi 10,10,1       ;; r10++; (we keep r10/j in sync here)
>         bdnz L..1471       ;; loop on CTR while keeping r10/j in sync
>         blr
> [snip]
>
> We keep the iteration variable 'j' in r10, which we use to compare against
> r5. r5 is the upper bound (the length). However, we also keep a running count
> in PPC's counter register (CTR). This is in the snippet at L..1472. Notice that
> every time we use the PPC counter, we also update 'j' in r10. However, the
> reverse is not true: when we increment j through the snippet at L..1471, we
> never update CTR. This may leave CTR with an optimistic value when
> is_special_var != 0. (That is, unless there's a magical PPC instruction I'm
> unaware of before L..1472 that decrements CTR.)
>
> All this causes one too many reads by "lwzu 9,4(8)" in the loop.
>
> Does this make sense? Can someone take it from here?

Sounds like something goes wrong with (updating?) the do-loop insns.

Note that in the past I successfully debugged an AIX issue using a cross
compiler from x86_64-linux.
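To check the reasoning in comment #45 without a PowerPC machine, the control flow of the quoted assembly can be replayed in a few lines. This is a hypothetical model, not GCC code and not a fix: `intended_reads`, `simulated_reads` and the `is_special` array are illustrative names, with `is_special[j]` standing in for `get_varinfo (j)->is_special_var` and `len` for `varmap.length ()`. It assumes, as the listing suggests, that each trip through L..1471 loads element j via `lwzu` on both paths, and that the special path exits only through `bdnz` while the other path exits only through the `j != length` compare.

```cpp
#include <cstddef>

// Hypothetical model of the quoted loop; not GCC source.

// Loads the source loop intends to perform: one per j in [1, len).
std::size_t intended_reads(std::size_t len)
{
  return len > 1 ? len - 1 : 0;
}

// Replays the control flow of the quoted assembly.  Returns the number
// of element loads (the lwzu at L..1471) and sets oob if a load used an
// index >= len.  The model stops at the first such load, which is where
// the real code reads past the end of varmap.
std::size_t simulated_reads(const bool *is_special, std::size_t len, bool &oob)
{
  oob = false;
  if (len <= 1)                 // beqlr / cmplwi 7,5,1; blelr- 7
    return 0;
  std::size_t ctr = len - 1;    // mtctr 7: CTR = length - 1
  std::size_t j = 1;            // li 10,1
  std::size_t reads = 0;
  for (;;)
    {
      ++reads;                  // lwzu 9,4(8): load element j
      if (j >= len)
        {
          oob = true;           // the load past the end ("BOOM")
          return reads;
        }
      if (is_special[j])
        {
          // L..1472: j is incremented and bdnz decrements CTR,
          // looping back while CTR is nonzero.
          ++j;
          if (--ctr == 0)
            return reads;
        }
      else
        {
          // Bitmap path: j is incremented but CTR is left alone.
          ++j;
          if (j == len)         // cmplw 7,5,10; bne+ 7,L..1471
            return reads;
        }
    }
}
```

For example, with `len = 3` and `is_special = {false, false, true}`, j = 1 takes the bitmap path (CTR stays at 2), j = 2 takes the special path (CTR drops to 1, so `bdnz` loops again), and the next load uses j = 3: three loads instead of the intended two, the last one out of bounds. Inputs where every element takes the same path stay in bounds, because only one exit test is ever consulted; the mismatch needs a bitmap-path iteration followed by a special element at the end.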