https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081

--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Feb 2017, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081
> 
> --- Comment #45 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> Created attachment 40683
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40683&action=edit
> reduced testcase that exhibits problem on a cross build (function crapola)
> 
> This pre-processed source is miscompiled by the stage1 compiler building the
> stage2 compiler.  I have been able to abstract the problem to a function
> crapola() in the attachment.  This function was extracted from
> build_pred_graph() in tree-ssa-structalias.c:
> 
> void crapola()
> {
>   unsigned int j;
>   for (j = 1; j < (varmap).length (); j++)
>     {
>       if (!get_varinfo (j)->is_special_var)
>         bitmap_set_bit (graph->direct_nodes, j);
>     }
> }
> 
> With attachment 40673, one can build a Linux cross compiler on top of r226811
> and see the difference in compiling this testcase with and without the changes
> to loop-iv.c (--target=powerpc-ibm-aix7.2.0.0 and compiling the testcase with
> -O2).
> 
> Now...could someone please double check my logic here?
> 
> The problem here is that the above function gets two parallel counters for 'j'
> that IMO are not kept in sync:
> 
> ._Z7crapolav:
> LFB..4689:
>         lwz 9,LC..184(2)
>         lwz 8,0(9)
>         cmpwi 7,8,0
>         beqlr 7
>         lwz 5,4(8)      ;; r5 = varmap.length()
>         cmplwi 7,5,1
>         blelr- 7
>         addi 7,5,-1     ;; r7 = varmap.length() - 1
>         lwz 9,LC..185(2)
>         addi 8,8,8
>         mtctr 7         ;; CTR = varmap.length() - 1
>         li 10,1         ;; r10 = j = 1
>         lwz 3,0(9)
>         li 4,1
>         .align 4
> L..1471:
>         lwzu 9,4(8)     ;; We read here once too many times and BOOM!
>         lwz 9,4(9)
>         andis. 7,9,0x4000  ;; twiddling to get is_special_var
>         bne 0,L..1472      ;; jump to problematic loop if is_special_var != 0
>         lwz 6,52(3)
>         lwz 7,0(6)
>         cmpwi 7,7,0
>         bne- 7,L..1479
>         rlwinm 9,10,29,3,29
>         rlwinm 7,10,0,27,31
>         add 9,6,9
>         addi 10,10,1    ;; r10++; (j++)
>         lwz 6,12(9)
>         cmplw 7,5,10    ;; cr7 = compare(varmap.length(), j)
>         slw 7,4,7       ;; (NOTE: This is r7 *NOT* cr7)
>         or 7,6,7        ;; This is just code updating the bitmap.
>         stw 7,12(9)
>         bne+ 7,L..1471  ;; loop on cr7 which should compare "j != length"
>                         ;; BOO!!!  we don't keep CTR in sync!!!
>         blr
>         .align 4
> L..1472:
>         addi 10,10,1    ;; r10++; (we keep r10/j in sync here)
>         bdnz L..1471    ;; loop on CTR while keeping r10/j in sync
>         blr
> [snip]
> 
> We keep the iteration variable 'j' in r10, which we use to compare against r5.
> r5 is the upper bound/length.  However, we also keep a running count in PPC's
> counter (CTR).  This is in the snippet in L..1472.  Notice that every time we
> use the PPC counter, we also update the 'j' in r10.  However, the reverse is
> not true: when we increment j through the snippet in L..1471, we never
> update the CTR. This may cause CTR to have an optimistic value when
> is_special_var != 0.  (That is, unless there's a magical PPC instruction I'm
> unaware of before L..1472 that decrements CTR).
> 
> All this causes one too many reads at "lwzu 9,4(8)" in the loop.
> 
> Does this make sense?  Can someone take it from here?

Sounds like something goes wrong with (updating?) do-loop insns.

Note that in the past I successfully debugged an AIX issue with a cross
from x86_64-linux.
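
To make the quoted analysis easier to check, here is a minimal sketch that
models the control flow of the listing in plain C++.  It is only a model under
stated assumptions, not compiler output: the function name simulate_reads and
the is_special vector are hypothetical, and an out-of-bounds load is
represented by stopping and counting the read rather than touching memory.
A correct loop performs length - 1 reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the miscompiled loop.  is_special[j] stands in for
// get_varinfo (j)->is_special_var; index 0 is unused because j starts at 1.
// Returns how many times the load at L..1471 ("lwzu 9,4(8)") executes.
std::size_t simulate_reads(const std::vector<bool>& is_special)
{
  const std::size_t length = is_special.size();  // r5
  if (length <= 1)
    return 0;                                    // blelr- 7

  std::size_t ctr = length - 1;                  // mtctr 7
  std::size_t j = 1;                             // li 10,1
  std::size_t reads = 0;

  for (;;)
    {
      ++reads;                                   // lwzu 9,4(8)
      if (j >= length)
        return reads;                            // model of the stray read

      if (is_special[j])
        {
          // L..1472: j and CTR are both updated, exit is decided by CTR.
          ++j;
          assert(ctr > 0);                       // never decremented at zero
          if (--ctr == 0)                        // bdnz L..1471
            return reads;
        }
      else
        {
          // Fall-through path: only j is updated, CTR is left untouched.
          ++j;
          if (j == length)                       // cmplw 7,5,10 / bne+
            return reads;
        }
    }
}
```

In this model, simulate_reads({false, false, true}) returns 3 for a length of
3, i.e. one read more than the expected 2, because the non-special iteration
left CTR one too high before the final bdnz.  With the order reversed,
simulate_reads({false, true, false}) returns 2, so the extra read only shows
up when a CTR-controlled exit follows an iteration that skipped the CTR
update.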
