https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64081

--- Comment #45 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Created attachment 40683
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40683&action=edit
reduced testcase that exhibits problem on a cross build (function crapola)

This pre-processed source is miscompiled by the stage1 compiler building the
stage2 compiler.  I have been able to abstract the problem to a function
crapola() in the attachment.  This function was extracted from
build_pred_graph() in tree-ssa-structalias.c:

void crapola()
{
  unsigned int j;
  for (j = 1; j < (varmap).length (); j++)
    {
      if (!get_varinfo (j)->is_special_var)
        bitmap_set_bit (graph->direct_nodes, j);
    }
}

With attachment 40673, one can build a Linux cross compiler on top of r226811
and see the difference in compiling this testcase with and without the changes
to loop-iv.c (--target=powerpc-ibm-aix7.2.0.0 and compiling the testcase with
-O2).

Now...could someone please double check my logic here?

The problem here is that the above function gets two parallel counters for 'j'
that IMO are not kept in sync:

._Z7crapolav:
LFB..4689:
        lwz 9,LC..184(2)
        lwz 8,0(9)
        cmpwi 7,8,0
        beqlr 7
        lwz 5,4(8)      ;; r5 = varmap.length()
        cmplwi 7,5,1
        blelr- 7
        addi 7,5,-1     ;; r7 = varmap.length() - 1
        lwz 9,LC..185(2)
        addi 8,8,8
        mtctr 7         ;; CTR = varmap.length() - 1
        li 10,1         ;; r10 = j = 1
        lwz 3,0(9)
        li 4,1
        .align 4
L..1471:
        lwzu 9,4(8)     ;; We read here once too many times and BOOM!
        lwz 9,4(9)
        andis. 7,9,0x4000  ;; twiddling to get is_special_var
        bne 0,L..1472      ;; jump to problematic loop if is_special_var != 0
        lwz 6,52(3)
        lwz 7,0(6)
        cmpwi 7,7,0
        bne- 7,L..1479
        rlwinm 9,10,29,3,29
        rlwinm 7,10,0,27,31
        add 9,6,9
        addi 10,10,1    ;; r10++; (j++)
        lwz 6,12(9)
        cmplw 7,5,10    ;; cr7 = compare(varmap.length(), j)
        slw 7,4,7       ;; (NOTE: This is r7 *NOT* cr7)
        or 7,6,7        ;; This is just code updating the bitmap.
        stw 7,12(9)
        bne+ 7,L..1471  ;; loop on cr7 which should compare "j != length"
                        ;; BOO!!!  we don't keep CTR in sync!!!
        blr
        .align 4
L..1472:
        addi 10,10,1    ;; r10++; (we keep r10/j in sync here)
        bdnz L..1471    ;; loop on CTR while keeping r10/j in sync
        blr
[snip]

We keep the iteration variable 'j' in r10, which we use to compare against r5. 
R5 is the upper bound/length.  However, we also keep a running count in PPC's
counter (CTR).  This is in the snippet in L..1472.  Notice that every time we
use the PPC counter, we also update the 'j' in r10.  However, the reverse is
not true: when we increment j through the snippet in L..1471, we never
update the CTR.  This may cause CTR to have an optimistic value when
is_special_var != 0.  (That is, unless there's a magical PPC instruction I'm
unaware of before L..1472 that decrements CTR).

All this causes one too many reads at "lwzu 9,4(8)" in the loop.
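
To make the suspected desync easier to check, here is a rough C++ model of the
control flow as I read it from the listing above.  It is only a sketch: the
names simulate_crapola and LoopResult are made up for illustration, the vector
stands in for the is_special_var bits of varmap, and the model stops at the
first out-of-bounds read instead of reading garbage.

```cpp
#include <vector>

// Hypothetical model of the emitted loop; not GCC code.
struct LoopResult {
    unsigned reads;       // how many times the load at L..1471 executed
    bool out_of_bounds;   // true if a load happened with j >= length
};

LoopResult simulate_crapola(const std::vector<bool>& is_special) {
    const unsigned length = static_cast<unsigned>(is_special.size());
    LoopResult r{0, false};
    if (length <= 1)
        return r;                      // blelr-: nothing to iterate over

    unsigned ctr = length - 1;         // mtctr 7
    unsigned j = 1;                    // li 10,1

    for (;;) {
        // L..1471: lwzu 9,4(8)
        ++r.reads;
        if (j >= length) {             // the extra read described above
            r.out_of_bounds = true;
            return r;
        }
        if (is_special[j]) {
            // L..1472: both counters advance
            ++j;                       // addi 10,10,1
            if (--ctr == 0)            // bdnz L..1471
                return r;
        } else {
            // fall-through path: only j advances, CTR is left alone
            ++j;                       // addi 10,10,1
            if (j == length)           // cmplw 7,5,10 ; bne+ 7,L..1471
                return r;
        }
    }
}
```

In this model a vector that is all special or all non-special exits cleanly,
but { false, false, true } takes the fall-through path once (leaving CTR one
too high) and then exits through L..1472, which loops back for a read at
j == length.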

Does this make sense?  Can someone take it from here?
