http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60013

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, here is what happens on the reduced testcase.  We have 9 dynamic conditions
and 14 basic blocks.  Let's use letters for the dynamic conditions:
A (op1[ref offset: 0] == 0)
B (op1[ref offset: 0] != 0)
C (op0 >= 2)
D (op0 <= 3)
E (op0 >= 4)
F (op0 <= 6)
G (op0 == 8)
H (op1[ref offset: 0] == 1)
I (op1[ref offset: 0] != 1)

Most of the basic blocks have only a single predecessor and aren't part of any
kind of loop, so they have a fixed set of predicates after the first iteration,
and bb13 has a single pred edge (the default: edge of the switch) with a true
predicate, so it ends up always true.  The only really interesting basic
blocks are therefore bb10, bb11 (ABNORMAL_DISPATCHER) and bb12.  The preds of
bb10 are bb11 and bb9, the preds of bb11 are bb4, bb5, bb6 and bb12, and the
pred of bb12 is bb10.  The predicates for the bbs that don't change after the
first iteration are:
bb4 G&A
bb5 F&E&A
bb6 D&C&A
bb9 H&B
and the problematic basic blocks are always processed in increasing bb->index
order, i.e. bb10, bb11, bb12.  So bb10 gets H&B.  Then bb11 is processed; bb12
still has no predicates computed, and the DNF (G&A)|(F&E&A)|(D&C&A)|(H&B) is
converted into CNF, but there is a cap on the number of &ed terms (8), so we
don't compute the whole CNF (which would need to have 12 M|N|O|P style
operands, 4 M|N|O style operands and 1 M|N style operand, i.e. 17 in total?),
but instead pick just 8 operands from it (the ones with the largest values of
the bitmask):
(H|G|F|D)&(H|G|F|C)&(H|G|E|D)&(H|G|E|C)&(G|F|D|B)&(G|F|C|B)&(G|E|D|B)&(G|E|C|B).
Next time bb12 is processed and its predicate is set to H&B.
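The clause arithmetic above is easy to model.  Below is a small Python sketch
(a toy model, not GCC's actual ipa-inline-analysis code; `add_clause`,
`or_pred` and the exact tie-breaking are simplifying assumptions) that
represents each clause as a bitmask over the conditions A..I, keeps a
predicate as a conjunction of at most 8 clauses, and silently drops new
clauses once the cap is hit.  Folding the DNF for bb11 then reproduces
exactly the 8 picked operands:

```python
from functools import reduce

# Toy model of the capped predicate representation: a predicate is a
# conjunction of at most MAX_CLAUSES clauses, each clause a bitmask
# standing for a disjunction of dynamic conditions.
MAX_CLAUSES = 8

# One bit per dynamic condition A..I from the list above.
A, B, C, D, E, F, G, H, I = (1 << i for i in range(9))
NAMES = "ABCDEFGHI"

def show(pred):
    """Render a predicate like (H|G|F|D)&(G|F|D|B)&..."""
    def clause(c):
        return "|".join(NAMES[i] for i in range(8, -1, -1) if c & (1 << i))
    return "&".join("(%s)" % clause(c) for c in pred)

def add_clause(pred, clause):
    """Add CLAUSE to PRED (kept sorted by descending bitmask value)."""
    if any(c & clause == c for c in pred):
        return                      # an existing stronger clause subsumes it
    # Drop existing clauses that the new, stronger clause makes redundant.
    pred[:] = [c for c in pred if clause & c != clause]
    if len(pred) == MAX_CLAUSES:
        return                      # cap reached: silently drop the new clause
    pred.append(clause)
    pred.sort(reverse=True)

def or_pred(p1, p2):
    """OR two predicates by cross-ORing their clauses, in operand order."""
    out = []
    for c1 in p1:
        for c2 in p2:
            add_clause(out, c1 | c2)
    return out

def conj(*conds):
    """A predicate that is just a conjunction of single conditions."""
    return sorted(conds, reverse=True)

# (G&A)|(F&E&A)|(D&C&A)|(H&B), ORed in left to right, as for bb11:
cnf = reduce(or_pred, [conj(G, A), conj(F, E, A), conj(D, C, A), conj(H, B)])
print(show(cnf))
# -> (H|G|F|D)&(H|G|F|C)&(H|G|E|D)&(H|G|E|C)&(G|F|D|B)&(G|F|C|B)&(G|E|D|B)&(G|E|C|B)
```

The (H|A) and (A|B) clauses of the full 10-clause minimal CNF are generated
last here and fall victim to the cap.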
On the next big round, nothing but these 3 bbs changes.  bb10 is set to the
same predicate bb11 had previously; ORing the H&B predicate into it doesn't
change anything, since either the H or the B condition is already present in
every term.  Then bb11 is processed, and unlike in the previous round there is
now one additional executable edge with an H&B predicate (the same as from
bb8), but this time the computed (incomplete) set of predicates is almost the
same as the previous one, except that it additionally contains an &(H|A) term,
while the &(G|E|C|B) term fell out because there was no space for it.  Then
bb12 again copies bb10's predicate, the one without &(H|A) and with &(G|E|C|B).
Next time bb10 is processed again, and (H&B) is ORed with the current bb11
predicate, the one including (H|A) and excluding (G|E|C|B); the result is the
same as that bb11 predicate.  Then bb11 is processed and as a result gets the
old big predicate, the one with the (G|E|C|B) term and without (H|A), and that
is the start of the oscillation.

The interesting thing is e.g. that
(H|G|F|D)&(H|G|F|C)&(H|G|E|D)&(H|G|E|C)&(G|F|D|B)&(G|F|C|B)&(G|E|D|B)&(G|E|C|B)
ORed with (H&B) gives
(H|G|F|D)&(H|G|F|C)&(H|G|E|D)&(H|G|E|C)&(H|A)&(G|F|D|B)&(G|F|C|B)&(G|E|D|B),
while (H&B) ORed with
(H|G|F|D)&(H|G|F|C)&(H|G|E|D)&(H|G|E|C)&(G|F|D|B)&(G|F|C|B)&(G|E|D|B)&(G|E|C|B)
gives just the second operand: with the cap in place, the OR is not
commutative.
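The order dependence is easy to reproduce in the same toy model (again an
illustrative sketch, not GCC's code).  Take the intermediate predicate reached
after ORing in (G&A), (F&E&A) and (D&C&A), i.e. A&(G|F|D)&(G|F|C)&(G|E|D)&(G|E|C),
and OR it with (H&B) in the two operand orders: one order keeps (G|E|C|B) and
drops (H|A), the other keeps (H|A) and drops (G|E|C|B), giving exactly the two
8-clause sets quoted above:

```python
MAX_CLAUSES = 8
A, B, C, D, E, F, G, H = (1 << i for i in range(8))

def add_clause(pred, clause):
    # Prune subsumed clauses; silently drop the new clause once the cap is hit.
    if any(c & clause == c for c in pred):
        return
    pred[:] = [c for c in pred if clause & c != clause]
    if len(pred) < MAX_CLAUSES:
        pred.append(clause)
        pred.sort(reverse=True)

def or_pred(p1, p2):
    # Cross-OR the clauses in operand order; the generation order decides
    # which clauses get in before the cap closes the door.
    out = []
    for c1 in p1:
        for c2 in p2:
            add_clause(out, c1 | c2)
    return out

# A&(G|F|D)&(G|F|C)&(G|E|D)&(G|E|C): the OR of (G&A), (F&E&A), (D&C&A).
p = sorted([A, G | F | D, G | F | C, G | E | D, G | E | C], reverse=True)
hb = [H, B]                                   # the H&B predicate

r1 = or_pred(p, hb)   # keeps (G|E|C|B), the (H|A) clause is dropped
r2 = or_pred(hb, p)   # keeps (H|A), the (G|E|C|B) clause is dropped
assert r1 != r2
assert (H | A) in r2 and (H | A) not in r1
assert (G | E | C | B) in r1 and (G | E | C | B) not in r2
```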

So, IMHO the easiest fix would be to punt whenever during or_predicate we need
more than 8 terms: turn the result into a true_predicate.  Then we get
guaranteed termination; the drawback is that in some cases we could end up
with more true predicates than previously.
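In the toy model, this first suggestion could be sketched as follows
(hypothetical code, not a patch): instead of silently dropping a clause when
the cap is hit, give up and return the always-true predicate, so the OR never
loses individual clauses behind the iteration's back.  On the example above,
both operand orders then punt:

```python
MAX_CLAUSES = 8
A, B, C, D, E, F, G, H = (1 << i for i in range(8))
TRUE_PRED = None      # sentinel for the always-true predicate

def or_pred_punt(p1, p2):
    """Capped OR that punts to a true predicate instead of dropping clauses."""
    if p1 is TRUE_PRED or p2 is TRUE_PRED:
        return TRUE_PRED
    out = []
    for c1 in p1:
        for c2 in p2:
            clause = c1 | c2
            if any(c & clause == c for c in out):
                continue                      # subsumed, nothing to add
            out = [c for c in out if clause & c != clause]
            if len(out) == MAX_CLAUSES:
                return TRUE_PRED              # would have to drop a clause: punt
            out.append(clause)
    return sorted(out, reverse=True)

p = sorted([A, G | F | D, G | F | C, G | E | D, G | E | C], reverse=True)
hb = [H, B]
# The minimal CNF needs 10 clauses here, so both operand orders punt:
assert or_pred_punt(p, hb) is TRUE_PRED
assert or_pred_punt(hb, p) is TRUE_PRED
```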

Or we could e.g. always compute the OR in infinite precision (or with, say,
256 terms, punting to a true predicate if we reach that) and only at the end
of processing a particular bb pick some subset of at most 8 bitmasks in some
way (either prefer the highest values of the bitmask ever appearing there, or
prefer terms that were already present among the predicates on the bb
previously (if any) and only pick the highest values otherwise), etc.
