https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90009

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com,
                   |                            |rguenth at gcc dot gnu.org
          Component|target                      |tree-optimization

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
For the oacc test-case, the pass thread1 duplicates blocks containing an oacc
fork and an oacc join.

In the resulting code, there are still as many forks as joins, but forks are no
longer post-dominated by a single join:
...
  <bb 2> [local count: 87490073]:
  _10 = *.omp_data_i_9(D).b;
  b_11 = *_10;
  _12 = *.omp_data_i_9(D).n;
  n_13 = *_12;
  if (n_13 != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 3> [local count: 43745037]:
  # c_47 = PHI <0(2)>
  _15 = .UNIQUE (OACC_FORK, 0, 1);

  <bb 4> [local count: 43745037]:
  _19 = .GOACC_DIM_POS (1);
  if (n_13 > _19)
    goto <bb 8>; [27.00%]
  else
    goto <bb 5>; [73.00%]

  <bb 5> [local count: 31933877]:
  _1 = .UNIQUE (OACC_JOIN, _15, 1);
  goto <bb 12>; [100.00%]

  <bb 6> [local count: 43745036]:
  # c_3 = PHI <b_11(2)>
  _17 = .UNIQUE (OACC_FORK, 0, 1);

  <bb 7> [local count: 43745036]:
  _48 = .GOACC_DIM_POS (1);
  if (n_13 > _48)
    goto <bb 8>; [27.00%]
  else
    goto <bb 10>; [73.00%]

  <bb 8> [local count: 23622320]:
  # c_23 = PHI <c_3(7), c_47(4)>
  # _29 = PHI <_17(7), _15(4)>
  # _50 = PHI <_48(7), _19(4)>

  <bb 9> [local count: 1073741824]:
  # _2 = PHI <_50(8), _26(9)>
  _24 = *.omp_data_i_9(D).data;
  (*_24)[_2] = 1;
  _26 = _2 + 2;
  if (n_13 > _26)
    goto <bb 9>; [89.00%]
  else
    goto <bb 10>; [11.00%]

  <bb 10> [local count: 405516494]:
  # c_22 = PHI <c_3(7), c_23(9)>
  # _27 = PHI <_17(7), _29(9)>
  _31 = .UNIQUE (OACC_JOIN, _27, 1);

  <bb 11> [local count: 55556197]:
  if (c_22 != 0)
    goto <bb 13>; [80.00%]
  else
    goto <bb 12>; [20.00%]

  <bb 12> [local count: 87490074]:
  return;

  <bb 13> [local count: 43745037]:
  _33 = *.omp_data_i_9(D).data;
  (*_33)[0] = 2;
  goto <bb 12>; [100.00%]
...

In particular, the joins _1 and _31 are both reachable from fork _15 (bb paths
3 -> 4 -> 5 and 3 -> 4 -> 8 -> 9 -> 10, respectively).

This confuses nvptx_discover_pars:
...
Loops
0: mask 0 head=-1, tail=-1
 blocks: 2 3 15 10 13 5
  1: mask 2 head=14, tail=-1
   blocks: 14
  1: mask 2 head=12, tail=13
   blocks: 12 6 7 8 9 4
...
resulting in the loop with head 14 having no tail (-1).

These asserts detect the problem during nvptx_discover_pars:
...
@@ -3243,6 +3244,7 @@ nvptx_find_par (bb_insn_map_t *map, parallel *par, basic_block block)
            gcc_assert (mask);
            par = new parallel (par, mask);
+           gcc_assert (par->forked_block == NULL);
            par->forked_block = block;
            par->forked_insn = end;
            if (nvptx_needs_shared_bcast (mask))
@@ -3258,6 +3260,7 @@ nvptx_find_par (bb_insn_map_t *map, parallel *par, basic_block block)
            unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
            gcc_assert (par->mask == mask);
+           gcc_assert (par->join_block == NULL);
            par->join_block = block;
            par->join_insn = end;
            if (nvptx_needs_shared_bcast (mask))
...

Tentative fix:
...
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 81dc05dc831..259ddc9c929 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -261,6 +261,13 @@ thread_jumps::profitable_jump_thread_path (basic_block bbi, tree name,
               gsi_next_nondebug (&gsi))
            {
              gimple *stmt = gsi_stmt (gsi);
+             if (is_gimple_call (stmt)
+                 && gimple_call_internal_p (stmt)
+                 && gimple_call_internal_unique_p (stmt))
+               {
+                 m_path.pop ();
+                 return NULL;
+               }
              /* Do not count empty statements and labels.  */
              if (gimple_code (stmt) != GIMPLE_NOP
                  && !(gimple_code (stmt) == GIMPLE_ASSIGN
...
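
The reachability claim for fork _15 can be cross-checked with a small
standalone model of the CFG from the dump. This is only a sketch: the edge
table is transcribed by hand from the goto targets and fallthroughs shown in
the dump, and the names succs, join_blocks and reachable_joins are
hypothetical helpers, not GCC code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Successor edges transcribed from the dump (bb number -> successor bbs).
// Assumption: blocks without an explicit goto fall through to the next bb.
static const std::map<int, std::vector<int>> succs = {
  {2, {3, 6}},  {3, {4}},      {4, {8, 5}},  {5, {12}},
  {6, {7}},     {7, {8, 10}},  {8, {9}},     {9, {9, 10}},
  {10, {11}},   {11, {13, 12}}, {12, {}},    {13, {12}},
};

// Blocks that contain a .UNIQUE (OACC_JOIN, ...) call in the dump.
static const std::set<int> join_blocks = {5, 10};

// Return the join blocks reachable from START, stopping the walk at the
// first join encountered on each path.
std::set<int>
reachable_joins (int start)
{
  std::set<int> seen, found;
  std::vector<int> work = {start};
  while (!work.empty ())
    {
      int bb = work.back ();
      work.pop_back ();
      if (!seen.insert (bb).second)
        continue;  // Already visited.
      if (join_blocks.count (bb))
        {
          found.insert (bb);
          continue;  // Do not walk past a join.
        }
      assert (succs.count (bb));
      for (int s : succs.at (bb))
        work.push_back (s);
    }
  return found;
}
```

With this model, reachable_joins (3) (the block holding fork _15) yields
blocks 5 and 10, while reachable_joins (6) (the block holding fork _17) yields
only block 10. A fork with a single matching join would yield exactly one
block.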