On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 20/07/15 15:04, Tom de Vries wrote: >> >> On 16/07/15 12:15, Richard Biener wrote: >>> >>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries >>> <tom_devr...@mentor.com> wrote: >>>> >>>> On 16/07/15 10:44, Richard Biener wrote: >>>>> >>>>> >>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <tom_devr...@mentor.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I. >>>>>> >>>>>> In openmp expansion of loops, we do some effort to try to create >>>>>> matching >>>>>> loops in the loop state of the child function, f.i.in >>>>>> expand_omp_for_generic: >>>>>> ... >>>>>> struct loop *outer_loop; >>>>>> if (seq_loop) >>>>>> outer_loop = l0_bb->loop_father; >>>>>> else >>>>>> { >>>>>> outer_loop = alloc_loop (); >>>>>> outer_loop->header = l0_bb; >>>>>> outer_loop->latch = l2_bb; >>>>>> add_loop (outer_loop, l0_bb->loop_father); >>>>>> } >>>>>> >>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt)) >>>>>> { >>>>>> struct loop *loop = alloc_loop (); >>>>>> loop->header = l1_bb; >>>>>> /* The loop may have multiple latches. */ >>>>>> add_loop (loop, outer_loop); >>>>>> } >>>>>> ... >>>>>> >>>>>> And if that doesn't work out, we try to mark the loop state for >>>>>> fixup, in >>>>>> expand_omp_taskreg and expand_omp_target: >>>>>> ... >>>>>> /* When the OMP expansion process cannot guarantee an >>>>>> up-to-date >>>>>> loop tree arrange for the child function to fixup >>>>>> loops. */ >>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) >>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; >>>>>> ... >>>>>> >>>>>> and expand_omp_for: >>>>>> ... >>>>>> else >>>>>> /* If there isn't a continue then this is a degerate case where >>>>>> the introduction of abnormal edges during lowering will >>>>>> prevent >>>>>> original loops from being detected. Fix that up. */ >>>>>> loops_state_set (LOOPS_NEED_FIXUP); >>>>>> ... >>>>>> >>>>>> However, loops are fixed up anyway, because the first pass we execute >>>>>> with >>>>>> the new child function is pass_fixup_cfg. >>>>>> >>>>>> The new child function contains a function call to >>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so >>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and >>>>>> subsequently >>>>>> the loops with LOOPS_NEED_FIXUP. >>>>>> >>>>>> >>>>>> II. >>>>>> >>>>>> This patch adds a verification that at the end of the omp-expand >>>>>> processing >>>>>> of the child function, either the loop structure is ok, or marked for >>>>>> fixup. >>>>>> >>>>>> This verfication triggered a failure in parloops. When an outer >>>>>> loop is >>>>>> being parallelized, both the outer and inner loop are cancelled. Then >>>>>> during >>>>>> omp-expansion, we create a loop in the loop state for the outer >>>>>> loop (the >>>>>> one that is transformed), but not for the inner, which causes the >>>>>> verification failure: >>>>>> ... >>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree >>>>>> ... >>>>>> >>>>>> [ I ran into this verification failure with an openacc kernels >>>>>> testcase >>>>>> on >>>>>> the gomp-4_0-branch, where parloops is called additionally from a >>>>>> different >>>>>> location, and pass_fixup_cfg is not the first pass that the child >>>>>> function >>>>>> is processed by. ] >>>>>> >>>>>> The patch contains a bit that makes sure that the loop state of the >>>>>> child >>>>>> function is marked for fixup in parloops. The bit is non-trival >>>>>> since it >>>>>> create a loop state and sets the fixup flag on the loop state, but >>>>>> postpones >>>>>> the init_loops_structure call till move_sese_region_to_fn, where it >>>>>> can >>>>>> succeed. >>>>>> >>>>>> >> >> <SNIP> >> >>> Can we fix the root-cause of the issue instead? That is, build a >>> valid loop >>> structure in the first place? >>> >> >> This patch manages to keep the loop structure, that is, to not cancel >> the loop tree in parloops, and guarantee a valid loop structure at the >> end of parloops. >> >> The transformation to insert the omp_for invalidates the loop state >> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so >> we drop those in parloops. >> >> In expand_omp_for_static_nochunk, we detect the existing loop struct of >> the omp_for, and keep it. >> >> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get >> the loop state properties LOOPS_HAVE_RECORDED_EXITS and >> LOOPS_HAVE_SIMPLE_LATCHES back. >> > > This updated patch tries a more minimal approach. > > Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new > exit instead. > > And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we > just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk?
I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning that back on in execute_expand_omp). The parloops code lacks comments and the /* Prepare cfg. */ part looks twisty to me - but I don't see why it should be difficult to preserve simple latches as well - is this just because we insert the GIMPLE_OMP_CONTINUE in it? If execute_expand_omp is not performed in a loop pipeline where things expect simple latches (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here at all? If somebody needs it it will just request simple latches. So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep LOOPS_HAVE_SIMPLE_LATCHES unset in execute_expand_omp. Ok with that change. Thanks, Richard. > Thanks, > - Tom >