On Tue, 9 Sep 2014, Tom de Vries wrote:

> On 18-08-14 14:16, Tom de Vries wrote:
> > On 06-08-14 17:10, Tom de Vries wrote:
> > > We could insert a pass-group here that only deals with functions that
> > > have the kernels directive, and do the auto-par thing in a
> > > pass_oacc_kernels (which should share the majority of the
> > > infrastructure with the parloops pass):
> > > ...
> > >           NEXT_PASS (pass_build_ealias);
> > >           INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
> > >              NEXT_PASS (pass_ch);
> > >              NEXT_PASS (pass_ccp);
> > >              NEXT_PASS (pass_lim_aux);
> > >              NEXT_PASS (pass_oacc_par);
> > >           POP_INSERT_PASSES ()
> > > ...
> > >
> > > Any comments, ideas or suggestions ?
> >
> > I've experimented with implementing this on top of gomp-4_0-branch, and I
> > ran into PR46032.
> >
> > PR46032 is about vectorization failure on a function split off by omp
> > parallelization. The vectorization fails due to aliasing constraints in the
> > split off function, which are not present in the original code.

Heh.  At least the omp-low.c parts from comment #1 should be pushed
to trunk...

> > In the gomp-4_0-branch, the code marked by the openacc kernels directive is
> > split off during omp_expand. The generated code has the same additional
> > aliasing constraints, and in pass_oacc_par the parallelization fails.
> >
> > The PR46032 contains a tentative patch by Richard Biener, which applies
> > cleanly on top of 4.6 (I haven't yet reached a level of understanding of
> > tree-ssa-structalias.c to be able to resolve the conflict in
> > intra_create_variable_infos when applying on 4.7). The tentative patch
> > involves running ipa-pta, which is also a pass run after the point where we
> > write out the lto stream. I'm not sure whether it makes sense to run the
> > pta-ipa pass as part of the pass_oacc_kernels pass list.

No, that's not even possible I think.

> > I see three ways of continuing from here:
> > - take the tentative patch and make it work, including running pta-ipa
> >   during passes_oacc_kernels
> > - same, but try somehow to manage without running pta-ipa.
> > - try to postpone splitting of the function until the end of pass_oacc_par.

I don't understand the last option?  What is the actual issue you run
into?  You split oacc kernels off and _then_ run "autopar" on the
split-off function (and get additional kernels)?

> > Some advice on how to continue from here would be *highly* appreciated. My
> > hunch atm is to investigate the last option.
> >
> > Jakub,
> Richard,
>
> I've investigated the last option, and published the current state in git-only
> branch vries/oacc-kernels (
> https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/vries/oacc-kernels
> ).
>
> The current state at commit 9255cadc5b6f8f7f4e4506e65a6be7fb3c00cd35 is that:
> - a simple loop marked with the oacc kernels directive is analyzed for
>   parallelization,
> - the loop is then rewritten using oacc parallel and oacc loop directives
> - these oacc directives are expanded using omp_expand_local
> - this results in the loop being split off into a separate function, while
>   the loop is replaced with a GOACC_parallel call
> - all this is done before writing out the lto stream
> - no support yet for reductions, nested loops, more than one loop nest in
>   kernels region
>
> At toplevel, the added pass list looks like this:
> ...
>           NEXT_PASS (pass_build_ealias);
>           /* Pass group that runs when there are oacc kernels in the
>              function.  */

Not sure why pass_oacc_kernels runs before all the other local cleanups?
I would have put it after pass_cd_dce at least.

>           NEXT_PASS (pass_oacc_kernels);
>           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>               NEXT_PASS (pass_ch_oacc_kernels);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_lim);
>               NEXT_PASS (pass_ccp);
>               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
>               NEXT_PASS (pass_tree_loop_done);
>           POP_INSERT_PASSES ()
> ...
>
> The main question I'm currently facing is the following: when to do lowering
> (in other words, rewriting of variable access in terms of .omp_data) of the
> kernels region. There are basically 2 passes that contain code to do this:
> - pass_lower_omp (on pre-ssa code)
> - pass_parallelize_loops (on ssa code)

Both use the same utilities.

> Atm I'm using pass_lower_omp, and I've added a patch that handles omp-lowered
> code conservatively in ccp and forwprop in order for the lowering to remain
> until arriving at pass_parallelize_loops_oacc_kernels.

You mean omp-_un_-lowered code?

> But it might turn out to be easier/necessary to handle this in
> pass_parallelize_loops_oacc_kernels instead.

I'd do it similar to how autopar does it (not that autopar is a great
example for a GCC pass these days...).

Richard.

> Any advice on this issue, and on the current implementation is welcome.
>
> Thanks,
> - Tom
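
For readers following the PR46032 discussion, here is a rough sketch of why
splitting a loop off into its own function adds aliasing constraints. It is
hand-written C, not what omp_expand actually generates: the record type
`omp_data_s`, its field names, and the functions `loop_inline`,
`loop_outlined` and `outlined_matches_inline` are all made up for
illustration. In the inline form the loop names three distinct arrays
directly. In the outlined form it only sees pointers loaded from a record, so
without interprocedural points-to information the stores may have to be
treated as possibly overlapping the loads.

```c
#include <stddef.h>

#define N 1000

/* Hypothetical stand-in for the .omp_data record built when a region is
   outlined; the field names are illustrative, not what GCC emits.  */
struct omp_data_s
{
  unsigned int *a;
  unsigned int *b;
  unsigned int *c;
};

unsigned int a[N], b[N], c_inline[N], c_outlined[N];

/* Original form: the loop refers to distinct global arrays directly, so
   the compiler can see that the store cannot overlap the loads.  */
void
loop_inline (void)
{
  for (size_t i = 0; i < N; i++)
    c_inline[i] = a[i] + b[i];
}

/* Outlined form: the body only sees pointers read from the record.
   Looking at this function alone, d->c may point into the same memory
   as d->a or d->b.  */
void
loop_outlined (void *data)
{
  struct omp_data_s *d = (struct omp_data_s *) data;

  for (size_t i = 0; i < N; i++)
    d->c[i] = d->a[i] + d->b[i];
}

/* Run both forms on the same input and return 1 if the results agree.  */
int
outlined_matches_inline (void)
{
  struct omp_data_s d = { a, b, c_outlined };

  for (size_t i = 0; i < N; i++)
    {
      a[i] = (unsigned int) i;
      b[i] = (unsigned int) (2 * i);
    }

  loop_inline ();
  loop_outlined (&d);

  for (size_t i = 0; i < N; i++)
    if (c_inline[i] != c_outlined[i])
      return 0;
  return 1;
}
```

Both forms compute the same values; the difference is only in what an
intraprocedural alias analysis can prove about the outlined body, which is
the situation the thread describes for vectorization and for pass_oacc_par.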