On Thu, 18 Jun 2015, Tom de Vries wrote:

> On 18/06/15 12:48, Richard Biener wrote:
> > On Thu, 18 Jun 2015, Tom de Vries wrote:
> >
> > > Hi,
> > >
> > > I ran into a problem with fortran loops in oacc kernels regions not being
> > > parallelized, after introducing transform_to_exit_first_loop_alt.
> > >
> > > For gfortran.dg/goacc/kernels-loop.f95, we get:
> > > ...
> > > #pragma omp target oacc_parallel num_gangs(1)
> > > ...
> > > instead of the desired num_gangs (32).
> > >
> > > transform_to_exit_first_loop_alt fails because nit is _135, where nit is
> > > defined by:
> > > ...
> > > *_105 = 0;
> > > D__lsm.27_50 = *_105;
> > > _32 = (unsigned int) D__lsm.27_50;
> > > _135 = 1023 - _32;
> > > ...
> > >
> > > pass_fre would manage to propagate the '*_105 = 0' assignment. But in the
> > > current pass order, pass_fre is run before pass_lim, where this pattern is
> > > introduced:
> > > ...
> > > NEXT_PASS (pass_ch_oacc_kernels);
> > > NEXT_PASS (pass_fre);
> > > NEXT_PASS (pass_tree_loop_init);
> > > NEXT_PASS (pass_lim);
> > > NEXT_PASS (pass_copy_prop);
> > > NEXT_PASS (pass_scev_cprop);
> > > NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> > > NEXT_PASS (pass_expand_omp_ssa);
> > > NEXT_PASS (pass_tree_loop_done);
> > > ...
> > >
> > > The patch moves pass_fre to the location of pass_copy_prop, and replaces
> > > it.
> > > Furthermore, it adds scans to the fortran test-cases to make sure they get
> > > properly parallelized.
> >
> > You may now figure out that LIM needs FRE to detect equal memory
> > references to apply store-motion.  But maybe the issues oacc
> > lowering introduces are limited and under your control.
> >
>
> To show the context of the pass group, after this commit the pass group looks
> like this:
> ...
> NEXT_PASS (pass_sra_early);
> NEXT_PASS (pass_build_ealias);
> NEXT_PASS (pass_fre);
> NEXT_PASS (pass_oacc_kernels);
> PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>   NEXT_PASS (pass_ch_oacc_kernels);
>   NEXT_PASS (pass_tree_loop_init);
>   NEXT_PASS (pass_lim);
>   NEXT_PASS (pass_tree_loop_done);
>   NEXT_PASS (pass_fre);
>   NEXT_PASS (pass_tree_loop_init);
>   NEXT_PASS (pass_scev_cprop);
>   NEXT_PASS (pass_parallelize_loops_oacc_kernels);
>   NEXT_PASS (pass_expand_omp_ssa);
>   NEXT_PASS (pass_tree_loop_done);
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_merge_phi);
> NEXT_PASS (pass_dse);
> NEXT_PASS (pass_cd_dce);
> ...
> In other words, the pass group is run directly after pass_fre.
>
> When I move pass_fre before the pass group to directly after the pass group, I
> start seeing the failure mode you describe.

Yes, it really depends on what kind of changes pass_oacc_kernels makes
(though pass_ch_oacc_kernels, which I assume is loop-header copying, may
also make relevant changes that enable LIM/store-motion after FRE cleanup
if there is a loop nest involved).

Richard.

> Thanks,
> - Tom
>
>

-- 
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu,
Graham Norton, HRB 21284 (AG Nuernberg)
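
For readers following the nit discussion quoted above, here is a rough source-level analogue of the four GIMPLE statements that define _135. It is only a sketch: the function and variable names are invented for illustration and do not come from the patch or the testcase. The first function keeps the store followed by the reload, which is the shape the thread says appears after pass_lim. The second shows the result once the stored 0 is forwarded to the load, which is what the thread expects pass_fre to achieve, leaving the iteration count as the constant 1023.

```c
/* Hypothetical analogue of the quoted GIMPLE; all names are invented
   for illustration only.  */

/* Store through p, reload, then compute the iteration count.  */
unsigned int
nit_with_reload (int *p)
{
  *p = 0;                               /* *_105 = 0;                      */
  int lsm = *p;                         /* D__lsm.27_50 = *_105;           */
  unsigned int u = (unsigned int) lsm;  /* _32 = (unsigned int) D__lsm...; */
  return 1023u - u;                     /* _135 = 1023 - _32;              */
}

/* Same computation after forwarding the stored value to the load:
   the reload disappears and the count folds to a constant.  */
unsigned int
nit_after_forwarding (int *p)
{
  *p = 0;
  return 1023u;
}
```

Both functions return the same value for any valid pointer. The difference is that only the second form exposes the count as a compile-time constant, which matters here because transform_to_exit_first_loop_alt is reported to fail on the non-constant form.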
