Hi,

rather embarrassingly, I found out that a condition is missing: nothing makes sure that the HSA grid size is zero when the OpenMP loop bounds should preclude the loop from executing at all. I do not know whether I lost it somewhere when preparing patches for trunk or whether I forgot about it from the beginning. In any case, the patch below adds it where it should be.
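To illustrate what goes wrong, here is a minimal stand-alone sketch in plain C of the grid size computation. It is not the GCC code: grid_size_sketch and its parameters are names made up for this example, and it uses plain long arithmetic instead of building trees. It only assumes the usual rounded-up trip count formula and shows the guard that has to wrap it.

```c
#include <assert.h>

/* Hypothetical helper for illustration only, not part of GCC.
   Returns the iteration count of
     for (i = n1; i < n2; i += step)   when ascending is nonzero (step > 0)
     for (i = n1; i > n2; i += step)   when ascending is zero    (step < 0)  */
static long
grid_size_sketch (long n1, long n2, long step, int ascending)
{
  assert (step != 0);

  /* The loop condition itself; this is the check that was missing.  */
  int runs = ascending ? (n1 < n2) : (n1 > n2);

  /* Rounded-up trip count.  Only meaningful when the loop runs at all;
     for example n1 = 10, n2 = 0, step = 1 yields -10 here.  */
  long t = (step + (ascending ? -1 : 1) + n2 - n1) / step;

  /* Counterpart of the COND_EXPR: force zero when the loop must not run.  */
  return runs ? t : 0;
}
```

With these definitions, grid_size_sketch (0, 10, 3, 1) is 4 and grid_size_sketch (10, 0, -3, 0) is 4, while grid_size_sketch (10, 0, 1, 1) is 0 instead of the negative value the unguarded formula produces.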

This popped up during my libgomp testsuite runs as a consequence of Jakub's revision 253395, after which HSAIL was apparently generated for a few more kernels and libgomp.c/for-5.c started to fail (taking the machine's whole GPGPU subsystem with it). So there is already a testcase for this.

My long-term plan for gridification is to replace it with the approach that our NVIDIA offloading uses once we have simpler (and better supported) function pointers in HSA and/or, better yet, a full-blown GCN BE. Gridification does not currently work well, but I still try to avoid any regressions (this one took a long time to track down because the bug started happening just as I changed some unrelated things on the APU machine, and I suspected those first).

Bootstrapped with HSA enabled on x86_64-linux and tested on an HSA-capable APU. OK for trunk?

Thanks,

Martin


2017-10-10  Martin Jambor  <mjam...@suse.cz>

	* omp-grid.c (grid_attempt_target_gridification): Also insert a
	condition whether loop should be executed at all.
---
 gcc/omp-grid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/omp-grid.c b/gcc/omp-grid.c
index a7b6f60aeaf..121c96ebe39 100644
--- a/gcc/omp-grid.c
+++ b/gcc/omp-grid.c
@@ -1315,6 +1315,7 @@ grid_attempt_target_gridification (gomp_target *target,
 	  n1 = fold_convert (itype, n1);
 	  n2 = fold_convert (itype, n2);
 
+	  tree cond = fold_build2 (cond_code, boolean_type_node, n1, n2);
 	  tree step
 	    = omp_get_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i));
 
@@ -1328,6 +1329,7 @@ grid_attempt_target_gridification (gomp_target *target,
 			     fold_build1 (NEGATE_EXPR, itype, step));
 	  else
 	    t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+	  t = fold_build3 (COND_EXPR, itype, cond, t, build_zero_cst (itype));
 	  if (grid.tiling)
 	    {
 	      if (cond_code == GT_EXPR)
-- 
2.14.2