[GRAPHITE] Replacement of isl_int by isl_val
straint *c; mpz_t g; - isl_int v; + isl_val *v; c = isl_inequality_alloc (isl_local_space_from_space (space)); mpz_init (g); - isl_int_init (v); tree_int_to_gmp (ub, g); - isl_int_set_gmp (v, g); + v = isl_val_int_from_gmp (the_isl_ctx, g); mpz_clear (g); - c = isl_constraint_set_constant (c, v); - isl_int_clear (v); + c = isl_constraint_set_constant_val (c, v); c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, -1); scop->context = isl_set_add_constraint (scop->context, c); Index: gcc/graphite-clast-to-gimple.c === --- gcc/graphite-clast-to-gimple.c (revision 212311) +++ gcc/graphite-clast-to-gimple.c (working copy) @@ -28,6 +28,18 @@ #include #include #include +#include +/* For C++ linkage of C functions. + Missing from isl/val_gmp.h in isl 0.12 versions. + Appearing in isl/val_gmp.h in isl 0.13. + To be removed when passing to isl 0.13. */ +#if defined(__cplusplus) +extern "C" { +#endif +#include +#if defined(__cplusplus) +} +#endif #include #include #endif @@ -871,18 +883,18 @@ static void compute_bounds_for_param (scop_p scop, int param, mpz_t low, mpz_t up) { - isl_int v; + isl_val *v; isl_aff *aff = isl_aff_zero_on_domain (isl_local_space_from_space (isl_set_get_space (scop->context))); aff = isl_aff_add_coefficient_si (aff, isl_dim_param, param, 1); - isl_int_init (v); - isl_set_min (scop->context, aff, &v); - isl_int_get_gmp (v, low); - isl_set_max (scop->context, aff, &v); - isl_int_get_gmp (v, up); - isl_int_clear (v); + v = isl_set_min_val (scop->context, aff); + isl_val_get_num_gmp (v, low); + isl_val_free (v); + v = isl_set_max_val (scop->context, aff); + isl_val_get_num_gmp (v, up); + isl_val_free (v); isl_aff_free (aff); } @@ -901,8 +913,7 @@ isl_set *domain; isl_aff *dimension; isl_local_space *local_space; - isl_int isl_value; - enum isl_lp_result lp_result; + isl_val *isl_value; domain = isl_set_copy (isl_set_from_cloog_domain (loop->domain)); local_space = isl_local_space_from_space (isl_set_get_space (domain)); @@ -911,17 +922,12 @@ isl_set_dim (domain, isl_dim_set) - 1, 1); - isl_int_init (isl_value); - - lp_result = isl_set_min (domain, dimension, &isl_value); - assert (lp_result == isl_lp_ok); - isl_int_get_gmp (isl_value, low); - - lp_result = isl_set_max (domain, dimension, &isl_value); - assert (lp_result == isl_lp_ok); - isl_int_get_gmp (isl_value, up); - - isl_int_clear (isl_value); + isl_value = isl_set_min_val (domain, dimension); + isl_val_get_num_gmp (isl_value, low); + isl_val_free (isl_value); + isl_value = isl_set_max_val (domain, dimension); + isl_val_get_num_gmp (isl_value, up); + isl_val_free (isl_value); isl_set_free (domain); isl_aff_free (dimension); } 2014-07-06 Mircea Namolaru Replacement of isl-int by isl_val * graphite-clast-to-gimple.c: include isl/val.h, isl/val_gmp.h (compute_bounds_for_param): use isl_val instead of isl_int (compute_bounds_for_loop): likewise * graphite-interchange.c: include isl/val.h, isl/val_gmp.h (build_linearized_memory_access): use isl_val instead of isl_int (pdr_stride_in_loop): likewise * graphite-optimize-isl.c: (getPrevectorMap): use isl_val instead of isl_int * graphite-poly.c: (pbb_number_of_iterations_at_time): use ils_val instead of isl_int graphite-sese-to-poly.c: include isl/val.h, isl/val_gmp.h (extern the_isl_ctx): declare (build_pbb_scattering_polyhedrons): use isl_val instead of isl_int (extract_affine_gmp): likewise (wrap): likewise (build_loop_iteration_domains): likewise (add_param_constraints): likewise (add_param_constraints): likewise
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
A small modification of the patch + extended comment for the mapping used for computing the separating class. The code didn't contained an adjustment of the stride with 1 (but the comment yes), that was temporary removed and forget to insert it back. The code generated now is: ISL AST generated by ISL: { for (int c0 = 0; c0 < HEIGHT - 3; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) for (int c2 = c0; c2 <= c0 + 3; c2 += 1) S_4(c2, c1); if ((HEIGHT - 1) % 4 <= 2) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) for (int c2 = -((HEIGHT - 1) % 4) + HEIGHT - 1; c2 < HEIGHT; c2 += 1) S_4(c2, c1); } The previous code was correct too (after all the code set an option for AST but the transformation is preserved), but slightly less optimal in the case when N was a multiple of the stride. Mircea - Original Message - > From: "Sven Verdoolaege" > To: "Mircea Namolaru" > Cc: gcc-patches@gcc.gnu.org, "Tobias Grosser" , "Richard > Biener" , > "Albert Cohen" > Sent: Wednesday, November 12, 2014 12:24:40 AM > Subject: Re: [GRAPHITE, PATCH] Loop unroll and jam optimization > > On Tue, Nov 11, 2014 at 04:05:57PM +0100, Mircea Namolaru wrote: > > > I'm not sure if Tobi or Albert have told you, but the separation_class > > > option > > > is going to be phased out since its design is fundamentally flawed. > > > If you can't wait until isl-0.15, then I guess you have no choice but > > > to use this option, but you should realize that it will remain frozen > > > in its current broken state (until it is removed at some point). > > > > > > > No, didn't know about the phase out of separation_class option. Anyway, for > > the time > > being is the best solution available. My understanding is that this option > > should > > always generate correct code, of course as long as the scheduling is > > correct, but > > think that had some cases when setting the separating_class leads to > > incorrect code. > > > > For isl_0.15, do you intend to provide some option with a similar > > functionality ? > > Yes. > > > > > + /* Extract the original and auxiliar maps from pbb->transformed. > > > > + Set pbb->transformed to the original map. */ > > > > + psmap = &smap; > > > > + psmap->n = 0; > > > > + res = isl_map_foreach_basic_map (pbb->transformed, separate_map, > > > > (void *)psmap); > > > > + gcc_assert (res == 0); > > > > + > > > > + isl_map_free(pbb->transformed); > > > > + pbb->transformed = isl_map_copy(psmap->map_arr[0]); > > > > + > > > > > > I have no idea what this pbb->transformed is supposed to represent, > > > but you appear to be assuming that it has exactly two disjuncts and that > > > they appear in some order. Now, perhaps you have explicitly > > > checked that this map has two disjuncts, but then you should > > > probably bring the check closer since any operation on sets that > > > you perform could change the internal representation (even of > > > other sets). However, in no way can you assume that > > > isl_map_foreach_basic_map will iterate over these disjuncts > > > in any specific order. > > > > > > > At this point pbb->transformed has two basic maps, one is the mapping for > > unroll and jam, > > and one for the full tile for the striped dimension. Introduce a check that > > differentiate > > between them as the image of one maps should be included in the other. > > I didn't do a full review (and I won't have time for it either), > but at a quick glance, > you still seem to be assuming that if you take the union of two > basic maps, that you can extract the original two basic maps from the union. > In general, you can't. At least you shouldn't assume that you can. > Besides, your updated code is also pretty convoluted, > with very little documentation. > > skimo > 2014-11-12 Mircea Namolaru * common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-poly.h (struct poly_bb:map_sepclass): New field * graphite-poly.c (new_poly_bb): New field intialization. * graphite-isl-ast-to-gimple.c (generate_luj_sepclass_opt, generate_luj_sepclass,generate_luj_options): New functions to s
Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization
isl_keep isl_union_map *schedule, + __isl_take isl_union_map *opt_luj) { isl_ctx *ctx = isl_union_map_get_ctx (schedule); isl_space *range_space = isl_space_set_alloc (ctx, 0, 1); @@ -894,6 +982,9 @@ isl_union_set *domain = isl_union_map_range (isl_union_map_copy (schedule)); domain = isl_union_set_universe (domain); isl_union_map *options = isl_union_map_from_domain_and_range (domain, range); + + options = isl_union_map_union (options, opt_luj); + return isl_ast_build_set_options (control, options); } @@ -907,9 +998,14 @@ isl_options_set_ast_build_atomic_upper_bound (scop->ctx, true); add_parameters_to_ivs_params (scop, ip); + + isl_union_map *options_luj = generate_luj_options (scop); + isl_union_map *schedule_isl = generate_isl_schedule (scop); isl_ast_build *context_isl = generate_isl_context (scop); - context_isl = set_options (context_isl, schedule_isl); + + context_isl = set_options (context_isl, schedule_isl, options_luj); + isl_union_map *dependences = NULL; if (flag_loop_parallelize_all) { Index: gcc/toplev.c === --- gcc/toplev.c (revision 217013) +++ gcc/toplev.c (working copy) @@ -1302,11 +1302,12 @@ || flag_loop_block || flag_loop_interchange || flag_loop_strip_mine - || flag_loop_parallelize_all) + || flag_loop_parallelize_all + || flag_loop_unroll_jam) sorry ("Graphite loop optimizations cannot be used (ISL is not available)" "(-fgraphite, -fgraphite-identity, -floop-block, " "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, " - "and -ftree-loop-linear)"); + "-floop-unroll-and-jam, and -ftree-loop-linear)"); #endif /* One region RA really helps to decrease the code size. */ Index: gcc/params.def === --- gcc/params.def (revision 217013) +++ gcc/params.def (working copy) @@ -847,6 +847,21 @@ "size of tiles for loop blocking", 51, 0, 0) +/* Size of unrolling factor for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_SIZE, + "loop-unroll-jam-size", + "size of unrolling factor for unroll-and-jam", + 4, 0, 0) + +/* Size of the band formed by the strip mined dimension and the most inner one for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_DEPTH, + "loop-unroll-jam-depth", + "depth of unrolled loop for unroll-and-jam", + 2, 0, 0) + + /* Maximal number of parameters that we allow in a SCoP. */ DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS, Index: gcc/graphite-poly.c === --- gcc/graphite-poly.c (revision 217013) +++ gcc/graphite-poly.c (working copy) @@ -276,7 +276,7 @@ /* This pass needs to be run at the final stage, as it does not update the lst. */ - if (flag_loop_optimize_isl) + if (flag_loop_optimize_isl || flag_loop_unroll_jam) transform_done |= optimize_isl (scop); return transform_done; @@ -327,6 +327,7 @@ pbb->schedule = NULL; pbb->transformed = NULL; pbb->saved = NULL; + pbb->map_sepclass = NULL; PBB_SCOP (pbb) = scop; pbb_set_black_box (pbb, black_box); PBB_TRANSFORMED (pbb) = NULL; 2014-11-12 Mircea Namolaru * common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-poly.h (struct poly_bb:map_sepclass): New field * graphite-poly.c (new_poly_bb): New field intialization. * graphite-isl-ast-to-gimple.c (generate_luj_sepclass_opt, generate_luj_sepclass,generate_luj_options): New functions to set AST options for unroll and jam. * graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast): Added support for unroll and jam options. * graphite-optimize-isl.c (getPrevectorMap_full): New function for sepating class map. * graphite-optimize-isl.c (optimize_isl,apply_schedule_map_to_scop, getScheduleMap,getScheduleForBand,getScheduleForBandList): Added support for the separating class map. * graphite.c: Support for unroll and jam flag. * graphite-poly.c: Likewise. * toplev.c: Likewise.
Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization
> New optimization flags and new params need documentation in > gcc/doc/invoke.texi. > Thanks. Added description in invoke.texi. The patch is in trunk. > The description of the --params suggest they provide fixed values - is > there no way to autodetect sensible values with a cost-model? I > hardly doubt that you can find two fixed values that apply for a whole > program... There are a lot of models/heuristics, but in the general case tile sizes remain kind of an open problem. For the time being, this option and its parameters are intended mostly to compiler developers interested to bring the loop to a specific form enabling further optimizations. Mircea.
[GRAPHITE, PATCH] Loop unroll and jam optimization
Hello, This is the patch for unroll and jam optimizations. It is based on the code written by Tobias from graphite-optimize-isl.c (the code was unreachable till now) extended and enabled it via a new option -floop-unroll-jam. The patch takes advantage of the new isl based code generator introduced recently in GCC (in fact of the possible options for building the AST). The code generated for this optimization in the case of non-constant loop bounds initially looks as below. This is not very useful because the standard GCC unrolling don't succeed to unroll the most inner loop. ISL AST generated by ISL: for (int c0 = 0; c0 < HEIGHT; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) for (int c2 = c0; c2 <= min(HEIGHT - 1, c0 + 3); c2 += 1) S_4(c2, c1); Now, the "separating class" option (set for unroll and jam) produces this nice loop structure: ISL AST generated by ISL: for (int c0 = 0; c0 < HEIGHT; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) if (HEIGHT >= c0 + 4) { for (int c2 = c0; c2 <= c0 + 3; c2 += 1) S_4(c2, c1); } else for (int c2 = c0; c2 < HEIGHT; c2 += 1) S_4(c2, c1); The "unroll" option (set for unroll and jam) produces: ISL AST generated by ISL: for (int c0 = 0; c0 < HEIGHT; c0 += 4) for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) if (HEIGHT >= c0 + 4) { S_4(c0, c1); S_4(c0 + 1, c1); S_4(c0 + 2, c1); S_4(c0 + 3, c1); } else { S_4(c0, c1); if (HEIGHT >= c0 + 2) { S_4(c0 + 1, c1); if (4 * floord(HEIGHT - 3, 4) + 3 == HEIGHT && c0 + 3 == HEIGHT) S_4(HEIGHT - 1, c1); } } The "separate" option (set by default for all dimensions for the new isl based code generator) don't succeed to remove the ifs from the loops and generate two loop structures (this would have been highly desirable). As the stage 1 is going to close soon, quick feedback to this patch is greatly appreciated. Many thanks, Mircea Namolaru 2014-11-7 Mircea Namolaru * common.opt (flag_loop_unroll_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-isl-ast-to-gimple.c (struct_sep_map): New structure. * graphite-isl-ast-to-gimple.c (separate_map,generate_isl_options_2, generate_isl_options_1,generate_isl_domain_n,generate_isl_options): New functions to set the AST options for unroll and jam. * graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast): Added support for unroll and jam options. * graphite-optimize-isl.c (getPrevectorMap_f): New function for unroll and jam. * graphite-optimize-isl.c (getScheduleForBand,getScheduleForBandList): Added support for unroll and jam. * graphite.c: Support for unroll and jam flag. * graphite-poly.c: Same. * toplev.c: Same. Index: gcc/graphite.c === --- gcc/graphite.c (revision 217013) +++ gcc/graphite.c (working copy) @@ -383,7 +383,8 @@ || flag_loop_strip_mine || flag_graphite_identity || flag_loop_parallelize_all - || flag_loop_optimize_isl) + || flag_loop_optimize_isl + || flag_loop_unroll_jam) flag_graphite = 1; return flag_graphite != 0; Index: gcc/common.opt === --- gcc/common.opt (revision 217013) +++ gcc/common.opt (working copy) @@ -1328,6 +1328,10 @@ Common Report Var(flag_loop_block) Optimization Enable Loop Blocking transformation +floop-unroll-jam +Common Report Var(flag_loop_unroll_jam) Optimization +Enable Loop Unroll Jam transformation + fgnu-tm Common Report Var(flag_tm) Enable support for GNU transactional memory Index: gcc/toplev.c === --- gcc/toplev.c (revision 217013) +++ gcc/toplev.c (working copy) @@ -1302,11 +1302,12 @@ || flag_loop_block || flag_loop_interchange || flag_loop_strip_mine - || flag_loop_parallelize_all) + || flag_loop_parallelize_all + || flag_loop_unroll_jam) sorry ("Graphite loop optimizations cannot be used (ISL is not available)" "(-fgraphite, -fgraphite-identity, -floop-block, " "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, " - "and -ftree-loop-linear)"); + "-floop-unroll-jam, and -ftree-loop-linear)"); #endif /* One region RA really helps to decrease the code size. */ Index: gcc/params.def === --- gcc/params.def (revision 217013) +++ gcc/params.def (working copy) @@ -847,6 +847,21 @@ "size of tiles for loop blocking", 51, 0
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
isl_union_set_union (domain_isl, + isl_union_set_from_set (bb_domain_r)); + isl_set_free (bb_domain_s); + } + else + gcc_assert(0); + + isl_map_free(psmap->map_arr[0]); + isl_map_free(psmap->map_arr[1]); +} + + return domain_isl; +} + +/* Set the AST built options for loop unroll and jam. */ + +static __isl_give isl_union_map * +generate_luj_options (scop_p scop) +{ + isl_union_set *domain_isl; + isl_union_map *options_isl_ss; + isl_union_map *options_isl = +isl_union_map_empty (isl_set_get_space (scop->context)); + int dim = get_max_schedule_dimensions (scop) - 1; + int dim1 = dim - PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH); + + if (!flag_loop_unroll_jam) +return options_isl; + + domain_isl = generate_luj_sepclass (scop); + + options_isl_ss = generate_luj_sepclass_opt (scop, domain_isl, dim1, 0); + options_isl = isl_union_map_union (options_isl, options_isl_ss); + + return options_isl; +} + /* Generates a schedule, which specifies an order used to visit elements in a domain. */ @@ -879,11 +1025,13 @@ } /* Set the separate option for all dimensions. - This helps to reduce control overhead. */ + This helps to reduce control overhead. + Set the options for unroll and jam. */ static __isl_give isl_ast_build * set_options (__isl_take isl_ast_build *control, - __isl_keep isl_union_map *schedule) + __isl_keep isl_union_map *schedule, + __isl_take isl_union_map *opt_luj) { isl_ctx *ctx = isl_union_map_get_ctx (schedule); isl_space *range_space = isl_space_set_alloc (ctx, 0, 1); @@ -894,6 +1042,9 @@ isl_union_set *domain = isl_union_map_range (isl_union_map_copy (schedule)); domain = isl_union_set_universe (domain); isl_union_map *options = isl_union_map_from_domain_and_range (domain, range); + + options = isl_union_map_union (options, opt_luj); + return isl_ast_build_set_options (control, options); } @@ -907,9 +1058,14 @@ isl_options_set_ast_build_atomic_upper_bound (scop->ctx, true); add_parameters_to_ivs_params (scop, ip); + + isl_union_map *options_luj = generate_luj_options (scop); + isl_union_map *schedule_isl = generate_isl_schedule (scop); isl_ast_build *context_isl = generate_isl_context (scop); - context_isl = set_options (context_isl, schedule_isl); + + context_isl = set_options (context_isl, schedule_isl, options_luj); + isl_union_map *dependences = NULL; if (flag_loop_parallelize_all) { Index: gcc/graphite-poly.c === --- gcc/graphite-poly.c (revision 217013) +++ gcc/graphite-poly.c (working copy) @@ -276,7 +276,7 @@ /* This pass needs to be run at the final stage, as it does not update the lst. */ - if (flag_loop_optimize_isl) + if (flag_loop_optimize_isl || flag_loop_unroll_jam) transform_done |= optimize_isl (scop); return transform_done; Index: gcc/params.def === --- gcc/params.def (revision 217013) +++ gcc/params.def (working copy) @@ -847,6 +847,21 @@ "size of tiles for loop blocking", 51, 0, 0) +/* Size of unrolling factor for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_SIZE, + "loop-unroll-jam-size", + "size of unrolling factor for unroll-and-jam", + 4, 0, 0) + +/* Size of the band formed by the strip mined dimension and the most inner one for unroll-and-jam. */ + +DEFPARAM (PARAM_LOOP_UNROLL_JAM_DEPTH, + "loop-unroll-jam-depth", + "depth of unrolled loop for unroll-and-jam", + 2, 0, 0) + + /* Maximal number of parameters that we allow in a SCoP. */ DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS, 2014-11-11 Mircea Namolaru * common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-isl-ast-to-gimple.c (struct_sep_map): New structure. * graphite-isl-ast-to-gimple.c (separate_map,generate_luj_sepclass_opt, generate_luj_sepclass,generate_luj_options): New functions to set AST options for unroll and jam. * graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast): Added support for unroll and jam options. * graphite-optimize-isl.c (getPrevectorMap_full): New function for unroll and jam. * graphite-optimize-isl.c (getScheduleForBand,getScheduleForBandList): Added support for unroll and jam. * graphite.c: Support for unroll and jam flag. * graphite-poly.c: Same. * toplev.c: Same.
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
Changed the option to -floop-unroll-and jam as you suggested. > > The patch takes advantage of the new isl based code generator introduced > > recently > > in GCC (in fact of the possible options for building the AST). > > > > The code generated for this optimization in the case of non-constant loop > > bounds > > initially looks as below. This is not very useful because the standard GCC > > unrolling don't succeed to unroll the most inner loop. > > > > ISL AST generated by ISL: > > for (int c0 = 0; c0 < HEIGHT; c0 += 4) > > for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) > > for (int c2 = c0; c2 <= min(HEIGHT - 1, c0 + 3); c2 += 1) > > Hmm, so this iterates at most 4 times, right? Eventually the body is > considered > too large by GCC or it fails to compute an upper bound for the number > of iterations. > Is that (an upper bound for the number of iterations) available readily from > ISL > at code-generation time? If so you can transfer this knowledge to the GCC > loop > information. > The problem was not explained well. It is not only the unrolling, it is also the loop separation (which the latest version of the patch does). Even if the gcc unrolling succeeds to unroll the inner loop you will get a code similar with the one obtained by the previous version of this patch, which is not what is wanted. Last time when checked, GCC unrolling was not able to unroll the inner loop. In my opinion it is the min and max that prevent it (graphite for blocking, strip-mine, unroll and jam emits such code). The bounds of the iteration domain are expressed in min, max terms. > I'm curious to see a testcase (and a way to generate the above form) to see > what > is actually the problem. > Of course. Take the code from the unroll-and-jam patch and the attached test case (but as said other graphite options will generate similar code). But somehow it seems that the new isl based code generator could handle more easily such transformations. Mircea > Thanks, > Richard. > > > S_4(c2, c1); > > > > Now, the "separating class" option (set for unroll and jam) produces this > > nice loop > > structure: > > ISL AST generated by ISL: > > for (int c0 = 0; c0 < HEIGHT; c0 += 4) > > for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) > > if (HEIGHT >= c0 + 4) { > > for (int c2 = c0; c2 <= c0 + 3; c2 += 1) > > S_4(c2, c1); > > } else > > for (int c2 = c0; c2 < HEIGHT; c2 += 1) > > S_4(c2, c1); > > > > The "unroll" option (set for unroll and jam) produces: > > ISL AST generated by ISL: > > for (int c0 = 0; c0 < HEIGHT; c0 += 4) > > for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) > > if (HEIGHT >= c0 + 4) { > > S_4(c0, c1); > > S_4(c0 + 1, c1); > > S_4(c0 + 2, c1); > > S_4(c0 + 3, c1); > > } else { > > S_4(c0, c1); > > if (HEIGHT >= c0 + 2) { > > S_4(c0 + 1, c1); > > if (4 * floord(HEIGHT - 3, 4) + 3 == HEIGHT && c0 + 3 == HEIGHT) > > S_4(HEIGHT - 1, c1); > > } > > } > > > > The "separate" option (set by default for all dimensions for the new isl > > based code generator) > > don't succeed to remove the ifs from the loops and generate two loop > > structures (this would > > have been highly desirable). > > > > As the stage 1 is going to close soon, quick feedback to this patch is > > greatly appreciated. > > Many thanks, Mircea Namolaru > int f1(int v[1024][1024], int HEIGHT, int LENGTH) { int i, j; for (i=0; i
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
> > > At this point pbb->transformed has two basic maps, one is the mapping for > > unroll and jam, > > and one for the full tile for the striped dimension. Introduce a check that > > differentiate > > between them as the image of one maps should be included in the other. > > I didn't do a full review (and I won't have time for it either), > but at a quick glance, > you still seem to be assuming that if you take the union of two > basic maps, that you can extract the original two basic maps from the union. > In general, you can't. At least you shouldn't assume that you can. > Besides, your updated code is also pretty convoluted, > with very little documentation. > Many thanks. To considerably ease the computation of separation class, needed to transmit information available at scheduling computation time to AST computation time. I didn't want to introduce new fields in graphite global structures, and tried to trick pbb->transformed to contain the unroll and jam mapping as well as a mapping use to compute the separation class. It wasn't good idea (for the mappings used by unroll and jam my version of isl preserve the basic mappings afteer union), but of course my code should not be based on assumptions not guaranteed by isl semantics. So, in this updated patch I go the other way and keep the additional information required in the pbb structure. This has the advantages that removes the most convoluted part of my code dealing with the two basic maps kept in pbb->transformed. As before in graphite-optimize_isl,c, beside the map for unroll and jam, another map needed for the separation class defined by full tiles is computed. In graphite-isl-ast-to-gimple.c, the separation class option is set (using the auxiliary map computed previously) and is added to the other AST build options. Mircea 2014-11-12 Mircea Namolaru * common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam. * params.def (PARAM_LOOP_UNROLL_JAM_SIZE) (PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag. * graphite-poly.h (struct poly_bb:map_sepclass): New field * graphite-poly.c (new_poly_bb): New field intialization. * graphite-isl-ast-to-gimple.c (generate_luj_sepclass_opt, generate_luj_sepclass,generate_luj_options): New functions to set AST options for unroll and jam. * graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast): Added support for unroll and jam options. * graphite-optimize-isl.c (getPrevectorMap_full): New function for sepating class map. * graphite-optimize-isl.c (optimize_isl,apply_schedule_map_to_scop, getScheduleMap,getScheduleForBand,getScheduleForBandList): Added support for the separating class map. * graphite.c: Support for unroll and jam flag. * graphite-poly.c: Likewise. * toplev.c: Likewise. Index: gcc/toplev.c === --- gcc/toplev.c (revision 217013) +++ gcc/toplev.c (working copy) @@ -1302,11 +1302,12 @@ || flag_loop_block || flag_loop_interchange || flag_loop_strip_mine - || flag_loop_parallelize_all) + || flag_loop_parallelize_all + || flag_loop_unroll_jam) sorry ("Graphite loop optimizations cannot be used (ISL is not available)" "(-fgraphite, -fgraphite-identity, -floop-block, " "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, " - "and -ftree-loop-linear)"); + "-floop-unroll-and-jam, and -ftree-loop-linear)"); #endif /* One region RA really helps to decrease the code size. */ Index: gcc/graphite-optimize-isl.c === --- gcc/graphite-optimize-isl.c (revision 217013) +++ gcc/graphite-optimize-isl.c (working copy) @@ -186,7 +186,7 @@ PartialSchedule = isl_band_get_partial_schedule (Band); *Dimensions = isl_band_n_member (Band); - if (DisableTiling) + if (DisableTiling || flag_loop_unroll_jam) return PartialSchedule; /* It does not make any sense to tile a band with just one dimension. */ @@ -241,7 +241,9 @@ constant number of iterations, if the number of loop iterations at DimToVectorize can be devided by VectorWidth. The default VectorWidth is currently constant and not yet target specific. This function does not reason - about parallelism. */ + about parallelism. + + */ static isl_map * getPrevectorMap (isl_ctx *ctx, int DimToVectorize, int ScheduleDimensions, @@ -305,8 +307,89 @@ isl_constraint_set_constant_si (c, VectorWidth - 1); TilingMap = isl_map_add_constraint (TilingMap, c); - isl_map_dump (TilingMap); + return TilingMap; +} +/* Compute an auxiliary map to getPrevectorMap, for computing the separat
{GRAPHITE] Replacement of isl_int by isl_val
Patch for replacement of the isl_int (obsolete) by isl_val. No regressions for c/c++/fortran on x86-64 Linux.2014-02-09 Mircea Namolaru Replacement of isl-int by isl_val * graphite-clast-to-gimple.c: include isl/val.h, isl/val_gmp.h (compute_bounds_for_param): use isl_val instead of isl_int (compute_bounds_for_loop): likewise * graphite-interchange.c: include isl/val.h, isl/val_gmp.h (build_linearized_memory_access) (pdr_stride_in_loop) * graphite-optimize-isl.c: (getPrevectorMap): use isl_val instead of isl_int * graphite-poly.c: (pbb_number_of_iterations_at_time) graphite-sese-to-poly.c: include isl/val.h, isl/val_gmp.h (extern the_isl_ctx): declare (build_pbb_scattering_polyhedrons): use isl_val instead of isl_int (extract_affine_gmp): likewise (wrap): likewise (build_loop_iteration_domains): likewise (add_param_constraints): likewise (add_param_constraints): likewise Index: gcc/graphite-optimize-isl.c === --- gcc/graphite-optimize-isl.c (revision 207298) +++ gcc/graphite-optimize-isl.c (working copy) @@ -260,6 +260,7 @@ DimToVectorize can be devided by VectorWidth. The default VectorWidth is currently constant and not yet target specific. This function does not reason about parallelism. */ + static isl_map * getPrevectorMap (isl_ctx *ctx, int DimToVectorize, int ScheduleDimensions, @@ -273,8 +274,9 @@ isl_aff *Aff; int PointDimension; /* ip */ int TileDimension; /* it */ - isl_int VectorWidthMP; + isl_val *VectorWidthMP; int i; + isl_ctx *ct; /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/ @@ -304,10 +306,10 @@ Aff = isl_aff_zero_on_domain (LocalSpaceRange); Aff = isl_aff_set_constant_si (Aff, VectorWidth); Aff = isl_aff_set_coefficient_si (Aff, isl_dim_in, TileDimension, 1); - isl_int_init (VectorWidthMP); - isl_int_set_si (VectorWidthMP, VectorWidth); - Aff = isl_aff_mod (Aff, VectorWidthMP); - isl_int_clear (VectorWidthMP); + + ct = isl_aff_get_ctx (Aff); + VectorWidthMP = isl_val_int_from_si (ct, VectorWidth); + Aff = isl_aff_mod_val (Aff, VectorWidthMP); Modulo = isl_pw_aff_zero_set (isl_pw_aff_from_aff (Aff)); TilingMap = isl_map_intersect_range (TilingMap, Modulo); Index: gcc/graphite-sese-to-poly.c === --- gcc/graphite-sese-to-poly.c (revision 207298) +++ gcc/graphite-sese-to-poly.c (working copy) @@ -26,8 +26,15 @@ #include #include #include +#include +#if defined(__cplusplus) +extern "C" { +#endif +#include +#if defined(__cplusplus) +} +#endif #include -#include #include #endif @@ -67,7 +74,6 @@ #include "graphite-poly.h" #include "graphite-sese-to-poly.h" - /* Assigns to RES the value of the INTEGER_CST T. */ static inline void @@ -481,13 +487,11 @@ int i; int nb_iterators = pbb_dim_iter_domain (pbb); int used_scattering_dimensions = nb_iterators * 2 + 1; - isl_int val; + isl_val *val; isl_space *dc, *dm; gcc_assert (scattering_dimensions >= used_scattering_dimensions); - isl_int_init (val); - dc = isl_set_get_space (pbb->domain); dm = isl_space_add_dims (isl_space_from_domain (dc), isl_dim_out, scattering_dimensions); @@ -501,12 +505,10 @@ isl_constraint *c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->schedule))); - if (0 != isl_aff_get_coefficient (static_sched, isl_dim_in, - i / 2, &val)) - gcc_unreachable (); + val = isl_aff_get_coefficient_val (static_sched, isl_dim_in, i / 2); - isl_int_neg (val, val); - c = isl_constraint_set_constant (c, val); + val = isl_val_neg (val); + c = isl_constraint_set_constant_val (c, val); c = isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1); pbb->schedule = isl_map_add_constraint (pbb->schedule, c); } @@ -520,8 +522,6 @@ } } - isl_int_clear (val); - pbb->transformed = isl_map_copy (pbb->schedule); } @@ -700,12 +700,12 @@ isl_local_space *ls = isl_local_space_from_space (isl_space_copy (space)); isl_aff *aff = isl_aff_zero_on_domain (ls); isl_set *dom = isl_set_universe (space); - isl_int v; + isl_val *v; + isl_ctx *ct; - isl_int_init (v); - isl_int_set_gmp (v, g); - aff = isl_aff_add_constant (aff, v); - isl_int_clear (v); + ct = isl_aff_get_ctx (aff); + v = isl_val_int_from_gmp (ct, g); + aff = isl_aff_add_constant_val (aff, v); return isl_pw_aff_alloc (dom, aff); } @@ -728,19 +728,17 @@ /* Compute pwaff mod 2^width. */ +extern isl_ctx *the_isl_ctx; + static isl_pw_aff * wrap (isl_pw_aff *pwaff, unsigned width) { - isl_int mod; + isl_val *mod; - isl_int_init (mod); - isl_int_set_si (mod, 1); - isl_int
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Thanks for comments - updated the patch (fixed my e-mail address too :-)). 2014-02-26 Tobias Grosser Mircea Namolaru Fix for bug 58028 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions. Index: gcc/graphite-clast-to-gimple.c === --- gcc/graphite-clast-to-gimple.c (revision 207298) +++ gcc/graphite-clast-to-gimple.c (working copy) @@ -1522,6 +1522,13 @@ variables. */ options->save_domains = 1; + /* Do not remove scalar dimensions. CLooG by default removes scalar + dimensions very early from the input schedule. However, they are + necessary to correctly derive from the saved domains + (options->save_domains) the relationship between the generated loops + and the schedule dimensions they are generated from. */ + options->noscalars = 1; + /* Disable optimizations and make cloog generate source code closer to the input. This is useful for debugging, but later we want the optimized code. - Mail original - > De: "Rainer Orth" > À: "Tobias Burnus" > Cc: "Tobias Grosser" , "Mircea Namolaru" > , gcc-patches@gcc.gnu.org > Envoyé: Jeudi 27 Février 2014 11:14:30 > Objet: Re: [PATCH,GRAPHITE] Fix for P1 bug 58028 > > Tobias Burnus writes: > > > On 02/26/2014 10:09 PM, Mircea Namolaru wrote: > >> + /* Do not remove scalar dimensions. Cloog be default removes scalar > >> + dimensions very early from the input schedule. However, they are > > > > Small nit: Instead of 'be default' it should be 'by default' > > ... and two spaces after the full stops. While you're at it, please fix > the CLooG capitalization ;-) > > Rainer > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University >
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Hi, Thanks. Here is the updated patch. 2014-02-26 Tobias Grosser Mircea Namolaru PR tree-optimization/58028 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions. Index: gcc/graphite-clast-to-gimple.c === --- gcc/graphite-clast-to-gimple.c (revision 207298) +++ gcc/graphite-clast-to-gimple.c (working copy) @@ -1522,6 +1522,13 @@ variables. */ options->save_domains = 1; + /* Do not remove scalar dimensions. CLooG by default removes scalar + dimensions very early from the input schedule. However, they are + necessary to correctly derive from the saved domains + (options->save_domains) the relationship between the generated loops + and the schedule dimensions they are generated from. */ + options->noscalars = 1; + /* Disable optimizations and make cloog generate source code closer to the input. This is useful for debugging, but later we want the optimized code. Mircea
Replacement of isl_int by isl_val
l_val *mod; - isl_int_init (mod); - isl_int_set_si (mod, 1); - isl_int_mul_2exp (mod, mod, width); + mod = isl_val_int_from_ui(the_isl_ctx, width); + mod = isl_val_2exp (mod); + pwaff = isl_pw_aff_mod_val (pwaff, mod); - pwaff = isl_pw_aff_mod (pwaff, mod); - - isl_int_clear (mod); - return pwaff; } @@ -994,11 +998,10 @@ isl_space *space; isl_constraint *c; int pos = isl_set_dim (outer, isl_dim_set); - isl_int v; + isl_val *v; mpz_t g; mpz_init (g); - isl_int_init (v); inner = isl_set_add_dims (inner, isl_dim_set, 1); space = isl_set_get_space (inner); @@ -1016,8 +1019,8 @@ (isl_local_space_from_space (isl_space_copy (space))); c = isl_constraint_set_coefficient_si (c, isl_dim_set, pos, -1); tree_int_to_gmp (nb_iters, g); - isl_int_set_gmp (v, g); - c = isl_constraint_set_constant (c, v); + v = isl_val_int_from_gmp (the_isl_ctx, g); + c = isl_constraint_set_constant_val (c, v); inner = isl_set_add_constraint (inner, c); } @@ -1071,9 +1074,9 @@ c = isl_inequality_alloc (isl_local_space_from_space (isl_space_copy (space))); c = isl_constraint_set_coefficient_si (c, isl_dim_set, pos, -1); - isl_int_set_gmp (v, g); + v = isl_val_int_from_gmp (the_isl_ctx, g); mpz_clear (g); - c = isl_constraint_set_constant (c, v); + c = isl_constraint_set_constant_val (c, v); inner = isl_set_add_constraint (inner, c); } else @@ -1096,7 +1099,6 @@ isl_set_free (outer); isl_space_free (space); - isl_int_clear (v); mpz_clear (g); } @@ -1330,17 +1332,15 @@ isl_space *space = isl_set_get_space (scop->context); isl_constraint *c; mpz_t g; - isl_int v; + isl_val *v; c = isl_inequality_alloc (isl_local_space_from_space (space)); mpz_init (g); - isl_int_init (v); tree_int_to_gmp (lb, g); - isl_int_set_gmp (v, g); - isl_int_neg (v, v); + v = isl_val_int_from_gmp (the_isl_ctx, g); + v = isl_val_neg (v); mpz_clear (g); - c = isl_constraint_set_constant (c, v); - isl_int_clear (v); + c = isl_constraint_set_constant_val (c, v); c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, 1); scop->context = isl_set_add_constraint (scop->context, c); @@ -1351,17 +1351,15 @@ isl_space *space = isl_set_get_space (scop->context); isl_constraint *c; mpz_t g; - isl_int v; + isl_val *v; c = isl_inequality_alloc (isl_local_space_from_space (space)); mpz_init (g); - isl_int_init (v); tree_int_to_gmp (ub, g); - isl_int_set_gmp (v, g); + v = isl_val_int_from_gmp (the_isl_ctx, g); mpz_clear (g); - c = isl_constraint_set_constant (c, v); - isl_int_clear (v); + c = isl_constraint_set_constant_val (c, v); c = isl_constraint_set_coefficient_si (c, isl_dim_param, p, -1); scop->context = isl_set_add_constraint (scop->context, c); 2014-08-03 Mircea Namolaru Replacement of isl-int by isl_val * graphite-clast-to-gimple.c: include isl/val.h, isl/val_gmp.h (compute_bounds_for_param): use isl_val instead of isl_int (compute_bounds_for_loop): likewise * graphite-interchange.c: include isl/val.h, isl/val_gmp.h (build_linearized_memory_access): use isl_val instead of isl_int (pdr_stride_in_loop): likewise * graphite-optimize-isl.c: (getPrevectorMap): use isl_val instead of isl_int * graphite-poly.c: (pbb_number_of_iterations_at_time): use ils_val instead of isl_int graphite-sese-to-poly.c: include isl/val.h, isl/val_gmp.h (extern the_isl_ctx): declare (build_pbb_scattering_polyhedrons): use isl_val instead of isl_int (extract_affine_gmp): likewise (wrap): likewise (build_loop_iteration_domains): likewise (add_param_constraints): likewise (add_param_constraints): likewise
Re: Replacement of isl_int by isl_val
> On 08/03/14 17:44, Mircea Namolaru wrote: > > 2014-08-03 Mircea Namolaru > > > > Replacement of isl-int by isl_val > > * graphite-clast-to-gimple.c: include isl/val.h, isl/val_gmp.h > > (compute_bounds_for_param): use isl_val instead of isl_int > > (compute_bounds_for_loop): likewise > > * graphite-interchange.c: include isl/val.h, isl/val_gmp.h > > (build_linearized_memory_access): use isl_val instead of isl_int > > (pdr_stride_in_loop): likewise > > * graphite-optimize-isl.c: > > (getPrevectorMap): use isl_val instead of isl_int > > * graphite-poly.c: > > (pbb_number_of_iterations_at_time): use ils_val instead of isl_int > > graphite-sese-to-poly.c: include isl/val.h, isl/val_gmp.h > > (extern the_isl_ctx): declare > > (build_pbb_scattering_polyhedrons): use isl_val instead of isl_int > > (extract_affine_gmp): likewise > > (wrap): likewise > > (build_loop_iteration_domains): likewise > > (add_param_constraints): likewise > > (add_param_constraints): likewise > This is good. Please install if you haven't already. > > jeff > I don't have maintainer (write) permissions. Many thanks, Mircea
Re: [PATCH] Fix bug 59586
Hi, Your patch is fine - even without this bug, introducing NULL pointer checks before dereferencing a pointer is a good thing. Mircea
Re: Fix PR59586
Hi, I think that NULL pointer checks should be added for all pointers must_raw, may_raw etc, not only for the *_no_source ones. This will make the function more robust and easier to maintain. Indeed in the current code only the *_no_source pointers may be NULL, but this may change in the future so you don't want to base the correctness of the code on this assumption. Mircea - Original Message - > From: "Roman Gareev" > To: gcc-patches@gcc.gnu.org > Cc: "Tobias Grosser" , "mircea namolaru" > > Sent: Monday, March 10, 2014 5:39:47 PM > Subject: Fix PR59586 > > This patch fixes PR59586. > The segfault is caused by NULL arguments passed to compute_deps by > loop_level_carries_dependences. > This causes an assignment of NULL values to the no_source parameters > of compute_deps. > They are passed to subtract_commutative_associative_deps and dereferenced. > > However, this NULL arguments are appropriate for the algorithm used > in loop_level_carries_dependences. It uses compute_deps > for finding RAW, WAR and WAW dependences of all basic blocks > in the body of the given loop. Subsequently, it tries to > determine presence of these dependences at the given level. > Therefore it maps the relation of the dependences to the relation > of the corresponding time-stamps and intersects the result with > the relation in which all the inputs before the DEPTH occur at the > same time as the output, and the input at the DEPTH occurs before output. > If the intersection is not empty, some dependences are carried > by the DEPTH we currently check and the loop is consequently not parallel. > > This patch tries to avoid the problem by adding NULL checking of the > no_source statements to > subtract_commutative_associative_deps. > > Tested x86_64-unknown-linux-gnu, applying to 4.8.3 and trunk. >