https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82591
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |NEW Last reconfirmed| |2017-10-18 CC| |skimo at kotnet dot org, | |spop at gcc dot gnu.org Target Milestone|--- |8.0 Ever confirmed|0 |1 --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> --- With GCC 7 we have zero SCOPs detected, with GCC 8 we detect one SCOP. What takes all the time is code generation(!), namely isl_ast_build_node_from_schedule. The schedule isn't that complicated: domain: "[P_32] -> { S_6[i1, i2] : 0 <= P_32 <= 65535 and 0 <= i1 <= 127 and 0 <= i2 <= 2 and 65536*floor((-513 + P_32 - 1539i1 - 513i2)/65536) >= -514 + P_32 - 1539i1 - 513i2 }" child: schedule: "[P_32] -> [{ S_6[i1, i2] -> [(floor((i1)/51))] }, { S_6[i1, i2] -> [(floor((i2)/51))] }]" permutable: 1 coincident: [ 1, 1 ] options: "{ separate[0] }" child: schedule: "[P_32] -> [{ S_6[i1, i2] -> [(i1 - 51*floor((i1)/51))] }, { S_6[i1, i2] -> [(i2 - 51*floor((i2)/51))] }]" permutable: 1 coincident: [ 1, 1 ] options: "{ separate[0] }" note there's no limiting in place on the AST generator. I don't immediately see what we can do about this - it looks like a bug in ISL to me. Sven? It looks like the above might be enough to turn this into a ISL testcase for test_inputs/codegen? The separate option is not necessary to trigger the problem. The AST generated is for (int c0 = 0; c0 <= 2; c0 += 1) for (int c2 = 0; c2 <= min(50, -51 * c0 + 127); c2 += 1) for (int c3 = 0; c3 <= 2; c3 += 1) if ((-P_32 + 78489 * c0 + 1539 * c2 + 513 * c3 + 66048) % 65536 >= 65534) S_6(51 * c0 + c2, c3); The AST generated for the original schedule (before ISL optimizing and blocking) is for (int c0 = 0; c0 <= 127; c0 += 1) for (int c1 = 0; c1 <= 2; c1 += 1) if ((-P_32 + 1539 * c0 + 513 * c1 + 66048) % 65536 >= 65534) S_6(c0, c1); and the corresponding schedule is domain: "[P_32] -> { S_6[i1, i2] : 0 <= P_32 <= 65535 and 0 <= i1 <= 127 and 0 <= i2 <= 2 and 65536*floor((-513 + P_32 - 1539i1 - 513i2)/65536) >= -514 + P_32 - 1539i1 - 513i2 }" child: schedule: "[P_32] -> L_1[{ S_6[i1, i2] -> [(i1)] }]" child: schedule: "[P_32] -> L_2[{ S_6[i1, i2] -> [(i2)] }]" On the GCC side we can avoid the slowness with --param loop-block-tile-size=0 thus turn off forced blocking (we're doing quite stupid blocking by 51 even on loops that we know iterate less...). Our tiling code is doing long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE); if (dims <= 1 || tile_size == 0 || !isl_schedule_node_band_get_permutable (node)) { if (dump_file && dump_flags) fprintf (dump_file, "not tiled\n"); return node; } /* Tile loops. */ space = isl_schedule_node_band_get_space (node); isl_multi_val *sizes = isl_multi_val_zero (space); isl_ctx *ctx = isl_schedule_node_get_ctx (node); for (unsigned i = 0; i < dims; i++) { sizes = isl_multi_val_set_val (sizes, i, isl_val_int_from_si (ctx, tile_size)); if (dump_file && dump_flags) fprintf (dump_file, "tiled by %ld\n", tile_size); } node = isl_schedule_node_band_tile (node, sizes); node = isl_schedule_node_child (node, 0); any hint where we can avoid tiling with obviously too big tiles?