On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> For now, loop distribution handles variables used outside of loop as >> reduction. >> This is inaccurate because all partitions contain statement defining >> induction >> vars. > > But final induction values are usually not used outside of the loop... This is in actuality for induction variable which is used outside of the loop. > > What is missing is loop distribution trying to change partition order. In > fact > we somehow assume we can move a reduction across a detected builtin > (I don't remember if we ever check for validity of that...). Hmm, I am not sure when we can't. If there is any dependence between builtin/reduction partitions, it should be captured by RDG or PG, otherwise the partitions are independent and can be freely ordered as long as reduction partition is scheduled last? > >> Ideally we should factor out scev-propagation as a standalone interface >> which can be called when necessary. Before that, this patch simply >> workarounds >> reduction issue by checking if the statement belongs to all partitions. If >> yes, >> the reduction must be computed in the last partition no matter how the loop >> is >> distributed. >> Bootstrap and test on x86_64 and AArch64. Is it OK? > > stmt_in_all_partitions is not kept up-to-date during partition merging and if > merging makes the reduction partition(s) pass the stmt_in_all_partitions > test your simple workaround doesn't work ... I think it doesn't matter because: A) it's really workaround for induction variables. In general, induction variables are included by all partition. B) After classify partition, we immediately fuses all reduction partitions. More stmt_in_all_partitions means we are fusing non-reduction partition with reduction partition, so the newly generated (stmt_in_all_partitions) are actually not reduction statements. The workaround won't work anyway even the bitmap is maintained. > > As written it's a valid optimization but can you please note it's limitation > in > some comment please? Yeah, I will add comment explaining it.
Thanks, bin > > Also... > > + bitmap_set_range (stmt_in_all_partitions, 0, rdg->n_vertices); > + rdg_build_partitions (rdg, stmts, &partitions, stmt_in_all_partitions); > > ick. Please instead do > > bitmap_copy (smtt_in_all_partitions, partitions[0]->stmts); > for (i = 1; i < ...) > bitmap_and_into (stmt_in_all_partitons, partitions[i]->stmts); > > Thanks, > Richard. > >> Thanks, >> bin >> 2017-06-07 Bin Cheng <bin.ch...@arm.com> >> >> * tree-loop-distribution.c (classify_partition): New parameter and >> better handle reduction statement. >> (rdg_build_partitions): New parameter and record statements belonging >> to all partitions. >> (distribute_loop): Update use of above functions.