On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> For now, loop distribution handles variables used outside of loop as 
>> reduction.
>> This is inaccurate because all partitions contain statement defining 
>> induction
>> vars.
>
> But final induction values are usually not used outside of the loop...
This is in actuality for induction variable which is used outside of the loop.
>
> What is missing is loop distribution trying to change partition order.  In 
> fact
> we somehow assume we can move a reduction across a detected builtin
> (I don't remember if we ever check for validity of that...).
Hmm, I am not sure when we can't.  If there is any dependence between
builtin/reduction partitions, it should be captured by RDG or PG,
otherwise the partitions are independent and can be freely ordered as
long as reduction partition is scheduled last?
>
>> Ideally we should factor out scev-propagation as a standalone interface
>> which can be called when necessary.  Before that, this patch simply 
>> workarounds
>> reduction issue by checking if the statement belongs to all partitions.  If 
>> yes,
>> the reduction must be computed in the last partition no matter how the loop 
>> is
>> distributed.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> stmt_in_all_partitions is not kept up-to-date during partition merging and if
> merging makes the reduction partition(s) pass the stmt_in_all_partitions
> test your simple workaround doesn't work ...
I think it doesn't matter because:
  A) it's really workaround for induction variables.  In general,
induction variables are included by all partition.
  B) After classify partition, we immediately fuses all reduction
partitions.  More stmt_in_all_partitions means we are fusing
non-reduction partition with reduction partition, so the newly
generated (stmt_in_all_partitions) are actually not reduction
statements.  The workaround won't work anyway even the bitmap is
maintained.
>
> As written it's a valid optimization but can you please note it's limitation 
> in
> some comment please?
Yeah, I will add comment explaining it.

Thanks,
bin
>
> Also...
>
> +  bitmap_set_range (stmt_in_all_partitions, 0, rdg->n_vertices);
> +  rdg_build_partitions (rdg, stmts, &partitions, stmt_in_all_partitions);
>
> ick.  Please instead do
>
>    bitmap_copy (smtt_in_all_partitions, partitions[0]->stmts);
>    for (i = 1; i < ...)
>      bitmap_and_into (stmt_in_all_partitons, partitions[i]->stmts);
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-07  Bin Cheng  <bin.ch...@arm.com>
>>
>>         * tree-loop-distribution.c (classify_partition): New parameter and
>>         better handle reduction statement.
>>         (rdg_build_partitions): New parameter and record statements belonging
>>         to all partitions.
>>         (distribute_loop): Update use of above functions.

Reply via email to