On Tue, Dec 20, 2011 at 2:23 PM, Rohit Arul Raj <rohitarul...@gmail.com> wrote:
> Hello All,
>
> With the code given below, I expected the ppc compiler (e500mc v4.6.2)
> to generate a 'memset' zero call for the initialization loop (at '-O3'),
> but it generates a loop instead.
>
> Case:1
>
> int a[18], b[18];
> foo () {
>   int i;
>
>   for (i=0; i < 18; i++)
>      a[i] = 0;
> }
>
> However, with the test case from the gcc documentation for the
> '-ftree-loop-distribute-patterns' flag (shown below), the compiler
> does generate a 'memset' zero call.
>
> Case:2
>
> int a[18], b[18];
> foo () {
>   int i;
>
>   for (i=0; i < 18; i++) {
>      a[i] = 0;               /* (A) */
>      b[i] = a[i] + i;        /* (B) */
>   }
> }
>
> Here statements (A) and (B) are split into two loops, and for the first
> loop the compiler generates a 'memset' zero call. Isn't the same
> optimization supposed to happen with case (1)?
>
> Also, for statement (A) in case (2), the compiler unrolls the loop when
> the iteration count is < 18 and generates 'memset' zero when it is
> >= 18.
>
> Looking at 'gcc/tree-loop-distribution.c' file,
>
> static int
> ldist_gen (struct loop *loop, struct graph *rdg,
>           VEC (int, heap) *starting_vertices)
> {
>   ...
>  BITMAP_FREE (processed);
>  nbp = VEC_length (bitmap, partitions);
>
>  if (nbp <= 1
>      || partition_contains_all_rw (rdg, partitions))
>    goto ldist_done;                                /* (Z) */
>
>  if (dump_file && (dump_flags & TDF_DETAILS))
>    dump_rdg_partitions (dump_file, partitions);
>
>  FOR_EACH_VEC_ELT (bitmap, partitions, i, partition)
>    /* (Y): the code generating the built-in 'memset' is called from here.  */
>    if (!generate_code_for_partition (loop, partition, i < nbp - 1))
>      goto ldist_done;
>
>  rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
>  update_ssa (TODO_update_ssa_only_virtuals | TODO_update_ssa);
>
>  ldist_done:
>
>  BITMAP_FREE (remaining_stmts);
>
>  .........
>  return nbp;
>  }
>
> From statement (Z), if the number of distributed loops is <= 1, then the
> code generating the built-in function (Y) is not executed.
>
> Would it be a good solution to relax this conditional check so that it
> also covers a single loop (one that is not split)? Or is there another
> place/pass where we can implement this?

Well, at least we do not want to create any code if the builtin code
generation would fail.

Richard.

> Regards,
> Rohit
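
For reference, a minimal source-level sketch of what the thread is asking
for. It only illustrates the intended equivalence; it is not what the pass
does internally. The names N, foo_loop, foo_memset and foo_distributed are
made up for this example and do not appear in the original test cases.

```c
#include <string.h>

#define N 18   /* hypothetical name for the array length used in the thread */

int a[N], b[N];

/* Case 1 as written: a single loop that zero-initializes a[]. */
void foo_loop(void)
{
    int i;

    for (i = 0; i < N; i++)
        a[i] = 0;
}

/* Case 1 as one would like it to be emitted: a single memset call.
   An all-zero byte pattern represents int 0, so the result is the same. */
void foo_memset(void)
{
    memset(a, 0, sizeof a);
}

/* Case 2 after distribution, written by hand: statement (A) becomes a
   memset, statement (B) stays in its own loop. */
void foo_distributed(void)
{
    int i;

    memset(a, 0, sizeof a);          /* (A) */
    for (i = 0; i < N; i++)
        b[i] = a[i] + i;             /* (B) */
}
```

foo_loop and foo_memset leave a[] in the same state, which is why one would
expect the single-partition case to be a candidate for the same builtin.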
