Hello,

> The number of floating point ops. in loop body.
> The number of memory ops. in loop body.
> The number of operands in loop body.
> The number of implicit instructions in loop body.
> The number of unique predicates in loop body.
> The number of indirect references in loop body.
> The number of uses in the loop.
> The number of defs. in the loop.

For these, you have to scan the insns in the loop body and check what
each of them does.
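
To illustrate the kind of scan I mean, here is a minimal sketch in Python
over a toy instruction list.  The Insn class and its fields are
hypothetical and are not GCC's RTL representation; in the compiler you
would walk the insns of each basic block of the loop instead.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Insn:
    """Hypothetical, simplified instruction (not GCC's RTL)."""
    opcode: str
    defs: tuple = ()       # registers written by the instruction
    uses: tuple = ()       # registers or symbols read by the instruction
    is_float: bool = False
    is_mem: bool = False


def count_features(body):
    """Scan every instruction once and accumulate per-loop counts."""
    counts = Counter()
    for insn in body:
        counts["float_ops"] += int(insn.is_float)
        counts["mem_ops"] += int(insn.is_mem)
        counts["defs"] += len(insn.defs)
        counts["uses"] += len(insn.uses)
    return counts


# Toy body for: b[i] = a[i] * c; i++
body = [
    Insn("load", defs=("r1",), uses=("a", "i"), is_mem=True),
    Insn("fmul", defs=("r2",), uses=("r1", "c"), is_float=True),
    Insn("store", uses=("r2", "b", "i"), is_mem=True),
    Insn("add", defs=("i",), uses=("i",)),
]
features = count_features(body)
```

The other counts in the list (operands, predicates, indirect references)
would be further counters updated in the same pass.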

> The number of parallel "computations" in loop.
> The estimated latency of the critical path of loop.
> The estimated cycle length of loop body.
> The max. dependence height of computations.
> The max. height of memory dependencies of computations.
> The max. height of control dependencies of computations.
> The average dependence height of computations.
> The min. memory-to-memory loop-carried dependence.
> The number of memory-to-memory dependencies.

This is a bit more difficult; I guess you could persuade the scheduler to
give you this information, but I have no idea how exactly.  See
modulo-sched.c, which considers a similar kind of information.
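
Just to make the "critical path" feature concrete, here is a rough
sketch in Python of a longest-path computation over a dependence graph.
It is not what the scheduler or modulo-sched.c does; the graph, the
node names and the latencies are made up, and it assumes the graph is
acyclic (i.e. it ignores loop-carried dependencies).

```python
def critical_path(latency, deps):
    """Return the longest latency-weighted path through a dependence DAG.

    latency: dict mapping node -> cycles taken by that node (assumed values)
    deps:    dict mapping node -> list of nodes it depends on
    """
    finish = {}  # memoized earliest finish time of each node

    def visit(node):
        if node not in finish:
            # A node can start only after all its predecessors finish.
            start = max((visit(p) for p in deps.get(node, ())), default=0)
            finish[node] = start + latency[node]
        return finish[node]

    return max((visit(n) for n in latency), default=0)


# Hypothetical latencies for: load -> fmul -> store, plus an independent add.
latency = {"load": 3, "fmul": 4, "store": 1, "add": 1}
deps = {"fmul": ["load"], "store": ["fmul"]}
path_length = critical_path(latency, deps)
```

With unit latencies the same function gives the dependence height, and
restricting deps to memory or control edges gives the corresponding
heights.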

> The language (C or Fortran).

You can check the language name in langhooks.

> The tripcount of the loop (-1 if unknown).

See find_simple_exit and related functions in loop-iv.c.
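
As an illustration of the feature itself (not of what loop-iv.c does
internally), here is a small Python sketch for a loop of the form
"for (i = init; i < bound; i += step)".  The function name and the use
of None for values unknown at compile time are my assumptions.

```python
def trip_count(init, bound, step):
    """Iterations of `for (i = init; i < bound; i += step)`, or -1 if unknown.

    Any argument may be None, meaning it is not known at compile time.
    Only positive constant steps are handled in this sketch.
    """
    if init is None or bound is None or step is None or step <= 0:
        return -1
    if bound <= init:
        return 0
    # Ceiling division: number of values init, init+step, ... below bound.
    return (bound - init + step - 1) // step
```

For example, trip_count(0, 10, 3) counts the iterations i = 0, 3, 6, 9.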

> Here is how I'm thinking of conducting the experiment:
> 
> - for each innermost loop:
>    - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
> measure the time the benchmark takes
>    - write down the loop features and the best unroll factor
> - apply some machine learning technique to the above data to determine
> the correlations between loop features and best unroll factor
> - integrate the result into gcc and measure the benchmarks again
> 
> Do you think it is ok to only consider inner-most loops?

We do not unroll non-innermost loops at the moment, so if you want to
test non-innermost loops, you would probably have to extend some loop
manipulation functions (loop unrolling was written to work for
non-innermost loops as well, but it was not well tested and thus it is
very likely buggy).

> What about
> the unroll factors? Should I consider bigger unroll factors?

I think unroll factors over 32 should not give you much more gain,
but you will see what results you get.
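
The measurement step of the experiment you describe could be sketched
roughly like this in Python.  All names and timings here are
hypothetical; the features would come from the compiler and the timings
from your benchmark runs.

```python
UNROLL_FACTORS = (1, 2, 4, 8, 16, 32)


def best_unroll_factor(timings):
    """Pick the factor with the lowest measured time.

    timings: dict mapping unroll factor -> benchmark time in seconds.
    Ties go to the smaller factor, since it produces less code.
    """
    return min(sorted(timings), key=lambda factor: timings[factor])


def training_rows(loops):
    """Turn (features, timings) pairs into (features, best factor) rows."""
    return [(features, best_unroll_factor(timings))
            for features, timings in loops]


# Made-up measurements for two loops.
loops = [
    ({"float_ops": 1, "mem_ops": 2},
     {1: 2.0, 2: 1.6, 4: 1.5, 8: 1.5, 16: 1.7, 32: 1.9}),
    ({"float_ops": 0, "mem_ops": 5},
     {1: 3.0, 2: 3.1, 4: 3.3, 8: 3.4, 16: 3.6, 32: 4.0}),
]
rows = training_rows(loops)
```

The rows are then the input for whatever learning technique you choose.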

> Do you think the above setup is ok?

I am somewhat skeptical; I would expect the results to be quite
target-dependent, and also to differ from program to program.  Thus,
it may be somewhat hard to derive useful general heuristics from
them.  But I would be very happy to be proven wrong :-)

Zdenek
