Hello, > The number of floating point ops. in loop body. > The number of memory ops. in loop body. > The number of operands in loop body. > The number of implicit instructions in loop body. > The number of unique predicates in loop body. > The number of indirect references in loop body. > The number of uses in the loop.h > The number of defs. in the loop.
you have to scan insns in loop body, and check what they do for this. > The number of parallel "computations" in loop. > The estimated latency of the critical path of loop. > The estimated cycle length of loop body. > The max. dependence height of computations. > The max. height of memory dependencies of computations. > The max. height of control dependencies of computations. > The average dependence height of computations. > The min. memory-to-memory loop-carried dependence. > The number of memory-to-memory dependencies. This is a bit more difficult; I guess you could persuade scheduler to give you this information, but I have no idea how exactly. See modulo-sched.c, it considers similar kind of information. > The language (C or Fortran). You may check name in langhooks. > The tripcount of the loop (-1 if unknown). See find_simple_exit and related functions in loop-iv.c. > Here is how I'm thinking of conducting the experiment: > > - for each innermost loop: > - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and > measure the time the benchmark takes > - write down the loop features and the best unroll factor > - apply some machine learning technique to the above data to determine > the correlations between loop features and best unroll factor > - integrate the result into gcc and measure the benchmarks again > > Do you think it is ok to only consider inner-most loops? We do not unroll non-innermost loops at the moment, so if you want to test non-innermost loops, you would probably have to extend some loop manipulation functions (loop unrolling was written to work for non-innermost loops as well, but it was not well tested and thus it is very likely buggy). > What about > the unroll factors? Should I consider bigger unroll factors? I think unroll factors over 32 should not give you much more gain, but you will see what results you will get. > Do you think the above setup is ok? I am somewhat skeptical; I would expect the results to be quite target-dependent, and also to differ from program to program. Thus, it may be somewhat hard to derive a useful general heuristics from them. But I would be very happy to be proven wrong :-) Zdenek