On Tuesday 12 July 2005 19:56, Josh Conner wrote:
> In looking at the function inliner, I did some analysis of the
> correctness of estimate_num_insns (from tree-inline.c), as this
> measurement is critical to the inlining heuristics.

You don't say what compiler you used for these measurements.  I suppose
you used mainline?

>   Here is the  
> overview of what I found on the functions in two sample files from  
> the FAAD2 AAC library (measured at -O3 with inlining disabled):

I think you should look at a lot more code than this.

> Thinking that there may be room for improvement on this, I tried this
> same experiment with a couple of adjustments to estimate_num_insns:
> - Instead of ignoring constants, assign them a weight of 2
> instructions (one for the constant itself and one to load into memory)

Why the load into memory?  Do you mean constants that must be loaded
from the constant pool?  I guess the cost of the constant itself depends
on whether it is a legitimate constant or not, and on how it is loaded,
and this is all very target-specific.  But a cost greater than 0
probably makes sense.

It would be nice if you retained the comment about constants in the
existing code somehow.  The cost of constants is not trivial, and you
should explain your choice of weight better in a comment.

> - Instead of ignoring dereferences, assign a weight of 2 to an array
> reference and 1 for a pointer dereference

Makes sense.
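
Purely as an untested illustration of the kind of weights we're talking
about, something like the helper below, called from the TREE_CODE switch
in estimate_num_insns_1, is what I have in mind.  The function name is
made up and the details are from memory, so treat it as a sketch, not a
patch:

/* Untested sketch only: roughly how the proposed weights could look as
   a helper called from the TREE_CODE switch in estimate_num_insns_1.
   The usual GCC include order is assumed.  */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "tree.h"

static int
proposed_extra_cost (tree x)
{
  switch (TREE_CODE (x))
    {
    /* Constants are no longer free: one insn to materialize the value,
       one for a possible load from the constant pool.  */
    case INTEGER_CST:
    case REAL_CST:
    case COMPLEX_CST:
    case VECTOR_CST:
    case STRING_CST:
      return 2;

    /* Array references need an address computation on top of the
       memory access; a plain pointer dereference is usually just the
       load or store itself.  */
    case ARRAY_REF:
    case ARRAY_RANGE_REF:
      return 2;

    case INDIRECT_REF:
      return 1;

    default:
      return 0;
    }
}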

> - Instead of ignoring case labels, assign them a weight of 2
> instructions (to represent the cost of control logic to get there)

This depends on how the switch statement is expanded, e.g. to a binary
decision tree, or to a table jump, or to something smaller than that.
So again (like everything else, sadly) this is highly target-specific
and even context-specific.  I'd think a cost of 2 is too pessimistic in
most cases.

You could look at the code in stmt.c to see how switch statements are
expanded.  Maybe there is a cheap test you can do to make CASE_LABELs
free for very small switch statements (i.e. the ones which should never
hold you back from inlining the function containing them ;-).
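
For example (again untested, assuming SWITCH_LABELS has already been
filled in at this point, and with a threshold of 4 pulled out of thin
air), you could charge the control logic at the SWITCH_EXPR itself
instead of per CASE_LABEL_EXPR:

/* Untested sketch, same includes as above.  Charge control logic for a
   switch at the SWITCH_EXPR rather than per CASE_LABEL_EXPR, and let
   very small switches go for free.  The threshold of 4 is only an
   illustration, not a tuned value.  */
static int
proposed_switch_cost (tree sw)
{
  tree labels = SWITCH_LABELS (sw);
  int ncases = labels ? TREE_VEC_LENGTH (labels) : 0;

  /* A handful of cases usually becomes a short compare-and-branch
     sequence; do not let that block inlining.  */
  if (ncases <= 4)
    return 0;

  /* Otherwise assume roughly two insns of control logic per case, as
     in your patch.  */
  return 2 * ncases;
}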

> For what it's worth, code size is equal to or smaller for all
> benchmarks across all platforms.

What about the compile time?

> So, here are the open issues I see at this point:
> 1. It appears that this change to estimate_num_instructions generates
> a much better estimate of actual code size.  However, the benchmark
> results are ambiguous.  Is this patch worth considering as-is?

I would say you'd at least have to look into the ppc gzip and eon
regressions before this is acceptable.  But it is not my decision to
make, of course.

> 2. Increasing instruction weights causes the instruction-based values
> (e.g., --param max-inline-insns-auto) to be effectively lower.
> However, changing these constants/defaults as in the second patch
> will cause a semantic change to anyone who is setting these values at
> the command line.  Is that change acceptable?

This has constantly changed from one release to the next since GCC 3.3,
so I don't think this should be a problem.

> I do realize that this area is one that will eventually be likely
> best served by target-dependent routines (and constants), however I
> also see a significant benefit to all targets in fixing the default
> implementation first.

I don't really think we want to make too many things target-dependent.
This whole estimate is just that: An estimate.  And it's not intended
to be absolute.  It is just a relative measure of cost.  There are IMHO
very few trees that should have target-dependent inline cost estimates.


OK, that was the thoughts part...

> Thoughts?  Advice?

...on to advice then.

First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
the same.  And you probably should make BIT_FIELD_REF more expensive,
and maybe make its cost depend on which bit you're addressing (you can
see that in operands 1 and 2 of the BIT_FIELD_REF).  Its current cost
of just 1 is probably too optimistic, just like the other cases you are
trying to address with this patch.
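
Something along these lines, maybe (untested, the weights are just
placeholders, and I'm relying on memory that operand 1 is the width in
bits and operand 2 the bit position):

/* Untested sketch, same includes as above.  Make BIT_FIELD_REF cost
   depend on whether the accessed bits are byte-aligned (a cheap load)
   or need shift-and-mask code.  */
static int
proposed_bit_field_ref_cost (tree ref)
{
  tree size = TREE_OPERAND (ref, 1);
  tree pos  = TREE_OPERAND (ref, 2);

  if (TREE_CODE (size) == INTEGER_CST
      && TREE_CODE (pos) == INTEGER_CST
      && TREE_INT_CST_LOW (size) % BITS_PER_UNIT == 0
      && TREE_INT_CST_LOW (pos) % BITS_PER_UNIT == 0)
    /* Byte-aligned access: roughly a load plus maybe a mask.  */
    return 2;

  /* Unaligned bit access: load, shift and mask at least.  */
  return 3;
}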

Second, you may want to add a target hook to return the cost of target
builtins.  Even builtins that expand to just one instruction are now
counted as 16 insns plus the cost of the argument list.  This badly
hurts when you use e.g. SSE intrinsics.  It's probably not an issue for
the benchmark you looked at now, but maybe you want to look into it
anyway, while you're at it.
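
None of these names exist in the tree today, so the following is only
how I imagine such a hook could look:

/* Hypothetical sketch; neither the hook nor its name exists yet.  A
   target would override the default to report that its cheap builtins
   (e.g. most SSE intrinsics) expand to a single insn.  */

/* In target.h, as part of struct gcc_target:
     int (* estimate_builtin_insns) (tree fndecl);  */

/* Default implementation, keeping the current pessimistic guess.  */
static int
default_estimate_builtin_insns (tree fndecl ATTRIBUTE_UNUSED)
{
  return 16;
}

/* The estimator would then do something like:

     if (fndecl && DECL_BUILT_IN (fndecl))
       *count += targetm.estimate_builtin_insns (fndecl);  */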

Third,
> (measured at -O3 with inlining disabled):
Then why not just -O2 with inlining disabled?  Now you have enabled
loop unswitching, which is known to sometimes significantly increase
code size.

Fourth, look at _much_ more code than this.  I would especially look at
a lot more C++ code, which is where our inliner heuristics can really
dramatically improve or destroy performance.  See Richard Guenther's
inline heuristics tweaks from earlier this year, in the thread starting
here: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01936.html.

Finally, you've apparently used grep to find all the places where
PARAM_MAX_INLINE_INSNS_SINGLE and its friends are used, but you have
missed the ones in opts.c and maybe elsewhere.


Good luck, and thanks for working on this difficult stuff.

Gr.
Steven
