On Tuesday 12 July 2005 19:56, Josh Conner wrote:
> In looking at the function inliner, I did some analysis of the
> correctness of estimate_num_insns (from tree-inline.c), as this
> measurement is critical to the inlining heuristics.
You don't say what compiler you used for these measurements. I suppose
you used mainline?

> Here is the overview of what I found on the functions in two sample
> files from the FAAD2 AAC library (measured at -O3 with inlining
> disabled):

I think you should look at a lot more code than this.

> Thinking that there may be room for improvement on this, I tried this
> same experiment with a couple of adjustments to estimate_num_insns:
> - Instead of ignoring constants, assign them a weight of 2
> instructions (one for the constant itself and one to load into memory)

Why the load into memory, you mean constants that must be loaded from
the constant pool? I guess the cost for the constant itself depends on
whether it is a legitimate constant or not, and how it is loaded, and
this is all very target-specific. But a cost greater than 0 probably
makes sense.

It would be nice if you retain the comment about constants in the
existing code somehow. The cost of constants is not trivial, and you
should explain your choice better in a comment.

> - Instead of ignoring dereferences, assign a weight of 2 to an array
> reference and 1 for a pointer dereference

Makes sense.

> - Instead of ignoring case labels, assign them a weight of 2
> instructions (to represent the cost of control logic to get there)

This depends on how the switch statement is expanded, e.g. to a binary
decision tree, or to a table jump, or to something smaller than that.
So again (like everything else, sadly) this is highly target-specific
and even context-specific. I'd think a cost of 2 is too pessimistic in
most cases. You could look at the code in stmt.c to see how switch
statements are expanded. Maybe there is a cheap test you can do to make
CASE_LABELs free for very small switch statements (i.e. the ones which
should never hold you back from inlining the function containing
them ;-).

> For what it's worth, code size is equal to or smaller for all
> benchmarks across all platforms.

What about the compile time?

> So, here are the open issues I see at this point:
> 1. It appears that this change to estimate_num_instructions generates
> a much better estimate of actual code size. However, the benchmark
> results are ambiguous. Is this patch worth considering as-is?

I would say you'd at least have to look into the ppc gzip and eon
regressions before this is acceptable. But it is not my decision to
make, of course.

> 2. Increasing instruction weights causes the instruction-based values
> (e.g., --param max-inline-insns-auto) to be effectively lower.
> However, changing these constants/defaults as in the second patch
> will cause a semantic change to anyone who is setting these values at
> the command line. Is that change acceptable?

These values have changed constantly from one release to the next since
GCC 3.3, so I don't think this should be a problem.

> I do realize that this area is one that will eventually be likely
> best served by target-dependent routines (and constants), however I
> also see a significant benefit to all targets in fixing the default
> implementation first.

I don't really think we want to make too many things target-dependent.
This whole estimate is just that: an estimate. It is not intended to be
absolute, it is just a relative measure of cost. There are IMHO very
few trees that should have target-dependent inline cost estimates.

OK, that was the thoughts part...

> Thoughts? Advice?

...on to advice then.

First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
the same.
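Something along these lines is what I have in mind. This is only a
sketch of the tree-code switch, not a tested patch; I'm assuming the
`x' and `*count' names of the existing walk in tree-inline.c, and the
exact numbers are of course open for discussion:

  switch (TREE_CODE (x))
    {
    /* Constants are no longer free: one insn for the constant itself
       and possibly one to load it (e.g. from the constant pool).  Keep
       a comment here explaining the choice, like the old code had.  */
    case INTEGER_CST:
    case REAL_CST:
    case COMPLEX_CST:
    case VECTOR_CST:
    case STRING_CST:
      *count += 2;
      break;

    /* Treat both kinds of array reference the same: index arithmetic
       plus the memory access.  */
    case ARRAY_REF:
    case ARRAY_RANGE_REF:
      *count += 2;
      break;

    /* A plain pointer dereference is just the memory access.  */
    case INDIRECT_REF:
      *count += 1;
      break;

    /* Some control logic to reach the label.  As said above, 2 is
       probably too pessimistic for very small switch statements.  */
    case CASE_LABEL_EXPR:
      *count += 2;
      break;

    /* ... the rest of the existing cases stay as they are ...  */
    }

Grouping ARRAY_REF and ARRAY_RANGE_REF in one case also makes it harder
to forget one of them when the weights get tuned later.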
And you probably should make BIT_FIELD_REF more expensive, and maybe
make its cost depend on which bit you're addressing (you can see that
in operands 1 and 2 of the BIT_FIELD_REF). Its current cost of just 1
is probably too optimistic, just like the other cases you are trying to
address with this patch.

Second, you may want to add a target hook to return the cost of target
builtins. Even builtins that expand to just one instruction are now
counted as 16 insns plus the cost of the argument list. This badly
hurts when you use e.g. SSE intrinsics. It's probably not an issue for
the benchmark you looked at now, but maybe you want to look into it
anyway, while you're at it.

Third,

> (measured at -O3 with inlining disabled):

Then why not just -O2 with inlining disabled? Now you have enabled loop
unswitching, which is known to sometimes significantly increase code
size.

Fourth, look at _much_ more code than this. I would especially look at
a lot more C++ code, which is where our inliner heuristics can really
dramatically improve or destroy performance. See Richard Guenther's
inline heuristics tweaks from earlier this year, in the thread starting
here: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01936.html

Finally, you've apparently used grep to find all the places where
PARAM_MAX_INLINE_INSNS_SINGLE and its friends are used, but you have
missed the ones in opts.c and maybe elsewhere.

Good luck, and thanks for working on this difficult stuff.

Gr.
Steven
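P.S. To make the BIT_FIELD_REF point a little more concrete, here is
roughly the kind of test I mean, again only a sketch with made-up
weights (operand 1 is the width in bits, operand 2 the position of the
first bit):

    case BIT_FIELD_REF:
      {
        /* A byte-aligned, byte-sized reference is more or less an
           ordinary load; anything else needs extra shift/mask work.  */
        tree size = TREE_OPERAND (x, 1);
        tree pos = TREE_OPERAND (x, 2);

        if (host_integerp (size, 1)
            && host_integerp (pos, 1)
            && tree_low_cst (size, 1) % BITS_PER_UNIT == 0
            && tree_low_cst (pos, 1) % BITS_PER_UNIT == 0)
          *count += 1;
        else
          *count += 3;
      }
      break;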
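P.P.S. For the builtin costs, the hook I am thinking of would look
something like this. The name estimate_md_builtin_insns is made up, no
such hook exists today, so treat this purely as an illustration of the
idea:

  /* In target.h, in struct gcc_target: return an estimate of the
     number of insns a machine-specific builtin expands to, or -1 for
     "no idea, use the generic guess".  */
  int (* estimate_md_builtin_insns) (tree fndecl);

  /* In the CALL_EXPR case of the cost walk in tree-inline.c (the
     arguments are still costed separately by the walk):  */
    case CALL_EXPR:
      {
        tree decl = get_callee_fndecl (x);
        int cost = -1;

        if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
          cost = targetm.estimate_md_builtin_insns (decl);

        /* Fall back to the current blanket guess of 16.  */
        *count += (cost >= 0 ? cost : 16);
      }
      break;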