https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66565
Bug ID: 66565
Summary: Problems and limitations of GCC cost metrics and ways to improve
         the situation
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: dje at gcc dot gnu.org
Target Milestone: ---

In preparation for the GNU Cauldron 2015 costs BoF, here are some talking
points gathered from various GCC developers. Feel free to add your own
gripes, proposals and concerns.

* We have many different types of costs: rtx costs, branch costs, address
costs, vector costs etc. that measure different things in different units.
This causes problems when they are compared against each other.
Example: IVOPTs calculates complex profitability metrics and uses both rtx
and address costs, mixing and adding them together. A change to either can
cause unexpected variations in the generated code.

* Costs are frequently compared against magic constants that were picked
for a particular testcase on a particular target.
Example: create_new_invariant in loop-invariant.c compares an address cost
to 3 because it helped bwaves on x86.

* Implementing rtx costs in a backend usually involves writing a
dumbed-down recog that pattern matches on the rtxes. This duplicates logic.

* The midend frequently builds bogus rtxes that will never match anything
and then asks the backend to assign them a cost. This unnecessarily
complicates backend rtx cost implementations and is hard to reason about
when all the midend really wants is a measure of 'complexity'.

* rtx costs need to be able to express whether they are being used to
calculate the result delay or to understand how long it is before another
(related, for some definition of related) instruction can be issued.
Furthermore, the result delay may depend on the context in which the
result is used (e.g. due to pipeline forwarding). I'm not sure whether it
is necessary to distinguish between early availability of a result and
late use of a result.

* Branch costs really need to capture how predictable a branch is, not how
strongly taken or not taken it is. With modern branch predictors a branch
can be neither strongly taken nor strongly not taken yet be entirely
predictable (e.g. if it alternates precisely between the two states). It's
true that traditional static profiling tools do not do a good job of
identifying the pattern of taken/not-taken that a single branch follows,
let alone whether the branch is taken often enough, with enough temporal
locality, to be likely predictable at all.

* "Cost" is too vacuous a term, with too overloaded a meaning and too low
a resolution for the queries we actually want to make. The midend always
wants a cost so that it can compare two strategies and pick the better
one. The proliferation and misunderstanding of cost units is a side effect
of this. So we really have to move to something like:

  target_should_inline_call (<interesting call information from the pass>)
  target_should_ifconvert (<gimple sequences for before/after transformation>)
  target_combine_is_profitable_p (<before, after rtx>)
  target_should_reconstruct_value_p (<candidate rtx to either build and
    keep in registers or reconstruct in the loop>)
  target_choose_preferred_addressing_mode (<list of candidate addressing
    modes for a MEM>)

where we ask specific and meaningful questions of the target, rather than
asking for a guess at costs and using that. A rough sketch of such an
interface follows.
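A very rough sketch, purely hypothetical: the hook names come from the
list above, but the signatures and payload types are invented here for
illustration, and nothing like this exists in GCC today:

    #include <vector>

    // Placeholder payload types; in reality these would be the passes'
    // own data structures (call-site info, GIMPLE sequences, rtxes).
    struct inline_call_info {};
    struct gimple_seq_view {};
    struct rtx_view {};
    struct addr_mode_candidate {};

    // Each hook answers one concrete question.  No unitless "cost"
    // crosses the boundary between the pass and the backend.
    struct target_decisions
    {
      virtual bool should_inline_call (const inline_call_info &call) = 0;
      virtual bool should_ifconvert (const gimple_seq_view &before,
                                     const gimple_seq_view &after) = 0;
      virtual bool combine_is_profitable_p (const rtx_view &before,
                                            const rtx_view &after) = 0;
      virtual bool should_reconstruct_value_p (const rtx_view &candidate) = 0;
      // Returns an index into the candidate list.
      virtual unsigned choose_preferred_addressing_mode
        (const std::vector<addr_mode_candidate> &candidates) = 0;
      virtual ~target_decisions () {}
    };

The point of the shape, whatever the exact signatures turn out to be, is
that the answer stays in the backend: the pass never sees a number it
might be tempted to add to something else.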
The difficult argument to make will be that we have to be radical about
this and be prepared to regress many targets in the short term, for the
sake of ending up with a maintainable and understandable design for making
these decisions. Otherwise we'll sink years of engineering time into
trying to figure out the meaning of the number 3, and we'll design
something very poor that merely tries to emulate the old behaviour.

* Can we use the (C++) type system to:
  (a) prevent costs from being compared to things that aren't costs, like
      magic numbers;
  (b) prevent comparisons between costs of different units;
  (c) enable multiple axes, i.e. throughput, latency, code size?
A sketch of what this might look like follows at the end of this report.

* The documentation and guidance on what costs mean is poor, and it
probably cannot be improved while costs have these ambiguous, inconsistent
uses.
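As an illustrative, self-contained sketch (not existing GCC code) of how
(a), (b) and (c) might be enforced: each cost unit gets its own type,
construction from a raw number must be explicit, and comparisons only
exist within a single unit.

    template <typename Unit>
    class cost
    {
      int value;
    public:
      // Explicit constructor: a bare magic number cannot silently
      // become a cost (point (a)).
      constexpr explicit cost (int v) : value (v) {}

      // Comparison and addition only exist between costs of the same
      // unit (point (b)).
      friend constexpr bool operator< (cost a, cost b)
      { return a.value < b.value; }
      friend constexpr cost operator+ (cost a, cost b)
      { return cost (a.value + b.value); }
    };

    // Separate axes instead of one overloaded number (point (c)).
    struct latency_unit {};
    struct throughput_unit {};
    struct size_unit {};

    typedef cost<latency_unit>    latency_cost;
    typedef cost<throughput_unit> throughput_cost;
    typedef cost<size_unit>       size_cost;

    int main ()
    {
      latency_cost a (2), b (5);
      size_cost s (4);

      bool ok = a < b;       // OK: same unit
      // bool bad1 = a < s;  // error: different units do not compare
      // bool bad2 = a < 3;  // error: no comparison with magic numbers
      return ok ? 0 : 1;
    }

With something like this, the IVOPTs-style mixing of rtx and address
costs described above would at least fail to compile until someone wrote
a deliberate, reviewable conversion between the two units.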