Hi all, I'm trying to reduce the cases where the midend calls the backend rtx costs on bogus rtl for which the backend doesn't have patterns or ways of handling. Having to handle these kinds of rtxes sanely bloats those functions and makes them harder to maintain.
One of the cases where this occurs is in combine and distribute_and_simplify_rtx in particular. Citing the comment at that function: " See if X is of the form (* (+ A B) C), and if so convert to (+ (* A C) (* B C)) and try to simplify. Most of the time, this results in no change. However, if some of the operands are the same or inverses of each other, simplifications will result." The problem is that after it applies the distributive law it calls rtx costs to figure out whether the rtx became simpler. This rtx can get pretty complex. For example, on arm I've seen it try to cost: (plus:SI (mult:SI (plus:SI (reg:SI 232 [ m1 ]) (const_int 1 [0x1])) (reg:SI 232 [ m1 ])) (plus:SI (reg:SI 232 [ m1 ]) (const_int 1 [0x1]))) which is never going to match anything on arm anyway, so why should the costs function handle it? In any case, I believe combine's design is such that it should first be attempting to call recog and split on the rtxes, and only if that succeeds should it be making a target-specific decision on which rtx to prefer. distribute_and_simplify_rtx goes against that by calling rtx costs on an unverified rtx in attempt to gauge its complexity. This patch remedies that by removing the call to rtx costs and instead manually performing a relatively simple check on whether the resultant rtx was simplified. That is, using the example from the comment, whether (+ (* A C) (* B C)) still has + at the top and * in the two operands. This should give a good indication on whether any meaningful simplification was made (The '+' and '*' operators in the example can be any operators that can be distributed over). Initially, I wanted to just return the distributed version and let recog reject the invalid rtxes but that caused some code quality regressions on arm where the original rtx would not recog but would match a beneficial splitter, whereas the distributed rtx would not. With this patch I saw almost no codegen differences on arm for the whole of SPEC2006. The one exception was 416.gamess where it managed to merge a mul and an add into an mla which resulted in a slightly better code sequence. That was in a pretty large file and I don't speak Fortran'ese, so I couldn't really reduce a testcase for it, but my guess is that before the patch the costs would return some essentially random value for an arbitrarily complex rtx that it was passed to, which changed the decision in distribute_and_simplify_rtx on whether to return the distributed rtx, which could have impacted further optimisations in combine. I tried it on x86_64 as well. Again, there were almost no codegen differences. The exception was tonto and wrf where a few instructions were eliminated, but no significant difference. The resultant binaries for these two were a tiny bit smaller, with no impact on runtime. Therefore I claim that this a safe thing to do, as it leaves the target-specific rtx cost judgements in combine to be made only on valid recog-ed rtxes, and not having them cancel optimisations early due to rtx costs not handling arbitrary rtxes well. Bootstrapped on arm, x86_64, aarch64 (all linux). Tested on arm,aarch64. Ok for trunk? Thanks, Kyrill 2015-04-20 Kyrylo Tkachov <kyrylo.tkac...@arm.com> * combine.c (distribute_and_simplify_rtx): Do not check rtx costs. Look at the rtx codes to see if a simplification occured.
commit e9833e5e3e996ac68b645fdca14738232f59e1a2 Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com> Date: Wed Apr 15 14:11:16 2015 +0100 [combine] Do not call rtx costs on potentially invalid rtx in distribute_and_simplify_rtx diff --git a/gcc/combine.c b/gcc/combine.c index 46cd6db..56d297b 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -9602,12 +9602,15 @@ distribute_and_simplify_rtx (rtx x, int n) tmp = apply_distributive_law (simplify_gen_binary (inner_code, mode, new_op0, new_op1)); - if (GET_CODE (tmp) != outer_code - && (set_src_cost (tmp, optimize_this_for_speed_p) - < set_src_cost (x, optimize_this_for_speed_p))) - return tmp; - return NULL_RTX; + /* We didn't manage to simplify. */ + if (GET_CODE (tmp) == outer_code + || (GET_CODE (tmp) == inner_code + && GET_CODE (XEXP (tmp, 0)) == outer_code + && GET_CODE (XEXP (tmp, 1)) == outer_code)) + return NULL_RTX; + + return tmp; } /* Simplify a logical `and' of VAROP with the constant CONSTOP, to be done