Re: [RFC] Improving GCSE to reduce constant splits on ARM

Jeff Law Mon, 03 Jan 2011 08:23:55 -0800

On 12/23/10 09:02, Dmitry Melnik wrote:

Hi,
We've found that constant splitting on ARM can be very inefficient if it's done inside a loop.
For example, the expression

  a = a & 0xff00ff00;
will be translated into the following code (on ARM, only 8-bit values rotated by an even number of bits can be used as immediate arguments):
  bic     r0, r0, #16711680
  bic     r0, r0, #255
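As a rough illustration of the encoding rule above, here is a minimal Python sketch. The helper names (`is_arm_immediate`, `split_into_immediates`) are made up for this example and are not GCC code. The greedy split ignores chunks that wrap around bit 31, so it is not guaranteed to find the shortest sequence in every case.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Rotating left undoes a right rotation; try every even amount.
    return any(rotl32(value, rot) <= 0xFF for rot in range(0, 32, 2))

def split_into_immediates(value):
    """Greedily cover the set bits with 8-bit chunks at even bit positions."""
    value &= MASK32
    chunks = []
    while value:
        low = (value & -value).bit_length() - 1  # index of lowest set bit
        low &= ~1                                # align down to an even position
        chunk = value & (0xFF << low) & MASK32
        chunks.append(chunk)
        value &= ~chunk
    return chunks

# AND with 0xff00ff00 is the same as clearing the bits of its complement,
# which is what the two BIC instructions do.
bits_to_clear = ~0xFF00FF00 & MASK32
print(is_arm_immediate(0xFF00FF00))                       # mask itself is not encodable
print([hex(c) for c in split_into_immediates(bits_to_clear)])
```

The two chunks it reports, 0xff (255) and 0xff0000 (16711680), are the operands of the two `bic` instructions shown above.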
This makes perfect sense, unless this code is in a loop, and there are many instructions using the same bit mask. In that case, we would want to put the 0xff00ff00 constant into a register, let pass_rtl_move_loop_invariants put it outside the loop and reuse it for every appropriate bitwise AND inside the loop.
This is a real-life example (from the evas rasterization library), where fixing this issue speeds up the expedite test suite on average by 6% and up to 20% on several tests.
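To make the trade-off concrete, here is a back-of-the-envelope count in Python. It is only a sketch: the function names are hypothetical, the two-instruction setup cost for materializing the constant outside the loop is an assumption, and register pressure is not modelled at all.

```python
def insns_split(iterations, uses_per_iteration, chunks=2):
    """Every AND with the mask expands to `chunks` BIC instructions."""
    return iterations * uses_per_iteration * chunks

def insns_hoisted(iterations, uses_per_iteration, setup_insns=2):
    """The mask is built once outside the loop (assumed `setup_insns`
    instructions); each use inside the loop is a single AND."""
    return setup_insns + iterations * uses_per_iteration

# Example: a loop of 100 iterations with three uses of the mask per iteration.
print(insns_split(100, 3), insns_hoisted(100, 3))
```

Under these assumptions the hoisted form executes roughly half as many mask-related instructions once the loop runs more than a couple of times.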
Why does the splitting happen?
On 4.4, the only problem was GCSE, which propagated a separate pseudo register holding a constant into the consumer insn, i.e.
  r123 = 0xff00ff00; r124 = r125 & r123
was transformed into
  r124 = r125 & 0xff00ff00
After that, the constant within the AND expression is no longer considered loop invariant, and is not moved outside the loop. This can be fixed by checking whether the insn transformed by GCSE will require splitting, and if it does, the transformation should not be done at the earlier GCSE passes. We may check this by comparing the rtx_cost of the constant we're going to propagate with GCSE against the rtx_cost of const_int(1).
If moving the loop invariant fails (e.g. due to register pressure), then pass_combine can still propagate it inside the AND, and in this case it will result in the same code.
After this patch (http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html), such constants are split as early as the expand pass, so there's no chance for the loop invariant code motion pass to deal with them.
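The proposed check can be sketched as a toy model in Python. Everything here is hypothetical: `toy_cost` is a stand-in that counts how many instructions an AND with the mask would need, not GCC's actual rtx_cost, and `should_propagate` only mirrors the comparison against the cost of const_int(1) described above. The chunk count ignores chunks that wrap around bit 31, so it is an upper bound.

```python
MASK32 = 0xFFFFFFFF

def chunk_count(value):
    """Number of 8-bit chunks at even bit positions needed to cover value."""
    value &= MASK32
    count = 0
    while value:
        low = (value & -value).bit_length() - 1  # index of lowest set bit
        low &= ~1                                # align down to an even position
        value &= ~(0xFF << low)                  # clear the covered window
        count += 1
    return count

def toy_cost(value):
    """Hypothetical cost: instructions to apply `value` as a mask, either
    with AND on the value or with BIC on its complement."""
    return max(1, min(chunk_count(value), chunk_count(~value)))

def should_propagate(value):
    """Allow propagation only if the constant is as cheap as const_int(1),
    i.e. the consumer insn would not need to be split."""
    return toy_cost(value) <= toy_cost(1)

for mask in (0xFF, 0xFFFFFF00, 0xFF00FF00):
    print(hex(mask), toy_cost(mask), should_propagate(mask))
```

In this model 0xff and 0xffffff00 each cost one instruction and would still be propagated, while 0xff00ff00 costs two and would be left in its pseudo register for the loop invariant pass to hoist.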
So, the questions are:
1) Is it really necessary to split constants on ARM at the time of expand? At least, loop invariant code motion can work better if splitting happens later.

There is a general tension between splitting the constant early and late, i.e. there are cases where early splitting produces better code and cases where it produces worse code. GCC tries to find a balance which generally generates good code. Further refinement of the heuristics is often helpful.

2) Is there any reason we shouldn't prevent GCSE from propagating constants that we know will be split?

Propagating the constant and keeping it at the use site generally reduces register pressure. Like many things in GCC, it's a tradeoff and GCC attempts to do the right thing.

I would think that we're generally going to get the best code by forcing the constant into a register and only allowing it to appear within the AND insn after cse/loop are complete. I think you can achieve that by changing the operand predicate on the andXX insns within arm.md.

That way the constant will be made available to cse, licm and similar optimizations, but in the case where the constant is used once in a hunk of straightline code it can be combined into the AND insn. It's not perfect since cse/licm of the constant can increase register pressure, but I think the tradeoff is reasonable.


Jeff
