On 05/11/2015 06:02 AM, David Sherwood wrote:
Hi,
This patch fixes a couple of issues I saw during the compilation of shell_lam.f
for 410.bwaves test in spec2006:
* create_new_invariant: We shouldn't bother attempting to calculate the address
cost for something that clearly isn't an address. Use a SCALAR_INT_MODE_P check
to eliminate the most obvious cases, such as vector modes and so on.
* get_inv_cost: Change the code so that we don't treat something as an address
if it is never actually used as an address, i.e. we didn't use it at all. This
occurs due to the way we process the innermost loop first - invariants get
pulled out of inner most loop and then get processed during next innermost loop.
Here is more detail from shell_lam.f.210r.loop2_invariant dump file that
shows what currently happens:
4427 *****ending processing of loop 27 ******
4442 (const_vector:V2DF [
4443 (const_double:DF 4.0e+0 [0x0.8p+3])
4444 (const_double:DF 4.0e+0 [0x0.8p+3])
4445 ])
4446
4447 Hot cost: 8 (final)
4448 (set (reg:V2DF 3179)
4449 (const_vector:V2DF [
4450 (const_double:DF 4.0e+0 [0x0.8p+3])
4451 (const_double:DF 4.0e+0 [0x0.8p+3])
4452 ]))
4453
4454 Hot cost: 8 (final)
4455 Set in insn 2764 is invariant (1), cost 8, depends on
4471 Decided to move invariant 1 -- gain 8
move_invariant_reg moves invariant (the const_vector above) into 3490 and
replaces use of 3179 with 3490, but conservatively keeps the definition of 3179,
which gets processed in outer loop, but is never used.
Debug for outer loop now follows:
4497 *****starting processing of loop 26 ******
6379 (insn 2764 2748 2766 110 (set (reg:V2DF 3490)
6380 (const_vector:V2DF [
6381 (const_double:DF 4.0e+0 [0x0.8p+3])
6382 (const_double:DF 4.0e+0 [0x0.8p+3])
6383 ])) shell_lam.f:287 819 {*aarch64_simd_movv2df}
6384 (nil))
6385 ;; UD chains for insn luid 88 uid 2766
6604 ;; UD chains for insn luid 21 uid 3836
6605 ;; reg 3490 { d419(bb 110 insn 2764) }
6606 (insn 3836 2763 2765 111 (set (reg:V2DF 3179)
6607 (reg:V2DF 3490)) shell_lam.f:287 -1
6608 (nil))
^^^ additional insn generated by move_invariant_reg from loop 27
6836 ;; UD chains for insn luid 40 uid 2790
6837 ;; reg 3161 { d272(bb 111 insn 2746) }
6838 ;; reg 3201 { d305(bb 111 insn 2788) }
6839 ;; reg 3490 { d419(bb 110 insn 2764) }
6840 ;; eq_note reg 3161 { d272(bb 111 insn 2746) }
6841 ;; eq_note reg 3201 { d305(bb 111 insn 2788) }
6842 (insn 2790 2788 2791 111 (set (reg:V2DF 3203 [ vect__930.427 ])
6843 (fma:V2DF (reg:V2DF 3161 [ MEM[base: vectp_q.371_2137, index:
ivtmp.679_3214,
offset: 0B] ])
6844 (neg:V2DF (reg:V2DF 3490))
6845 (reg:V2DF 3201 [ vect__928.424 ]))) shell_lam.f:287 1219
{fnmav2df4}
6846 (expr_list:REG_DEAD (reg:V2DF 3201 [ vect__928.424 ])
6847 (expr_list:REG_DEAD (reg:V2DF 3179) ...
^^^ 3179 marked as DEAD
8366 *****ending processing of loop 26 ******
8472 (const_vector:V2DF [
8473 (const_double:DF 4.0e+0 [0x0.8p+3])
8474 (const_double:DF 4.0e+0 [0x0.8p+3])
8475 ])
8476
8477 Hot cost: 8 (final)
8478 (set (reg:V2DF 3490)
8479 (const_vector:V2DF [
8480 (const_double:DF 4.0e+0 [0x0.8p+3])
8481 (const_double:DF 4.0e+0 [0x0.8p+3])
8482 ]))
8483
8484 Hot cost: 8 (final)
8485 Set in insn 2764 is invariant (12), cost 8, depends on
8505 (set (reg:V2DF 3179)
8506 (reg:V2DF 3490))
8507
8508 Hot cost: 8 (final)
8509 Set in insn 3836 is invariant (15), cost 8, depends on 12
^^^ Never moves 3179 invariant out, even though it's never used.
After my patch we now get this:
8586 Decided to move invariant 15 -- gain 16
8587 Decided to move dependent invariant 12
since get_inv_cost sees the number of uses is zero and the dependent invariant
(12) gets moved too. Of course, without my patch the definition of 3179 would
ultimately be eliminated as dead code in a later pass anyway.
Is this ok to go in?
Regards,
David Sherwood.
ChangeLog entry follows ...
2015-05-08 David Sherwood <david.sherw...@arm.com>
* loop-invariant.c (create_new_invariant): Don't calculate address cost
if mode is not scalar integers.
(get_inv_cost): Increase computational cost for unused invariants.
So you'd need to bootstrap and regression test this change for it to be
fully ready for review.
It would also be good to include a test for the testsuite where you can
show the improvements in behaviour due to this change. I'm not offhand
sure of the license on bwaves, so you may or may not be able to reduce
it and use the reduced code in the GCC testsuite.
With some instrumentation, you may easily find the second case
triggering code differences in GCC itself, which you could then reduce
and use as a test for the second change.
Conceptually both hunks seem reasonable, so really it's just a matter of
building some testcases to verify behaviour and the bootstrap/regression
test.
There's a number of ARM engineers contributing to GCC these days that
can probably help.
Jeff