The following patch fixes the performance regression for 435.gromacs
on x86_64 with AVX2 (Haswell or bdver2) caused by

2015-12-18  Andreas Krebbel  <kreb...@linux.vnet.ibm.com>

        * ira.c (ira_setup_alts): Move the scan for commutative
        modifier to the first loop to make it work even with disabled
        alternatives.

which in itself is a desirable change giving the RA more freedom.

It turns out the fix makes an existing issue more severe: it detects
more swappable alternatives and thus more often exits ira_setup_alts
with the operands swapped in recog_data.  This seems to give a slight
preference to choosing alternatives with the operands swapped (I
didn't try to investigate how IRA handles the "merged" alternative
mask and operand swapping in its further processing).

Of course previous RTL optimizers and canonicalization rules as well
as backend patterns are tuned towards the not swapped variant, and
thus it happens that doing more swaps ends up in slower code (I
didn't closely investigate).

So I tested the following patch, which simply makes sure that
ira_setup_alts does not alter recog_data.

On an Intel Haswell machine I get (base is with the patch, peak is
with the above change reverted):

                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
435.gromacs      7140        264       27.1 S    7140        270       26.5 S
435.gromacs      7140        264       27.1 *    7140        269       26.6 S
435.gromacs      7140        263       27.1 S    7140        269       26.5 *
==============================================================================
435.gromacs      7140        264       27.1 *    7140        269       26.5 *

which means the patched result is even better than before Andreas'
change.  Current trunk comes in at a Run Time of 321s (which is the
regression to fix).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for
trunk?

Thanks,
Richard.

2016-02-05  Richard Biener  <rguent...@suse.de>

        PR rtl-optimization/69274
        * ira.c (ira_setup_alts): Do not change recog_data.operand
        order.

Index: gcc/ira.c
===================================================================
--- gcc/ira.c   (revision 231814)
+++ gcc/ira.c   (working copy)
@@ -1888,10 +1888,11 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
        }
       if (commutative < 0)
        break;
-      if (curr_swapped)
-       break;
+      /* Swap forth and back to avoid changing recog_data.  */
       std::swap (recog_data.operand[commutative],
                 recog_data.operand[commutative + 1]);
+      if (curr_swapped)
+       break;
     }
 }
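
For illustration only (not part of the patch): a minimal, self-contained sketch of why moving the curr_swapped check below the std::swap restores the operand order. The names toy_recog_data, scan_before_patch, scan_after_patch and self_check are hypothetical stand-ins, and the loop shape (a first pass with curr_swapped false, a second with it true) is an assumption modelled on the hunk above. The real ira_setup_alts does considerably more work per pass.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Toy stand-in for GCC's recog_data; only the operand order is modelled.
struct toy_recog_data
{
  std::array<int, 3> operand;
};

// Control flow before the patch: the loop exits on the second pass
// *before* swapping again, so the operands stay swapped.
inline void
scan_before_patch (toy_recog_data &rd, int commutative)
{
  for (bool curr_swapped = false;; curr_swapped = true)
    {
      // (The real code inspects the alternatives for this order here.)
      if (commutative < 0)
        break;
      if (curr_swapped)
        break;
      std::swap (rd.operand[commutative], rd.operand[commutative + 1]);
    }
}

// Control flow after the patch: the swap runs on both passes
// (forth, then back), and only then does the loop exit.
inline void
scan_after_patch (toy_recog_data &rd, int commutative)
{
  for (bool curr_swapped = false;; curr_swapped = true)
    {
      if (commutative < 0)
        break;
      std::swap (rd.operand[commutative], rd.operand[commutative + 1]);
      if (curr_swapped)
        break;
    }
}

// Small self-check comparing the two variants on the same input.
inline void
self_check ()
{
  toy_recog_data before{{10, 20, 30}};
  scan_before_patch (before, 0);
  assert (before.operand[0] == 20 && before.operand[1] == 10); // left swapped

  toy_recog_data after{{10, 20, 30}};
  scan_after_patch (after, 0);
  assert (after.operand[0] == 10 && after.operand[1] == 20);   // restored
}
```

With a commutative pair at index 0, the pre-patch variant leaves the operands as {20, 10, 30} while the post-patch variant returns them to {10, 20, 30}. With commutative < 0 neither variant touches the array.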