The following patch fixes the performance regression for 435.gromacs
on x86_64 with AVX2 (Haswell or bdver2) caused by

2015-12-18  Andreas Krebbel  <kreb...@linux.vnet.ibm.com>

        * ira.c (ira_setup_alts): Move the scan for commutative modifier
        to the first loop to make it work even with disabled alternatives.

which in itself is a desirable change giving the RA more freedom.

It turns out the fix makes an existing issue more severe in detecting
more swappable alternatives and thus exiting ira_setup_alts with
operands swapped in recog_data.  This seems to give a slight preference
to choose alternatives with the operands swapped (I didn't try to
investigate how IRA handles the "merged" alternative mask and
operand swapping in its further processing).

Of course previous RTL optimizers and canonicalization rules as well
as backend patterns are tuned towards the not swapped variant and thus
it happens doing more swaps ends up in slower code (I didn't closely
investigate).

So I tested the following patch which simply makes sure that
ira_setup_alts does not alter recog_data.

On a Intel Haswell machine I get (base is with the patch, peak is with
the above change reverted):

                                  Estimated                       
Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  
---------
435.gromacs      7140        264       27.1 S    7140        270       
26.5 S
435.gromacs      7140        264       27.1 *    7140        269       
26.6 S
435.gromacs      7140        263       27.1 S    7140        269       
26.5 *
==============================================================================
435.gromacs      7140        264       27.1 *    7140        269       
26.5 *

which means the patched result is even better than before Andreas
change.  Current trunk homes in at a Run Time of 321s (which is
the regression to fix).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2016-02-05   Richard Biener  <rguent...@suse.de>

        PR rtl-optimization/69274
        * ira.c (ira_setup_alts): Do not change recog_data.operand
        order.

Index: gcc/ira.c
===================================================================
--- gcc/ira.c   (revision 231814)
+++ gcc/ira.c   (working copy)
@@ -1888,10 +1888,11 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
        }
       if (commutative < 0)
        break;
-      if (curr_swapped)
-       break;
+      /* Swap forth and back to avoid changing recog_data.  */
       std::swap (recog_data.operand[commutative],
                 recog_data.operand[commutative + 1]);
+      if (curr_swapped)
+       break;
     }
 }
 

Reply via email to