On Sun, Jan 21, 2024 at 07:57:54PM +0530, Ajit Agarwal wrote:
>
> Hello All:
>
> New pass to replace lxv instructions that load from adjacent memory
> addresses with lxvp. Added common infrastructure for load/store
> fusion across different targets.
>
> Common routines are refactored into fusion-common.h.
>
> The AARCH64 load/store fusion pass has not been changed to use the
> common infrastructure.
>
> For AARCH64, just include "fusion-common.h"; target-dependent
> code can be added on top of it.
>
>
> Alex/Richard:
>
> If you would like me to do this for AARCH64, I can.
>
> If you would prefer to do it yourselves, that is fine with me.
>
> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> A performance improvement is seen with the SPEC 2017 FP benchmarks.

This patch is a lot better than the previous patch in that it generates fewer
extra instructions and just replaces some of the load vector instructions with
load vector pair instructions.
When compiling SPEC 2017 with it, I see the following results:
Benchmarks that generate lxvp instead of lxv:
500.perlbench_r replace 10 LXVs with 5 LXVPs
502.gcc_r replace 2 LXVs with 1 LXVP
510.parest_r replace 28 LXVs with 14 LXVPs
511.povray_r replace 4 LXVs with 2 LXVPs
521.wrf_r replace 12 LXVs with 6 LXVPs
527.cam4_r replace 12 LXVs with 6 LXVPs
557.xz_r replace 10 LXVs with 5 LXVPs
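Every row above halves because each LXVP stands in for two LXVs. As a rough illustration only, here is a minimal Python sketch of the kind of pairing rule involved, assuming two lxv instructions can be fused when they use the same base register and their displacements differ by exactly 16 bytes. `VecLoad`, `can_fuse`, and `pair_loads` are hypothetical names, not the patch's code; the sketch ignores the even/odd VSX register-pair constraint, endianness, and intervening stores, all of which a real pass has to check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecLoad:
    """Hypothetical model of one lxv: a 16-byte load from base + offset."""
    base: int    # base GPR number
    offset: int  # displacement in bytes


def can_fuse(a: VecLoad, b: VecLoad) -> bool:
    # Same base register, b starts right after a's 16 bytes, and the
    # displacement is a multiple of 16 (lxvp uses a DQ-form displacement).
    return a.base == b.base and b.offset == a.offset + 16 and a.offset % 16 == 0


def pair_loads(loads):
    """Greedily pair adjacent loads; return (lxvp pairs, leftover lxvs)."""
    ordered = sorted(loads, key=lambda ld: (ld.base, ld.offset))
    pairs, singles = [], []
    i = 0
    while i < len(ordered):
        if i + 1 < len(ordered) and can_fuse(ordered[i], ordered[i + 1]):
            pairs.append((ordered[i], ordered[i + 1]))
            i += 2  # both loads are covered by a single lxvp
        else:
            singles.append(ordered[i])
            i += 1
    return pairs, singles


loads = [VecLoad(3, 0), VecLoad(3, 16), VecLoad(3, 32), VecLoad(3, 48),
         VecLoad(4, 0), VecLoad(4, 64)]
pairs, singles = pair_loads(loads)
print(f"{len(loads)} lxv -> {len(pairs)} lxvp + {len(singles)} lxv")
# prints: 6 lxv -> 2 lxvp + 2 lxv
```

The greedy left-to-right scan is just the simplest choice for a sketch; the two loads off r4 stay as plain lxv because they are 64 bytes apart.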
A few of the benchmarks generated a different number of alignment NOPs,
depending on how prefixed address instructions were generated. I tend to feel
this is minor compared to the other changes.
507.cactuBSSN_r 17 fewer alignment NOPs
520.omnetpp_r 231 more alignment NOPs
523.xalancbmk_r 246 fewer alignment NOPs
531.deepsjeng_r 2 more alignment NOPs
541.leela_r 28 more alignment NOPs
549.fotonik3d_r 27 more alignment NOPs
554.roms_r 8 more alignment NOPs
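To see why small code changes can shift the NOP count, here is a hypothetical layout model (`layout` is an illustrative name). It assumes the Power10 rule that an 8-byte prefixed instruction must not cross a 64-byte boundary, with the assembler padding with a 4-byte NOP when one would; it is a sketch under that assumption, not a description of what GCC or the assembler does internally.

```python
def layout(sizes, start=0):
    """Place instructions of the given byte sizes (4 = normal, 8 = prefixed)
    starting at word-aligned `start`; return start addresses and NOP count."""
    addr = start
    addrs = []
    nops = 0
    for size in sizes:
        if size not in (4, 8):
            raise ValueError("instructions are 4 or 8 bytes")
        # Instructions are word aligned, so an 8-byte prefixed instruction
        # crosses a 64-byte boundary exactly when it would start at offset 60.
        if size == 8 and addr % 64 == 60:
            addr += 4  # pad with one 4-byte NOP
            nops += 1
        addrs.append(addr)
        addr += size
    return addrs, nops


print(layout([4] * 15 + [8])[1])  # 1: the prefixed insn would start at 60
print(layout([4] * 14 + [8])[1])  # 0: it starts at 56 and ends at 64
```

Removing a single 4-byte instruction ahead of the prefixed one is enough to drop the NOP, which is consistent with the counts above moving in both directions.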
However, there were three benchmarks where the amount of stack spilling
changed, and in two of them the code regressed. In particular, there appear
to be more vector loads and stores to the stack, which indicates more
spilling is going on.
525.x264_r 16 more stack spills, but 84 LXVPs
526.blender_r 4 more stack spills, but 149 LXVPs
One benchmark actually generated fewer stack spills as well as generating
LXVPs.
538.imagick_r 11 fewer stack spills, and 26 LXVPs
Note that these are changes in the static instructions generated; they do not
show whether the changes help or hurt performance.
--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432