On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote: > Hi all, > > The recent -frename-registers change exposed a deficiency in the way we fuse > AESE/AESMC instruction pairs in aarch64. > > Basically we want to enforce: > AESE Vn, _ > AESMC Vn, Vn > > to enable the fusion, but regrename comes along and renames the output Vn > register in AESMC to something else, killing the fusion in the hardware. > > The solution in this patch is to add an alternative that ties the input and > output registers in the AESMC pattern and enable that alternative when the > fusion is enabled. > > With this patch I've confirmed that the above preferred register sequence is > kept even with -frename-registers when tuning for a cpu that enables the > fusion and that the chain is broken by regrename otherwise and have seen the > appropriate improvement in a proprietary benchmark (that I cannot name) that > exercises this sequence. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? > > Thanks, > Kyrill > > 2016-05-20 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function. > * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare > prototype. > * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi): > Add "=w,0" alternative. Enable it when AES/AESMC fusion is enabled.
> diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index > 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 > 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx); > bool aarch64_expand_movmem (rtx *); > bool aarch64_float_const_zero_rtx_p (rtx); > bool aarch64_function_arg_regno_p (unsigned); > +bool aarch64_fusion_enabled_p (unsigned int); This argument type should be "enum aarch64_fusion_pairs". > bool aarch64_gen_movmemqi (rtx *); > bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *); > bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn > *curr) > return false; > } > > +/* Return true iff the instruction fusion described by OP is enabled. */ > + > +bool > +aarch64_fusion_enabled_p (unsigned int op) > +{ > + return (aarch64_tune_params.fusible_ops & op) != 0; > +} > + A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your new function would be nice. OK with the change to argument type. Thanks, James