On Tue, Jun 5, 2018 at 9:32 AM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
>
> On 04/06/18 18:40, Kyrill Tkachov wrote:
> > Hi all,
> >
> > This patch adds support for generating LDPs and STPs of Q-registers.
> > This allows for more compact code generation and makes better use of the
> > ISA.
> >
> > It's implemented in a straightforward way by allowing 16-byte modes in the
> > sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
> > as well as the patterns themselves in aarch64-simd.md.
> >
> > I didn't see any non-noise performance effect on SPEC2017 on Cortex-A72 and
> > Cortex-A53.
> >
>
> Adding some folks who know more about other CPUs as well.
> Are you okay with enabling these instructions in AArch64?
>
> If you could give this a spin on some benchmarks you
> care about on your platforms it would be really useful data.
It might be useful to have a tuning flag in aarch64-tuning-flags.def for this, even if all current cores have it on.

I might do some performance analysis on OcteonTX 81xx and 83xx (aka thunderxt81 and thunderxt83), but not until the end of June, as I am on vacation until then.

Thanks,
Andrew

>
> Thanks,
> Kyrill
>
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Kyrill
> >
> > 2018-06-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >
> > 	* config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):
> > 	Allow 16-byte modes.
> > 	(aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
> > 	* config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP STP of
> > 	128-bit modes.
> > 	* config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
> > 	New pattern.
> > 	(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
> > 	* config/aarch64/iterators.md (VQ2): New mode iterator.
> >
> > 2018-06-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >
> > 	* gcc.target/aarch64/ldp_stp_q.c: New test.
> > 	* gcc.target/aarch64/stp_vec_128_1.c: Likewise.
>
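For anyone benchmarking this, a minimal C sketch of the kind of code the patch targets: two adjacent 16-byte vector loads followed by two adjacent 16-byte stores. It uses GCC's generic vector extension; the v4si typedef and the copy_pair function are illustrative names, not taken from the patch or its new tests. On AArch64 with the patch applied, GCC may fuse the two 128-bit loads into one LDP of Q-registers and the two stores into one STP, but that is an assumption to check, since the outcome depends on optimization level, tuning and addressing.

```c
/* Generic 16-byte vector type (GCC/Clang vector extension). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Copy two adjacent 128-bit vectors.  Illustrative only: with Q-register
   LDP/STP support an AArch64 GCC may (not must) emit one "ldp q" and one
   "stp q" here instead of two LDR and two STR of Q-registers.  */
void
copy_pair (v4si *restrict dst, const v4si *restrict src)
{
  v4si a = src[0];   /* first 16-byte load  */
  v4si b = src[1];   /* adjacent 16-byte load */
  dst[0] = a;        /* first 16-byte store */
  dst[1] = b;        /* adjacent 16-byte store */
}
```

Compiling this with an AArch64 GCC at -O2 with -S and looking for ldp/stp on q registers in the output is one quick way to see whether the new patterns fire for a given tuning.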