On 04/06/18 18:40, Kyrill Tkachov wrote:
Hi all,

This patch adds support for generating LDPs and STPs of Q-registers.
This allows for more compact code generation and makes better use of the ISA.

It's implemented in a straightforward way by allowing 16-byte modes in the
sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
as well as the patterns themselves in aarch64-simd.md.
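
For illustration, here is a minimal sketch of the kind of code this is aimed at.
It is not taken from the patch or its tests: the function name copy_pair and the
v4si typedef are hypothetical, and the exact instructions emitted will depend on
the compiler version and options.  The two adjacent 16-byte loads and the two
adjacent 16-byte stores are candidates for being merged into a single LDP and a
single STP of Q-registers, for example something like "ldp q0, q1, [x1]"
followed by "stp q0, q1, [x0]".

```c
/* Hypothetical example, not part of the patch.  A 128-bit vector type
   built with the GCC generic vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Copy two adjacent 128-bit vectors.  The loads from src[0] and src[1]
   access consecutive 16-byte slots, as do the stores to dst[0] and
   dst[1], so each pair is a candidate for a Q-register LDP or STP.  */
void
copy_pair (v4si *dst, const v4si *src)
{
  v4si a = src[0];
  v4si b = src[1];
  dst[0] = a;
  dst[1] = b;
}
```

Compiling this at -O2 for AArch64 and inspecting the assembly is one way to
check whether the pairs are formed on a given compiler build.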

I didn't see any performance effect beyond noise on SPEC2017 on Cortex-A72
or Cortex-A53.


Adding some folks who know more about other CPUs as well.
Are you okay with enabling these instructions in AArch64?

If you could give this a spin on some benchmarks you care about on your
platforms, the results would be really useful data.

Thanks,
Kyrill

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2018-06-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):
    Allow 16-byte modes.
    (aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
    * config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP and STP of
    128-bit modes.
    * config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
    New pattern.
    (vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
    * config/aarch64/iterators.md (VQ2): New mode iterator.

2018-06-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * gcc.target/aarch64/ldp_stp_q.c: New test.
    * gcc.target/aarch64/stp_vec_128_1.c: Likewise.
