Hi,

The C snippets below (signed division/modulo by a power-of-2 immediate value):

#define P ...

void foo_div (int *a, int *b, int N)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] / (1 << P);
}

void foo_mod (int *a, int *b, int N)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] % (1 << P);
}

vectorize to the following on AArch64 + SVE (the immediates #7, #3 and #29
correspond to P = 3):

foo_div:
        mov     x0, 0
        mov     w2, N
        ptrue   p1.b, all
        whilelo p0.s, wzr, w2
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x3, x0, lsl 2]
        cmplt   p2.s, p1/z, z1.s, #0            //
        mov     z0.s, p2/z, #7                  //
        add     z0.s, z0.s, z1.s                //
        asr     z0.s, z0.s, #3                  //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

foo_mod:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        cmplt   p2.s, p1/z, z0.s, #0            //
        mov     z1.s, p2/z, #-1                 //
        lsr     z1.s, z1.s, #29                 //
        add     z0.s, z0.s, z1.s                //
        and     z0.s, z0.s, #{2^P-1}            //
        sub     z0.s, z0.s, z1.s                //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

This patch utilizes the special-purpose ASRD (arithmetic shift-right for
divide by immediate) instruction:

foo_div:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        asrd    z0.s, p1/m, z0.s, #{P}          //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

foo_mod:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        movprfx z1, z0                          //
        asrd    z1.s, p1/m, z1.s, #{P}          //
        lsl     z1.s, z1.s, #{P}                //
        sub     z0.s, z0.s, z1.s                //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

Added new tests. Built and regression tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-09-23  Yuliang Wang  <yuliang.w...@arm.com>

        * config/aarch64/aarch64-sve.md (asrd<mode>3): New pattern for ASRD.
        * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
        (ASRDIV): New int iterator.
        * internal-fn.def (IFN_ASHR_DIV): New internal function.
        * optabs.def (ashr_div_optab): New optab.
        * tree-vect-patterns.c (vect_recog_divmod_pattern): Modify pattern
        to support new operation.
        * doc/md.texi (asrd$var{m3}): Documentation for the above.
        * doc/sourcebuild.texi (vect_asrdiv_si): Document new target
        selector.

gcc/testsuite/ChangeLog:

2019-09-23  Yuliang Wang  <yuliang.w...@arm.com>

        * gcc.dg/vect/vect-asrdiv-1.c: New test.
        * gcc.target/aarch64/sve/asrdiv_1.c: As above.
        * lib/target-support.exp (check_effective_target_vect_asrdiv_si):
        Return true for AArch64 with SVE.
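
For reference, a minimal scalar sketch of the arithmetic behind the two ASRD
sequences above. It is not part of the patch. The names asrd_model, mod_model
and self_check are hypothetical, and the model assumes that >> on a negative
signed value is an arithmetic shift (true for GCC, but implementation-defined
in ISO C). One lane of ASRD is modelled as "add 2^P - 1 to negative inputs,
then shift right by P", which rounds toward zero like C division; the modulo
is then recovered as x - ((x / 2^P) << P), mirroring the asrd/lsl/sub sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one ASRD lane: bias negative inputs by
   2^p - 1, then shift right arithmetically by p.  Assumes 1 <= p <= 30
   and that >> on a negative value is an arithmetic shift.  */
static int32_t
asrd_model (int32_t x, unsigned p)
{
  int64_t bias = x < 0 ? ((int64_t) 1 << p) - 1 : 0;
  return (int32_t) (((int64_t) x + bias) >> p);
}

/* Remainder recovered from the quotient, as in the asrd/lsl/sub sequence:
   x - (quotient * 2^p).  The multiply stands in for the left shift so
   that no negative value is shifted left in C.  */
static int32_t
mod_model (int32_t x, unsigned p)
{
  int64_t q = asrd_model (x, p);
  return (int32_t) ((int64_t) x - q * ((int64_t) 1 << p));
}

/* Compare both models against C's / and % on a few boundary values.
   Returns the number of (value, p) pairs checked.  */
static int
self_check (void)
{
  static const int32_t samples[] =
    { INT32_MIN, INT32_MIN + 1, -9, -8, -7, -1, 0, 1, 7, 8, 9, INT32_MAX };
  int checked = 0;

  for (unsigned p = 1; p <= 30; p++)
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
      {
        int32_t x = samples[i];
        assert (asrd_model (x, p) == x / (1 << p));
        assert (mod_model (x, p) == x % (1 << p));
        checked++;
      }
  return checked;
}
```

Calling self_check () from a small main function exercises every sample for
P = 1 .. 30; the upper bound of 30 only reflects that (1 << P) must fit in a
signed int in the reference expressions.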
rb11863.patch