Hi,

The C snippets below (signed division/modulo by a power-of-2 immediate value):
#define P ...
void foo_div (int *a, int *b, int N)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] / (1 << P);
}
void foo_mod (int *a, int *b, int N)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] % (1 << P);
}
currently vectorize to the following on AArch64 + SVE (literal immediates are shown for P = 3):
foo_div:
mov x0, 0
mov w2, N
ptrue p1.b, all
whilelo p0.s, wzr, w2
.p2align 3,,7
.L2:
ld1w z1.s, p0/z, [x3, x0, lsl 2]
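cmplt p2.s, p1/z, z1.s, #0 // p2 = lanes holding a negative input
mov z0.s, p2/z, #7 // bias of 2^P - 1 in negative lanes, 0 elsewhere
add z0.s, z0.s, z1.s // add the bias so the shift rounds toward zero
asr z0.s, z0.s, #3 // arithmetic shift right by P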
st1w z0.s, p0, [x1, x0, lsl 2]
incw x0
whilelo p0.s, w0, w2
b.any .L2
ret
foo_mod:
...
.L2:
ld1w z0.s, p0/z, [x3, x0, lsl 2]
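cmplt p2.s, p1/z, z0.s, #0 // p2 = lanes holding a negative input
mov z1.s, p2/z, #-1 // all-ones in negative lanes, 0 elsewhere
lsr z1.s, z1.s, #29 // logical shift by 32 - P leaves a bias of 2^P - 1
add z0.s, z0.s, z1.s // add the bias
and z0.s, z0.s, #{2^P-1} // keep the low P bits
sub z0.s, z0.s, z1.s // remove the bias again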
st1w z0.s, p0, [x1, x0, lsl 2]
incw x0
whilelo p0.s, w0, w2
b.any .L2
ret
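For reference, the per-lane arithmetic that the marked instructions above perform can be sketched in scalar C. This is only an illustration: the helper names div_pow2_bias and mod_pow2_bias are hypothetical (they do not appear in the patch), p is assumed to satisfy 0 < p < 31, and right shifts of negative values are assumed to be arithmetic, as they are with GCC.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the patch.  Assumes 0 < p < 31 and
   an arithmetic right shift for negative operands (true for GCC).  */

/* Signed division by 2^p, rounding toward zero: add 2^p - 1 to negative
   inputs, then shift (the cmplt/mov/add/asr sequence).  */
static int32_t
div_pow2_bias (int32_t x, unsigned p)
{
  int32_t bias = x < 0 ? (int32_t) ((1u << p) - 1) : 0;
  return (x + bias) >> p;
}

/* Signed remainder by 2^p: add the bias, mask the low p bits, then
   subtract the bias again (the cmplt/mov/lsr/add/and/sub sequence).  */
static int32_t
mod_pow2_bias (int32_t x, unsigned p)
{
  int32_t mask = (int32_t) ((1u << p) - 1);
  int32_t bias = x < 0 ? mask : 0;
  return ((x + bias) & mask) - bias;
}
```

With p = 3, for example, an input of -9 gives a quotient of -1 and a remainder of -1, matching C's truncating / and % operators.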
This patch instead uses the special-purpose ASRD (arithmetic shift right for
divide by immediate) instruction, so the same loops become:
foo_div:
...
.L2:
ld1w z0.s, p0/z, [x3, x0, lsl 2]
asrd z0.s, p1/m, z0.s, #{P} // signed divide by 2^P, rounding toward zero
st1w z0.s, p0, [x1, x0, lsl 2]
incw x0
whilelo p0.s, w0, w2
b.any .L2
ret
foo_mod:
...
.L2:
ld1w z0.s, p0/z, [x3, x0, lsl 2]
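movprfx z1, z0 // copy the input into z1 ahead of the destructive asrd
asrd z1.s, p1/m, z1.s, #{P} // quotient = input / 2^P, rounding toward zero
lsl z1.s, z1.s, #{P} // quotient * 2^P
sub z0.s, z0.s, z1.s // remainder = input - quotient * 2^P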
st1w z0.s, p0, [x1, x0, lsl 2]
incw x0
whilelo p0.s, w0, w2
b.any .L2
ret
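The new foo_mod sequence relies on the identity x % 2^P == x - (x / 2^P) * 2^P, with the division rounding toward zero. A minimal scalar sketch of that decomposition follows. The names asrd_model and mod_via_asrd are hypothetical, and the model only mirrors the rounding behaviour of a single lane, assuming 0 < p < 31. It is not the pattern added to aarch64-sve.md.

```c
#include <stdint.h>

/* Hypothetical scalar model of one ASRD lane: signed division by 2^p,
   rounding toward zero.  Assumes 0 < p < 31.  */
static int32_t
asrd_model (int32_t x, unsigned p)
{
  return x / (int32_t) (1 << p);
}

/* Remainder built the way the new foo_mod loop builds it: quotient,
   scale back up by 2^p, subtract from the input.  The multiplication
   stands in for the lsl and cannot overflow, because the magnitude of
   q * 2^p never exceeds that of x.  */
static int32_t
mod_via_asrd (int32_t x, unsigned p)
{
  int32_t q = asrd_model (x, p);
  return x - q * (int32_t) (1 << p);
}
```

The multiplication is used here instead of a left shift only because shifting a negative value left is not well defined in ISO C. The vector lsl has no such restriction.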
Added new tests. Built and regression tested on aarch64-none-elf.
Best Regards,
Yuliang Wang
gcc/ChangeLog:
2019-09-23 Yuliang Wang <[email protected]>
* config/aarch64/aarch64-sve.md (asrd<mode>3): New pattern for ASRD.
* config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
(ASRDIV): New int iterator.
* internal-fn.def (IFN_ASHR_DIV): New internal function.
* optabs.def (ashr_div_optab): New optab.
* tree-vect-patterns.c (vect_recog_divmod_pattern):
Modify pattern to support new operation.
* doc/md.texi (asrd@var{m3}): Documentation for the above.
* doc/sourcebuild.texi (vect_asrdiv_si): Document new target selector.
gcc/testsuite/ChangeLog:
2019-09-23 Yuliang Wang <[email protected]>
* gcc.dg/vect/vect-asrdiv-1.c: New test.
* gcc.target/aarch64/sve/asrdiv_1.c: As above.
* lib/target-support.exp (check_effective_target_vect_asrdiv_si):
Return true for AArch64 with SVE.
rb11863.patch
