[AArch64][SVE] Utilize ASRD instruction for division and remainder

Yuliang Wang Tue, 24 Sep 2019 05:53:33 -0700

Hi,

The C snippets below (signed division/modulo by a power-of-2 immediate value):


#define P ...

void foo_div (int *a, int *b, int N)
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] / (1 << P);
}
void foo_mod (int *a, int *b, int N)
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] % (1 << P);
}

currently vectorize to the following on AArch64 + SVE (the literal immediates
#7, #3 and #29 correspond to P = 3):

foo_div:
    mov         x0, 0
    mov         w2, N
    ptrue       p1.b, all
    whilelo     p0.s, wzr, w2
    .p2align    3,,7
.L2:
    ld1w        z1.s, p0/z, [x3, x0, lsl 2]
    cmplt       p2.s, p1/z, z1.s, #0            //
    mov         z0.s, p2/z, #7                  //
    add         z0.s, z0.s, z1.s                //
    asr         z0.s, z0.s, #3                  //
    st1w        z0.s, p0, [x1, x0, lsl 2]
    incw        x0
    whilelo     p0.s, w0, w2
    b.any       .L2
    ret

foo_mod:
    ...
.L2:
    ld1w        z0.s, p0/z, [x3, x0, lsl 2]
    cmplt       p2.s, p1/z, z0.s, #0            //
    mov         z1.s, p2/z, #-1                 //
    lsr         z1.s, z1.s, #29                 //
    add         z0.s, z0.s, z1.s                //
    and         z0.s, z0.s, #{2^P-1}            //
    sub         z0.s, z0.s, z1.s                //
    st1w        z0.s, p0, [x1, x0, lsl 2]
    incw        x0
    whilelo     p0.s, w0, w2
    b.any       .L2
    ret
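
For readers less familiar with the fix-up sequences marked with // above, the
following is a minimal scalar sketch of what each vector lane computes. It is
not taken from the patch: the helper names (asr32, div_pow2_current,
mod_pow2_current, check_current_sequences) are hypothetical, P = 3 is assumed
to match the literal immediates in the listings, and 32-bit lanes are assumed.
A plain arithmetic shift rounds toward negative infinity, whereas C division
truncates toward zero, which is why negative inputs are biased by 2^P - 1
before shifting.

```c
#include <assert.h>
#include <stdint.h>

#define P 3  /* assumed shift amount; the sketch expects 1 <= P <= 30 */

/* Arithmetic shift right, written so it does not depend on the
   implementation-defined behaviour of >> on negative values. */
static int32_t asr32(int32_t x, int n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Models cmplt / mov #(2^P-1) / add / asr: bias negative lanes, then shift. */
static int32_t div_pow2_current(int32_t x)
{
    int32_t bias = x < 0 ? (1 << P) - 1 : 0;
    return asr32(x + bias, P);
}

/* Models cmplt / mov #-1 / lsr #(32-P) / add / and / sub. */
static int32_t mod_pow2_current(int32_t x)
{
    uint32_t all_ones = x < 0 ? UINT32_MAX : 0u;     /* mov z1.s, p2/z, #-1 */
    int32_t bias = (int32_t)(all_ones >> (32 - P));  /* 2^P - 1 or 0 */
    return ((x + bias) & ((1 << P) - 1)) - bias;
}

/* Compares both models against C's own / and % over a small range. */
static void check_current_sequences(void)
{
    for (int32_t x = -1000; x <= 1000; x++) {
        assert(div_pow2_current(x) == x / (1 << P));
        assert(mod_pow2_current(x) == x % (1 << P));
    }
}
```

For example, with P = 3 the unbiased shift asr32(-7, 3) yields -1, while
div_pow2_current(-7) yields 0, matching -7 / 8 in C.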

With this patch, the vectorizer instead uses the special-purpose ASRD
(arithmetic shift right for divide by immediate) instruction:

foo_div:
    ...
.L2:
    ld1w        z0.s, p0/z, [x3, x0, lsl 2]
    asrd        z0.s, p1/m, z0.s, #{P}          //
    st1w        z0.s, p0, [x1, x0, lsl 2]
    incw        x0
    whilelo     p0.s, w0, w2
    b.any       .L2
    ret

foo_mod:
    ...
.L2:
    ld1w        z0.s, p0/z, [x3, x0, lsl 2]
    movprfx     z1, z0                          //
    asrd        z1.s, p1/m, z1.s, #{P}          //
    lsl         z1.s, z1.s, #{P}                //
    sub         z0.s, z0.s, z1.s                //
    st1w        z0.s, p0, [x1, x0, lsl 2]
    incw        x0
    whilelo     p0.s, w0, w2
    b.any       .L2
    ret
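
As a rough illustration of the new sequences, the sketch below models one
32-bit lane in scalar C. It is an assumption-laden model rather than code from
the patch: asrd_lane, rem_pow2_asrd and check_asrd_model are hypothetical
names, and shift amounts of 1 to 30 are assumed. The remainder follows the
asrd / lsl / sub sequence, i.e. x - ((x / 2^P) << P).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one ASRD lane: signed divide by 2^shift, rounding toward
   zero. The arithmetic is done in 64 bits so the bias cannot overflow. */
static int32_t asrd_lane(int32_t x, int shift)
{
    int64_t bias = x < 0 ? ((int64_t)1 << shift) - 1 : 0;
    int64_t t = (int64_t)x + bias;
    return (int32_t)(t < 0 ? ~(~t >> shift) : t >> shift);
}

/* Models asrd / lsl / sub: remainder = x - (quotient * 2^shift).
   The multiply stands in for lsl; it is done in 64 bits to avoid
   left-shifting a negative value, and the product always fits in 32 bits. */
static int32_t rem_pow2_asrd(int32_t x, int shift)
{
    int32_t q = asrd_lane(x, shift);
    int32_t scaled = (int32_t)((int64_t)q * ((int64_t)1 << shift));
    return x - scaled;
}

/* Compares the model against C's own / and % for several shift amounts. */
static void check_asrd_model(void)
{
    for (int shift = 1; shift <= 30; shift++) {
        for (int32_t x = -1000; x <= 1000; x++) {
            assert(asrd_lane(x, shift) == x / (1 << shift));
            assert(rem_pow2_asrd(x, shift) == x % (1 << shift));
        }
    }
}
```

Under these assumptions the model agrees with C's truncating / and % on the
tested inputs, so the single ASRD replaces the four-instruction compare, move,
add and shift sequence in the division loop.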

Added new tests. Built and regression tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-09-23  Yuliang Wang  <[email protected]>

        * config/aarch64/aarch64-sve.md (asrd<mode>3): New pattern for ASRD.
        * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
        (ASRDIV): New int iterator.
        * internal-fn.def (IFN_ASHR_DIV): New internal function.
        * optabs.def (ashr_div_optab): New optab.
        * tree-vect-patterns.c (vect_recog_divmod_pattern):
        Modify pattern to support new operation.
        * doc/md.texi (asrd$var{m3}): Documentation for the above.
        * doc/sourcebuild.texi (vect_asrdiv_si): Document new target selector.

gcc/testsuite/ChangeLog:

2019-09-23  Yuliang Wang  <[email protected]>

        * gcc.dg/vect/vect-asrdiv-1.c: New test.
        * gcc.target/aarch64/sve/asrdiv_1.c: As above.
        * lib/target-support.exp (check_effective_target_vect_asrdiv_si):
        Return true for AArch64 with SVE.

rb11863.patch