Hi Richard,

I have renamed the optabs and associated identifiers as per your suggestion. 
Thanks.

Regards
Yuliang


gcc/ChangeLog:

2019-09-27  Yuliang Wang  <yuliang.w...@arm.com>

        * config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3):
        New pattern for ASRD.
        * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
        * internal-fn.def (IFN_DIV_POW2): New internal function.
        * optabs.def (sdiv_pow2_optab): New optab.
        * tree-vect-patterns.c (vect_recog_divmod_pattern):
        Modify pattern to support new operation.
        * doc/md.texi (sdiv_pow2$var{m3}): Documentation for the above.
        * doc/sourcebuild.texi (vect_sdivpow2_si): Document new target selector.

gcc/testsuite/ChangeLog:

2019-09-27  Yuliang Wang  <yuliang.w...@arm.com>

        * gcc.dg/vect/vect-sdivpow2-1.c: New test.
        * gcc.target/aarch64/sve/asrdiv_1.c: As above.
        * lib/target-support.exp (check_effective_target_vect_sdivpow2_si):
        Return true for AArch64 with SVE.


-----Original Message-----
From: Richard Sandiford <richard.sandif...@arm.com> 
Sent: 24 September 2019 17:12
To: Yuliang Wang <yuliang.w...@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
Subject: Re: [AArch64][SVE] Utilize ASRD instruction for division and remainder

Yuliang Wang <yuliang.w...@arm.com> writes:
> Hi,
>
> The C snippets below (signed division/modulo by a power-of-2 immediate
> value):
>
> #define P ...
>
> void foo_div (int *a, int *b, int N)
> {
>     for (int i = 0; i < N; i++)
>         a[i] = b[i] / (1 << P);
> }
> void foo_mod (int *a, int *b, int N)
> {
>     for (int i = 0; i < N; i++)
>         a[i] = b[i] % (1 << P);
> }
>
> Vectorize to the following on AArch64 + SVE:
>
> foo_div:
>     mov     x0, 0
>     mov     w2, N
>     ptrue   p1.b, all
>     whilelo p0.s, wzr, w2
>     .p2align 3,,7
> .L2:
>     ld1w    z1.s, p0/z, [x3, x0, lsl 2]
>     cmplt   p2.s, p1/z, z1.s, #0    //
>     mov     z0.s, p2/z, #7          //
>     add     z0.s, z0.s, z1.s        //
>     asr     z0.s, z0.s, #3          //
>     st1w    z0.s, p0, [x1, x0, lsl 2]
>     incw    x0
>     whilelo p0.s, w0, w2
>     b.any   .L2
>     ret
>
> foo_mod:
>     ...
> .L2:
>     ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>     cmplt   p2.s, p1/z, z0.s, #0    //
>     mov     z1.s, p2/z, #-1         //
>     lsr     z1.s, z1.s, #29         //
>     add     z0.s, z0.s, z1.s        //
>     and     z0.s, z0.s, #{2^P-1}    //
>     sub     z0.s, z0.s, z1.s        //
>     st1w    z0.s, p0, [x1, x0, lsl 2]
>     incw    x0
>     whilelo p0.s, w0, w2
>     b.any   .L2
>     ret
>
> This patch utilizes the special-purpose ASRD (arithmetic shift-right for 
> divide by immediate) instruction:
>
> foo_div:
>     ...
> .L2:
>     ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>     asrd    z0.s, p1/m, z0.s, #{P}  //
>     st1w    z0.s, p0, [x1, x0, lsl 2]
>     incw    x0
>     whilelo p0.s, w0, w2
>     b.any   .L2
>     ret
>
> foo_mod:
>     ...
> .L2:
>     ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>     movprfx z1, z0                  //
>     asrd    z1.s, p1/m, z1.s, #{P}  //
>     lsl     z1.s, z1.s, #{P}        //
>     sub     z0.s, z0.s, z1.s        //
>     st1w    z0.s, p0, [x1, x0, lsl 2]
>     incw    x0
>     whilelo p0.s, w0, w2
>     b.any   .L2
>     ret
>
> Added new tests. Built and regression tested on aarch64-none-elf.
>
> Best Regards,
> Yuliang Wang
>
>
> gcc/ChangeLog:
>
> 2019-09-23  Yuliang Wang  <yuliang.w...@arm.com>
>
>         * config/aarch64/aarch64-sve.md (asrd<mode>3): New pattern for ASRD.
>         * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
>         (ASRDIV): New int iterator.
>         * internal-fn.def (IFN_ASHR_DIV): New internal function.
>         * optabs.def (ashr_div_optab): New optab.
>         * tree-vect-patterns.c (vect_recog_divmod_pattern):
>         Modify pattern to support new operation.
>         * doc/md.texi (asrd$var{m3}): Documentation for the above.
>         * doc/sourcebuild.texi (vect_asrdiv_si): Document new target selector.

This looks good to me.  My only real question is about naming:
maybe IFN_DIV_POW2 would be a better name for the internal function and 
sdiv_pow2_optab/"div_pow2$a3" for the optab?  But I'm useless at naming things, 
so maybe others would prefer your names.

Thanks,
Richard
 

Attachment: rb11863.patch
Description: rb11863.patch
