Committed, thanks Jeff.

Pan
-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of Jeff Law via Gcc-patches
Sent: Friday, June 2, 2023 2:52 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; pal...@rivosinc.com; rdapp....@gmail.com
Subject: Re: [PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

On 5/31/23 21:48, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
>
> 1. This patch optimizes the codegen of the following auto-vectorized code:
>
> void foo (int32_t * __restrict a, int64_t * __restrict b,
>           int64_t * __restrict c, int n)
> {
>   for (int i = 0; i < n; i++)
>     c[i] = (int64_t) a[i] + b[i];
> }
>
> It combines the instruction sequence:
>
> ...
> vsext.vf2
> vadd.vv
> ...
>
> into:
>
> ...
> vwadd.wv
> ...
>
> For the PLUS operation, GCC prefers the following RTL operand order
> when combining:
>
> (plus: (sign_extend:..)
>        (reg:))
>
> instead of
>
> (plus: (reg:..)
>        (sign_extend:))
>
> which is different from the MINUS pattern.

Right.  Canonicalization rules will have the sign_extend as the first
operand when the opcode is commutative.
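To make that concrete, here is a schematic sketch of the two canonical
shapes combine looks for (the modes and pseudo register numbers are
invented for illustration, not taken from a real RTL dump):

  ;; PLUS is commutative, so canonicalization puts the more complex
  ;; operand (the sign_extend) in the first position:
  (plus:VNx2DI (sign_extend:VNx2DI (reg:VNx2SI 134))
               (reg:VNx2DI 135))

  ;; MINUS is not commutative, so the operands keep their source order,
  ;; leaving the sign_extend in the second position (wide - narrow):
  (minus:VNx2DI (reg:VNx2DI 135)
                (sign_extend:VNx2DI (reg:VNx2SI 134)))

That asymmetry is why the shared plus_minus pattern gets split into
dedicated add and sub patterns in the patch.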
> I split the vwadd/vwsub patterns and added dedicated patterns for
> each of them.
>
> 2. This patch not only optimizes the case mentioned in (1); it also
> enhances vwadd.vv/vwsub.vv optimization for more complicated
> PLUS/MINUS code.  Consider the following:
>
> __attribute__ ((noipa)) void
> vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
>                       int16_t *__restrict dst3, int8_t *__restrict a,
>                       int8_t *__restrict b, int8_t *__restrict a2,
>                       int8_t *__restrict b2, int n)
> {
>   for (int i = 0; i < n; i++)
>     {
>       dst[i] = (int16_t) a[i] + (int16_t) b[i];
>       dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
>       dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
>     }
> }
>
> Before this patch:
>
> ...
> vsetvli zero,a6,e8,mf2,ta,ma
> vle8.v v2,0(a3)
> vle8.v v1,0(a4)
> vsetvli t1,zero,e16,m1,ta,ma
> vsext.vf2 v3,v2
> vsext.vf2 v2,v1
> vadd.vv v1,v2,v3
> vsetvli zero,a6,e16,m1,ta,ma
> vse16.v v1,0(a0)
> vle8.v v4,0(a5)
> vsetvli t1,zero,e16,m1,ta,ma
> vsext.vf2 v1,v4
> vadd.vv v2,v1,v2
> ...
>
> After this patch:
>
> ...
> vsetvli zero,a6,e8,mf2,ta,ma
> vle8.v v3,0(a4)
> vle8.v v1,0(a3)
> vsetvli t4,zero,e8,mf2,ta,ma
> vwadd.vv v2,v1,v3
> vsetvli zero,a6,e16,m1,ta,ma
> vse16.v v2,0(a0)
> vle8.v v2,0(a5)
> vsetvli t4,zero,e8,mf2,ta,ma
> vwadd.vv v4,v3,v2
> vsetvli zero,a6,e16,m1,ta,ma
> vse16.v v4,0(a1)
> vsetvli t4,zero,e8,mf2,ta,ma
> sub a7,a7,a6
> vwadd.vv v3,v2,v1
> vsetvli zero,a6,e16,m1,ta,ma
> vse16.v v3,0(a2)
> ...
>
> The reason current upstream GCC cannot fully optimize such code with
> vwadd is that the combine pass needs an intermediate RTL IR, namely a
> pattern that extends only one of the operands (vwadd.wv); based on
> that intermediate IR, combine can then extend the other operand and
> generate vwadd.vv.
>
> So vwadd.wv/vwsub.wv definitely helps vwadd.vv/vwsub.vv code
> optimization.
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vector-builtins-bases.cc: Change
>         vwadd.wv/vwsub.wv intrinsic API expander.
>         * config/riscv/vector.md
>         (@pred_single_widen_<plus_minus:optab><any_extend:su><mode>):
>         Remove it.
>         (@pred_single_widen_sub<any_extend:su><mode>): New pattern.
>         (@pred_single_widen_add<any_extend:su><mode>): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

OK
jeff
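As a footnote on the combine argument quoted above, a schematic view of
the two-step progression (the RTL is invented to show the shapes, with
symbolic operands and placeholder WIDE/NARROW modes; real combine dumps
will differ):

  ;; Start: tmp1 = sign_extend (a);  tmp2 = sign_extend (b);
  ;;        dst  = tmp1 + tmp2
  ;;
  ;; Step 1: combine folds one extend into the add, which only matches
  ;; if a single-widen (.wv) pattern exists:
  (plus:WIDE (sign_extend:WIDE (reg:NARROW a))
             (reg:WIDE tmp2))
  ;;
  ;; Step 2: starting from that intermediate, combine folds the other
  ;; extend as well, matching the fully widening (.vv) pattern:
  (plus:WIDE (sign_extend:WIDE (reg:NARROW a))
             (sign_extend:WIDE (reg:NARROW b)))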
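For the subtract side, a minimal example of my own (not taken from the
patch's testsuite) that should exercise the new vwsub.wv pattern, since
vwsub.wv computes a wide source minus a sign-extended narrow source;
the compile flags are an assumption, as the patch does not state them:

  #include <stdint.h>

  /* Expected to lower to vwsub.wv: wide value minus sign-extended
     narrow value.  Try e.g. -march=rv64gcv -O3 with the patch applied
     (flags assumed, not stated in the patch).  */
  void
  foo_sub (int32_t *__restrict a, int64_t *__restrict b,
           int64_t *__restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      c[i] = b[i] - (int64_t) a[i];
  }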