Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, July 12, 2023 7:19 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp....@gmail.com
Subject: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress



On 7/11/23 00:38, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
> 
> This patch recognizes a specific permutation pattern that can be implemented
> with the compress approach (vcompress).
> 
> Consider the following case:
> #include <stdint.h>
> typedef int8_t vnx64i __attribute__ ((vector_size (64)));
> #define MASK_64                                                              \
>    1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
>      37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
>      82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
>      100, 101, 102, 103, 104, 105, 106, 107
> void __attribute__ ((noinline, noclone))
> test_1 (int8_t *x, int8_t *y, int8_t *out)
> {
>    vnx64i v1 = *(vnx64i*)x;
>    vnx64i v2 = *(vnx64i*)y;
>    vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
>    *(vnx64i*)out = v3;
> }
> 
> https://godbolt.org/z/P33nev6cW
> 
> Before this patch:
>          lui     a4,%hi(.LANCHOR0)
>          addi    a4,a4,%lo(.LANCHOR0)
>          vl4re8.v        v4,0(a4)
>          li      a4,64
>          vsetvli a5,zero,e8,m4,ta,mu
>          vl4re8.v        v20,0(a0)
>          vl4re8.v        v16,0(a1)
>          vmv.v.x v12,a4
>          vrgather.vv     v8,v20,v4
>          vmsgeu.vv       v0,v4,v12
>          vsub.vv v4,v4,v12
>          vrgather.vv     v8,v16,v4,v0.t
>          vs4r.v  v8,0(a2)
>          ret
> 
> After this patch:
>       lui     a4,%hi(.LANCHOR0)
>       addi    a4,a4,%lo(.LANCHOR0)
>       vsetvli a5,zero,e8,m4,ta,ma
>       vl4re8.v        v12,0(a1)
>       vl4re8.v        v8,0(a0)
>       vlm.v   v0,0(a4)
>       vslideup.vi     v4,v12,20
>       vcompress.vm    v4,v8,v0
>       vs4r.v  v4,0(a2)
>       ret
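
For anyone reading along, here is a minimal scalar sketch in C of why the vslideup + vcompress pair can stand in for the shuffle above. It is only an illustration, not code from the patch: the helper names (model_slideup, model_compress, compress_matches_shuffle) are made up, and the compress model assumes that destination elements past the packed ones keep their previous values, which is what the slid-up half of the result relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 64 /* elements per vector, matching vnx64i in the example */

/* Selector from the example: values < N pick from x, values >= N from y.  */
static const uint8_t MASK[N] = {
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31,
  37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81,
  82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
  100, 101, 102, 103, 104, 105, 106, 107
};

/* Reference semantics of the shuffle: out[i] = concat (x, y)[MASK[i]].  */
static void
shuffle_ref (const int8_t *x, const int8_t *y, int8_t *out)
{
  for (int i = 0; i < N; i++)
    out[i] = MASK[i] < N ? x[MASK[i]] : y[MASK[i] - N];
}

/* Scalar model of a slide-up: dst[i + offset] = src[i]; the first
   'offset' elements of dst are left untouched.  */
static void
model_slideup (int8_t *dst, const int8_t *src, int offset)
{
  assert (offset >= 0 && offset <= N);
  for (int i = 0; i + offset < N; i++)
    dst[i + offset] = src[i];
}

/* Scalar model of a compress: pack the src elements whose keep flag is
   set to the front of dst.  ASSUMPTION: the rest of dst is unchanged.  */
static void
model_compress (int8_t *dst, const int8_t *src, const uint8_t *keep)
{
  int n = 0;
  for (int i = 0; i < N; i++)
    if (keep[i])
      dst[n++] = src[i];
}

/* Returns 1 if slide-up followed by compress reproduces the shuffle.  */
int
compress_matches_shuffle (void)
{
  int8_t x[N], y[N], ref[N], got[N];
  uint8_t keep[N] = { 0 };
  int from_x = 0;

  for (int i = 0; i < N; i++)
    {
      x[i] = (int8_t) i;       /* 0..63 */
      y[i] = (int8_t) (i - N); /* -64..-1, so x and y never collide */
    }

  /* The leading selectors (< N) become the compress mask over x.  */
  while (from_x < N && MASK[from_x] < N)
    keep[MASK[from_x++]] = 1;
  if (from_x == N)
    return 0; /* no elements from y: not the case modelled here */

  /* The remaining selectors are consecutive in y, so one slide-up puts
     them in place.  For this mask: 32 - (76 - 64) = 20.  */
  int offset = from_x - (MASK[from_x] - N);

  memset (got, 0, sizeof got);
  model_slideup (got, y, offset);
  model_compress (got, x, keep);

  shuffle_ref (x, y, ref);
  return memcmp (ref, got, sizeof ref) == 0;
}
```

With the mask from the example, the first 32 selectors choose an increasing subset of x and the last 32 are the consecutive run y[12..43], so the computed slide amount is 20, the same immediate that appears in the vslideup.vi above.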
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-protos.h (enum insn_type): Add vcompress
>          optimization.
>          * config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
>          (shuffle_compress_patterns): Ditto.
>          (expand_vec_perm_const_1): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
I had to look at this a few times, but I think that's because it's been 
polluted by another vector architecture's handling of compressed 
vectors.  What you're doing looks quite reasonable.

OK for the trunk.

jeff
