Committed, thanks Jeff. Pan
-----Original Message----- From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of Jeff Law via Gcc-patches Sent: Wednesday, July 12, 2023 7:19 AM To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp....@gmail.com Subject: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress On 7/11/23 00:38, juzhe.zh...@rivai.ai wrote: > From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai> > > This patch is to recognize specific permutation pattern which can be applied > compress approach. > > Consider this following case: > #include <stdint.h> > typedef int8_t vnx64i __attribute__ ((vector_size (64))); > #define MASK_64 > \ > 1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, > \ > 37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, > \ > 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, > \ > 100, 101, 102, 103, 104, 105, 106, 107 > void __attribute__ ((noinline, noclone)) test_1 (int8_t *x, int8_t *y, int8_t > *out) > { > vnx64i v1 = *(vnx64i*)x; > vnx64i v2 = *(vnx64i*)y; > vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64); > *(vnx64i*)out = v3; > } > > https://godbolt.org/z/P33nev6cW > > Before this patch: > lui a4,%hi(.LANCHOR0) > addi a4,a4,%lo(.LANCHOR0) > vl4re8.v v4,0(a4) > li a4,64 > vsetvli a5,zero,e8,m4,ta,mu > vl4re8.v v20,0(a0) > vl4re8.v v16,0(a1) > vmv.v.x v12,a4 > vrgather.vv v8,v20,v4 > vmsgeu.vv v0,v4,v12 > vsub.vv v4,v4,v12 > vrgather.vv v8,v16,v4,v0.t > vs4r.v v8,0(a2) > ret > > After this patch: > lui a4,%hi(.LANCHOR0) > addi a4,a4,%lo(.LANCHOR0) > vsetvli a5,zero,e8,m4,ta,ma > vl4re8.v v12,0(a1) > vl4re8.v v8,0(a0) > vlm.v v0,0(a4) > vslideup.vi v4,v12,20 > vcompress.vm v4,v8,v0 > vs4r.v v4,0(a2) > ret > > gcc/ChangeLog: > > * config/riscv/riscv-protos.h (enum insn_type): Add vcompress > optimization. > * config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto. > (shuffle_compress_patterns): Ditto. > (expand_vec_perm_const_1): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test. > * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test. I had to look at this a few times, but I think that's because it's been polluted by another vector architecture's handling of compressed vectors. What you're doing looks quite reasonable. OK for the trunk. jeff