https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117383
            Bug ID: 117383
           Summary: gcc relies on RISC-V vcompress instruction undefined behaviour
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anton at ozlabs dot org
  Target Milestone: ---

I think gcc is relying on undefined behaviour with the vcompress instruction.
This thread explains how vcompress differs from other instructions: its tail
starts right after the last mask-selected element, not at vl:

https://github.com/riscvarchive/riscv-v-spec/issues/796

There was a bug in QEMU, which I just fixed, that prevented the all-1s
tail-agnostic option (rvv_ta_all_1s) from poisoning these bits:

https://lists.nongnu.org/archive/html/qemu-riscv/2024-10/msg00561.html

With that fix, I see problems with the test case below until I change the
preceding vsetvli from ta to tu. I think 9aabf81f40f0 ("RISC-V: Optimize
permutation codegen with compress") is one place where we need to set tail
undisturbed.

Build with:

gcc -march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -O3

QEMU without all-1s tail-agnostic poisoning:

-1 -2 -3 -5 -7 -9 -10 -11 -12 -14 -15 -17 -19 -21 -22 -23 -26 -28 -30 -31
-37 -38 -41 -46 -47 -53 -54 -55 -60 -61 -62 -63 52 53 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43

QEMU with all-1s tail-agnostic poisoning:

-1 -2 -3 -5 -7 -9 -10 -11 -12 -14 -15 -17 -19 -21 -22 -23 -26 -28 -30 -31
-37 -38 -41 -46 -47 -53 -54 -55 -60 -61 -62 -63 52 53 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

I am not sure where the 52/53 values are coming from either.
#include <stdio.h>
#include <stdint.h>

typedef int8_t vnx64i __attribute__ ((vector_size (64)));

#define MASK_64 \
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
  37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
  82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
  100, 101, 102, 103, 104, 105, 106, 107

void __attribute__ ((noinline, noclone))
test_1 (int8_t *x, int8_t *y, int8_t *out)
{
  vnx64i v1 = *(vnx64i*)x;
  vnx64i v2 = *(vnx64i*)y;
  vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
  *(vnx64i*)out = v3;
}

int main(void)
{
  int8_t x[64];
  int8_t y[64];
  int8_t out[64];

  for (int i = 0; i < 64; i++)
    {
      x[i] = -i;
      y[i] = i;
    }

  test_1(x, y, out);

  for (int i = 0; i < 64; i++)
    {
      printf("%d\n", out[i]);
    }
}