https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544

            Bug ID: 117544
           Summary: Lack of vsetvli after function call for whole register move
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
                CC: jeffreyalaw at gmail dot com, juzhe.zhong at rivai dot ai,
                    palmer at gcc dot gnu.org, pan2.li at intel dot com,
                    rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv64

Cross posting from RISC-V LLVM community.

Unfortunately, whole register move instructions depend on vtype (*1), which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid vsetvli instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.

However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within printf.

Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023. I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls.

An alternative approach is to address this issue on the toolchain side by requiring at least one valid vsetvli instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue.

I also realized this might be a better solution since the psABI specifies that VTYPE is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing a vsetvli instruction right after the function call may be necessary.

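As a rough illustration of the toolchain-side rule described above, here is a minimal sketch in C. It assumes a hypothetical straight-line instruction model (the `insn_kind` enum and `insert_vsetvli` function are made up for this example and are not GCC's vsetvl pass or RTL). It treats VILL as unknown at function entry and after every call, and inserts a vsetvli before any whole register move reached in that state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds (not GCC's RTL). */
enum insn_kind {
    INSN_OTHER,      /* does not depend on vtype (scalar ops, vs1r.v, vl1re64.v, ...) */
    INSN_VSETVLI,    /* any valid vsetvli/vsetivli: leaves VILL=0 */
    INSN_CALL,       /* function call: vtype, and so VILL, is unknown afterwards */
    INSN_WHOLE_MOVE  /* vmv<nr>r.v: traps if VILL=1 */
};

/*
 * Copy in[0..n) to out, inserting an INSN_VSETVLI before every whole
 * register move that is reached while VILL is not known to be 0.
 * Returns the number of instructions written to out.
 * out_cap must be at least 2 * n (worst case: one insertion per input).
 */
size_t insert_vsetvli(const enum insn_kind *in, size_t n,
                      enum insn_kind *out, size_t out_cap)
{
    int vill_known_clear = 0; /* unknown at function entry */
    size_t m = 0;

    assert(out_cap >= 2 * n);

    for (size_t i = 0; i < n; i++) {
        switch (in[i]) {
        case INSN_VSETVLI:
            vill_known_clear = 1;
            break;
        case INSN_CALL:
            /* The callee (or a syscall inside it) may leave VILL=1. */
            vill_known_clear = 0;
            break;
        case INSN_WHOLE_MOVE:
            if (!vill_known_clear) {
                out[m++] = INSN_VSETVLI;
                vill_known_clear = 1;
            }
            break;
        case INSN_OTHER:
            break;
        }
        out[m++] = in[i];
    }
    return m;
}
```

Applied to a sequence shaped like the testcase's foo (a whole register move, a call, then another whole register move), this model inserts one vsetvli before each move. A real implementation would also have to handle control flow and pick a vsetvli that does not disturb a live vl/vtype, which this sketch ignores.
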
Testcase:

#include <riscv_vector.h>

void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
    register vint32m1_t x asm("v24") = b;
    bar();
    asm ("#xx %0"::"vr"(x) );
    return x;
}

Generated asm with riscv-linux-gcc -O3 -march=rv64gcv:

foo:
        addi    sp,sp,-16
        csrr    t0,vlenb
        sd      ra,8(sp)
        sub     sp,sp,t0
        vs1r.v  v24,0(sp)
        vmv1r.v v24,v9
        call    bar
        csrr    t0,vlenb
        vmv1r.v v8,v24
        vl1re64.v       v24,0(sp)
        add     sp,sp,t0
        ld      ra,8(sp)
        addi    sp,sp,16
        jr      ra

The compiler could instead emit code like the following to fix this issue:

foo:
        addi    sp,sp,-16
        csrr    t0,vlenb
        sd      ra,8(sp)
        sub     sp,sp,t0
        vs1r.v  v24,0(sp)
        vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
        vmv1r.v v24,v9
        call    bar
        csrr    t0,vlenb
        vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
        vmv1r.v v8,v24
        vl1re64.v       v24,0(sp)
        add     sp,sp,t0
        ld      ra,8(sp)
        addi    sp,sp,16
        jr      ra

NOTE: We have hit this issue within our internal spec run.

(*1) That clarification[1] was added after 1.0...

[1] https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1
[2] https://github.com/torvalds/linux/commit/9657e9b7d2538dc73c24947aa00a8525dfb8062c