https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118734
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rd...@gcc.gnu.org>:

https://gcc.gnu.org/g:dcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d

commit r16-2174-gdcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d
Author: Robin Dapp <rd...@ventanamicro.com>
Date:   Thu Jul 10 09:41:48 2025 +0200

    RISC-V: Make zero-stride load broadcast a tunable.

    This patch makes the zero-stride load broadcast idiom dependent on a
    uarch-tunable "use_zero_stride_load".  Right now we have quite a few
    paths that reach a strided load, and some of them are not exactly
    straightforward.  While broadcast is relatively rare on rv64 targets,
    it is more common on rv32 targets that want to vectorize 64-bit
    elements.

    While the patch is more involved than I would have liked, it could
    have touched even more places.  The whole broadcast-like insn path
    feels a bit hackish due to the several optimizations we employ.  Some
    of the complications stem from the fact that we lump together real
    broadcasts, vector single-element sets, and strided broadcasts.  The
    strided-load alternatives currently require a memory_constraint to
    work properly, which causes more complications when trying to disable
    just these.

    In short, the whole pred_broadcast handling in combination with the
    sew64_scalar_helper could use work in the future.  I was about to
    start with it in this patch but soon realized that it would only
    distract from the original intent.  What can help in the future is to
    split strided and non-strided broadcast entirely, as well as the
    single-element sets.

    It is still unclear whether we need to pay special attention to
    misaligned strided loads (PR120782).

    I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
    true and false.  With either I didn't observe any new execution
    failures, but obviously there are new scan failures with strided
    broadcast turned off.

            PR target/118734

    gcc/ChangeLog:

            * config/riscv/constraints.md (Wdm): Use tunable for Wdm
            constraint.
            * config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
            (can_be_broadcasted_p): Rename to...
            (can_be_broadcast_p): ...this.
            * config/riscv/predicates.md: Use renamed function.
            (strided_load_broadcast_p): Declare.
            * config/riscv/riscv-selftests.cc (run_broadcast_selftests):
            Only run broadcast selftest if strided broadcasts are OK.
            * config/riscv/riscv-v.cc (emit_avltype_insn): New function.
            (sew64_scalar_helper): Only emit a pred_broadcast if the new
            tunable says so.
            (can_be_broadcasted_p): Rename to...
            (can_be_broadcast_p): ...this and use new tunable.
            * config/riscv/riscv.cc (struct riscv_tune_param): Add strided
            broad tunable.
            (strided_load_broadcast_p): Implement.
            * config/riscv/vector.md: Use strided_load_broadcast_p () and
            work around 64-bit broadcast on rv32 targets.
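
For readers unfamiliar with the idiom, the following is a minimal sketch of the kind of source loop where a 64-bit scalar has to be broadcast into a vector. It is not taken from the patch or its testsuite; the function name add_scalar_u64 is hypothetical. On rv32 the 64-bit operand x does not fit in a single integer register, so a vectorizer may place it in memory and broadcast it with a zero-stride load (vlse64.v with a stride of x0). Whether GCC actually does so depends on the target, the options, and presumably on the tunable described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not part of the patch: add the 64-bit scalar x
   to every element of src and store the result in dst.  When this loop is
   vectorized, x must be replicated across all lanes of a vector register
   (a broadcast).  On rv32, x occupies two 32-bit registers, so a single
   scalar-to-vector move cannot carry it; a zero-stride load from memory is
   one way to perform the broadcast.  */
void
add_scalar_u64 (uint64_t *dst, const uint64_t *src, uint64_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + x;
}
```

Compiling such a function for an rv32 vector target with the tunable on and off, and comparing the generated assembly, would be one way to observe the difference in behaviour.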