https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118734

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rd...@gcc.gnu.org>:

https://gcc.gnu.org/g:dcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d

commit r16-2174-gdcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d
Author: Robin Dapp <rd...@ventanamicro.com>
Date:   Thu Jul 10 09:41:48 2025 +0200

    RISC-V: Make zero-stride load broadcast a tunable.

    This patch makes the zero-stride load broadcast idiom dependent on a
    uarch-tunable "use_zero_stride_load".  Right now we have quite a few
    paths that reach a strided load and some of them are not exactly
    straightforward.
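As a rough illustration of the idea (not the actual GCC code), the sketch below models a strided vector load in scalar C++ and shows that a stride of zero yields a broadcast, with a tunable deciding whether that idiom is used.  The names `tune_param_sketch`, `strided_broadcast_ok`, `strided_load` and `broadcast` are hypothetical and only mirror the intent of "use_zero_stride_load" and strided_load_broadcast_p (); the fallback shown here (scalar load followed by a splat) is an assumption about one possible alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a per-uarch tuning record; not GCC's real struct.
struct tune_param_sketch {
  bool use_zero_stride_load;
};

// Hypothetical predicate in the spirit of strided_load_broadcast_p ().
inline bool strided_broadcast_ok(const tune_param_sketch &tune) {
  return tune.use_zero_stride_load;
}

// Scalar model of a strided vector load: element i is read from
// base + i * stride_bytes.
inline std::vector<std::int64_t> strided_load(const std::int64_t *base,
                                              std::ptrdiff_t stride_bytes,
                                              std::size_t vl) {
  assert(base != nullptr);
  std::vector<std::int64_t> result(vl);
  const char *p = reinterpret_cast<const char *>(base);
  for (std::size_t i = 0; i < vl; ++i) {
    std::int64_t elt;
    std::memcpy(&elt, p + static_cast<std::ptrdiff_t>(i) * stride_bytes,
                sizeof elt);
    result[i] = elt;
  }
  return result;
}

// Broadcast *src to vl elements.  With the tunable on, use the zero-stride
// load idiom; otherwise load the scalar once and splat it (assumed fallback).
inline std::vector<std::int64_t> broadcast(const tune_param_sketch &tune,
                                           const std::int64_t *src,
                                           std::size_t vl) {
  if (strided_broadcast_ok(tune))
    return strided_load(src, /*stride_bytes=*/0, vl);
  const std::int64_t scalar = *src;
  return std::vector<std::int64_t>(vl, scalar);
}
```

Both settings of the tunable produce the same vector; only the way the value reaches the vector register differs, which is why a uarch can pick whichever is cheaper.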

    While broadcast is relatively rare on rv64 targets, it is more common on
    rv32 targets that want to vectorize 64-bit elements.

    While the patch is more involved than I would have liked, it could have
    touched even more places.  The whole broadcast-like insn path feels a
    bit hackish due to the several optimizations we employ.  Some of the
    complications stem from the fact that we lump together real broadcasts,
    vector single-element sets, and strided broadcasts.  The strided-load
    alternatives currently require a memory_constraint to work properly
    which causes more complications when trying to disable just these.

    In short, the whole pred_broadcast handling in combination with the
    sew64_scalar_helper could use work in the future.  I was about to start
    with it in this patch but soon realized that it would only distract from
    the original intent.  What can help in the future is to split strided and
    non-strided broadcasts entirely, and to separate out the single-element
    sets.

    It is still unclear whether we need to pay special attention to
    misaligned strided loads (PR120782).

    I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
    true and false.  With either setting I didn't observe any new execution
    failures, but obviously there are new scan failures with strided
    broadcast turned off.

            PR target/118734

    gcc/ChangeLog:

            * config/riscv/constraints.md (Wdm): Use tunable for Wdm
            constraint.
            * config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
            (can_be_broadcasted_p): Rename to...
            (can_be_broadcast_p): ...this.
            * config/riscv/predicates.md: Use renamed function.
            (strided_load_broadcast_p): Declare.
            * config/riscv/riscv-selftests.cc (run_broadcast_selftests):
            Only run broadcast selftest if strided broadcasts are OK.
            * config/riscv/riscv-v.cc (emit_avltype_insn): New function.
            (sew64_scalar_helper): Only emit a pred_broadcast if the new
            tunable says so.
            (can_be_broadcasted_p): Rename to...
            (can_be_broadcast_p): ...this and use new tunable.
            * config/riscv/riscv.cc (struct riscv_tune_param): Add strided
            broadcast tunable.
            (strided_load_broadcast_p): Implement.
            * config/riscv/vector.md: Use strided_load_broadcast_p () and
            work around 64-bit broadcast on rv32 targets.
