https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #2)
> I think depending on the performance of strided loads/stores this can be
> profitable to vectorize.  Looks like we need loop versioning to account for
> the possible aliasing but once this is out of the way we could be OK.
> 
> I have a local patch that uses strided stores here (in the limited example)
> but that's GCC 16 material.

I believe strided/indexed loads/stores are pretty expensive on most
hardware. For example, we have tested the 625.x264 reference workload.

Clang uses indexed loads/stores to vectorize pixel_satd_8x4, whereas GCC
SLP-vectorizes it with short unit-stride loads/stores.
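
To make the two access patterns concrete, here is a minimal sketch. It is not
the x264 code; sum_row and sum_column are hypothetical helpers used only for
illustration. On RVV, a vectorizer could map the first loop to unit-stride
loads (vle8.v) and the second to strided loads (vlse8.v); what a given
compiler actually emits depends on its cost model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum n contiguous bytes (unit-stride access). */
static uint32_t sum_row(const uint8_t *row, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += row[i];          /* consecutive addresses */
    return sum;
}

/* Hypothetical helper: sum n bytes that are `stride` bytes apart
   (strided access, e.g. walking down one column of an image). */
static uint32_t sum_column(const uint8_t *col, size_t n, size_t stride)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += col[i * stride]; /* addresses separated by `stride` */
    return sum;
}
```

Both loops do the same amount of arithmetic; only the memory access pattern
differs, which is where the cost difference would come from.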

On K1:
gcc-14 real 24m2629, clang-20 real 30m51.174s.

That is a big performance drop from gcc-14 to clang-20.

Compile options: -march=rv64gcv_zvl256b -mrvv-vector-bits=zvl
-mrvv-max-lmul=m2.
