https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #7)
> > The problem is GCC-15 has performance regression compare to GCC-14 on both
> > strict align and we should fix it, we can't specify use no strict align in
> > GCC-15 to pretend that we don't have such performance regression.
>
> The problem is that in GCC 15 we're now vectorizing the first loop and don't
> cost it properly, most likely the vec_init/vec_construct is too inexpensive.
> A solution could be two-fold:
> - Increase those costs (but we can never get them really correct in case a
> vec_init is just a broadcast or so)
> - Enhance our vec_init expander to also allow construction from sub vectors
> so the initialization becomes cheaper and we don't need to clumsily load
> every uint8 element separately.
>
> I can start with the second one which should also help other workloads.
> There are several follow-up items, though, including proper subreg handling
> for those cases.

Thanks. The second solution looks nice to me. Looking forward to your patches.
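
For illustration only, and not taken from the PR's testcase or from any patch: a plain-C sketch of the difference between building a 16 x uint8 vector one element at a time and building it from two 8-byte sub-vectors. The function names, the lane count, and the two-row layout are assumptions made up for this example. It models the data movement only, not what the vec_init expander actually emits.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16   /* hypothetical: one 16 x uint8 vector */

/* Element-wise construction: one scalar uint8 load per lane,
   i.e. 16 separate loads to fill the vector.  */
static void
init_by_element (uint8_t out[LANES], const uint8_t *row0, const uint8_t *row1)
{
  for (int i = 0; i < LANES / 2; i++)
    {
      out[i] = row0[i];
      out[LANES / 2 + i] = row1[i];
    }
}

/* Sub-vector construction: the same 16 lanes filled from two
   8-byte chunks, i.e. two wider copies instead of 16 scalar ones.  */
static void
init_by_subvector (uint8_t out[LANES], const uint8_t *row0, const uint8_t *row1)
{
  memcpy (out, row0, LANES / 2);
  memcpy (out + LANES / 2, row1, LANES / 2);
}
```

Both functions produce the same 16 bytes. The second one only needs two chunk-sized moves, which is the kind of saving the sub-vector approach is after.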