https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112387
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-11-06
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed less gathers should be better.  Note gathers are costed as vector_load
(for the index vector) and N times scalar_load when not emulated.  There isn't
a specific gather_load.  I suspect your scalar_load costs do not inter-operate
with vector_* costs (aka you add apples and oranges?)
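
For illustration only, the costing described in the comment can be sketched
as below.  This is not GCC's code: the struct, the function name and the
numbers in the usage note are hypothetical, and the only thing taken from the
comment is the shape of the sum (one vector_load for the index vector plus N
times scalar_load for a non-emulated gather).

```c
/* Hypothetical per-statement costs.  In GCC the real values come from the
   target's cost tables; these field names only mirror the cost kinds named
   in the comment.  */
struct stmt_costs
{
  int scalar_load;   /* cost of one scalar load */
  int vector_load;   /* cost of one vector load */
};

/* Sketch of the cost of one non-emulated gather of NLANES elements:
   one vector_load for the index vector plus NLANES scalar_loads.
   Both terms are added directly, so they must be expressed on the
   same scale for the total to be meaningful.  */
static int
gather_cost (const struct stmt_costs *c, int nlanes)
{
  return c->vector_load + nlanes * c->scalar_load;
}
```

With assumed costs scalar_load = 1 and vector_load = 4, an 8-lane gather
comes to 4 + 8 * 1 = 12.  If scalar_load were instead given on a different
scale than the vector_* costs, the N * scalar_load term would distort the
total, which is the "apples and oranges" concern raised in the comment.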