https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120687
--- Comment #2 from ktkachov at gcc dot gnu.org ---
I similarly see that this generates ~200 lines of assembly for aarch64, compared to ~20 with Clang, so I'd mark it as target-independent. I think I remember a past bug about the need for loop rerolling functionality in the vectoriser.