https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
--- Comment #16 from Jeffrey A. Law <law at redhat dot com> ---
The issue here (of course) is that aarch64 has a different set of defaults for when to open-code vs loop vs function call. My attempts to pick a better size for the objects result in failures on other targets. Do we have a method on aarch64 to tune this stuff via flags? Otherwise I'm likely to just xfail aarch64 and move on, since DSE is doing what we want at this point if given sane input.