https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121488
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I should mention this is with `-O2 -mavx2` on x86_64. For some reason, without `-mavx2` we can optimize it on x86_64 due to some changes in forwprop2. On aarch64 that forwprop does not happen either.