https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120687
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is now as described in comment #11: the reduction chain is correctly discovered, but we fail to unpermute the load during permute optimization and so fall back to single-lane SLP, which naturally doesn't have reduction chains.
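
For readers unfamiliar with the terminology, here is a minimal sketch of the kind of loop involved. It is a hypothetical illustration, not the testcase from this PR, and the function name sum_permuted is made up. Four statements update one accumulator in sequence (a reduction chain), and the loads from each group of four elements are deliberately written out of memory order, so an SLP build would record a load permutation. Because the lanes are all added into the same sum, the lane order should not matter, which is why one would expect such a permutation to be removable.

```c
/* Hypothetical example, not the testcase from the PR.  */
int
sum_permuted (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      /* Four statements feed the same accumulator one after another;
         this is the shape referred to as a reduction chain.  */
      sum += a[4 * i + 1];
      sum += a[4 * i + 0];
      sum += a[4 * i + 3];
      sum += a[4 * i + 2];
      /* The loads touch a[4*i] .. a[4*i+3] but in the order 1, 0, 3, 2,
         i.e. permuted relative to memory order.  */
    }
  return sum;
}
```

Whether this particular loop hits the fallback described above is an assumption that would need checking, for example by compiling with -O3 -fdump-tree-vect-details and reading the vect dump.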
