https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hum, GCC's code _looks_ faster.  Maybe it's our tendency to duplicate memory
accesses in vector instructions (there's a PR about this somewhere).  A load
uop on every stmt is likely the bottleneck here.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations