https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #9 from N Schaeffer <nathanael.schaeffer at gmail dot com> ---
In addition, optimizing for size with -Os produces a non-vectorized double loop (51 bytes), while the vectorized loop using vbroadcastsd (produced by clang -Os) is only 40 bytes. This is therefore also a missed optimization for -Os.
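For readers without the original testcase at hand, here is a minimal sketch of the kind of nested loop being discussed. It is an assumption, not the reproducer attached to this bug: the function name `scale_blocks`, its parameters, and the fixed 4-element inner loop are hypothetical. The point it illustrates is that a single scalar per outer iteration is reused across a short inner loop. A vectorizer can load that scalar once and broadcast it into every lane (vbroadcastsd on AVX2), then do one 4-wide multiply, instead of emitting the scalar double loop.

```c
#include <stddef.h>

/* Hypothetical kernel (not the testcase from this bug report).
   Each x[i] is reused for four consecutive elements of y and z.
   Vectorized: broadcast x[i] to all 4 double lanes of a 256-bit
   register, then one packed multiply per outer iteration.
   Scalar: two nested loops, 4 scalar multiplies per x[i]. */
void scale_blocks(double *restrict z, const double *restrict x,
                  const double *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < 4; k++) {
            z[4 * i + k] = x[i] * y[4 * i + k];
        }
    }
}
```

To compare code size on a kernel like this, one could compile it with `gcc -Os -mavx2 -c` and `clang -Os -mavx2 -c` and inspect the loop bodies with `objdump -d`; the resulting sizes for this sketch are not claimed here.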