https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67577
--- Comment #1 from Joel Yliluoma <bisqwit at iki dot fi> ---
It may also be worth mentioning that adding an explicit '#pragma omp simd' before each of those loops, inside the operator functions, makes sure that GCC at least does the mathematics using packed registers. Apparently, though, the memory store cannot be forced to occur without redundant temporaries.
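
For illustration, here is a minimal sketch of where the pragma would go. The operator functions from the original report are not reproduced in this comment, so the type `Vec`, its member `d`, and the size parameter `N` are hypothetical stand-ins. The pragma only takes effect when compiling with -fopenmp or -fopenmp-simd; otherwise GCC ignores it.

```cpp
#include <cstddef>

// Hypothetical fixed-size float vector. The names Vec, d and N are
// illustrative assumptions, not taken from the bug report.
template <std::size_t N>
struct Vec {
    float d[N];

    Vec& operator+=(const Vec& b) {
        // Request vectorization of this loop. Honoured only with
        // -fopenmp or -fopenmp-simd; ignored otherwise.
        #pragma omp simd
        for (std::size_t i = 0; i < N; ++i) d[i] += b.d[i];
        return *this;
    }

    Vec operator*(const Vec& b) const {
        Vec r;
        // Same pragma, placed directly before the loop inside the operator.
        #pragma omp simd
        for (std::size_t i = 0; i < N; ++i) r.d[i] = d[i] * b.d[i];
        return r;
    }
};
```

Building with something like `g++ -O2 -fopenmp-simd` and inspecting the assembly (`-S`) should show whether the arithmetic uses packed instructions. Whether the result is then stored without extra temporaries is a separate question that the pragma does not address.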