https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362
--- Comment #21 from Jon Daniel <jondaniel879 at gmail dot com> ---
The generated assembly (AT&T syntax) from each compiler:

g++:
vmaskmovps 40(%rsp), %xmm1, %xmm0
vmaskmovps 56(%rsp), %xmm1, %xmm2
vmulps %xmm2, %xmm0, %xmm0
Notice that the register loaded from the lower memory address (%xmm0, from 40(%rsp)) is taken as the second source operand of vmulps, in AT&T operand order.
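As a reading aid, here is a portable scalar model of what the three g++ instructions compute. It is a sketch, not the test case from this bug (which is not shown in this comment). The names Vec4, Mask4, maskload, mul and gcc_sequence are hypothetical. It assumes the documented vmaskmovps load behaviour: a lane is loaded only when the sign bit of the matching mask element is set, and is zeroed otherwise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 128-bit %xmm register: four float lanes,
// and four 32-bit integer lanes for the mask operand.
using Vec4 = std::array<float, 4>;
using Mask4 = std::array<std::int32_t, 4>;

// Models "vmaskmovps mem, %mask, %dst": lane i is read from mem[i] only when
// the sign bit of mask[i] is set; otherwise it stays 0.0f. Masked-off
// elements are never read.
Vec4 maskload(const float* mem, const Mask4& mask) {
    assert(mem != nullptr);
    Vec4 dst{};  // value-initialized: all lanes start at 0.0f
    for (std::size_t i = 0; i < dst.size(); ++i) {
        if (mask[i] < 0) {  // negative int32 <=> sign bit set
            dst[i] = mem[i];
        }
    }
    return dst;
}

// Models "vmulps %src2, %src1, %dst" (AT&T order): dst = src1 * src2 per lane.
Vec4 mul(const Vec4& src1, const Vec4& src2) {
    Vec4 dst{};
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = src1[i] * src2[i];
    }
    return dst;
}

// The g++ sequence: the lower-address block lands in %xmm0, which vmulps
// uses as src1 (it is listed second in AT&T syntax).
Vec4 gcc_sequence(const float* lower, const float* higher, const Mask4& mask) {
    Vec4 xmm0 = maskload(lower, mask);   // vmaskmovps 40(%rsp), %xmm1, %xmm0
    Vec4 xmm2 = maskload(higher, mask);  // vmaskmovps 56(%rsp), %xmm1, %xmm2
    return mul(xmm0, xmm2);              // vmulps %xmm2, %xmm0, %xmm0
}
```

For example, with a mask of {-1, -1, -1, 0}, lane 3 of both loads is zeroed, so feeding {1, 2, 3, 4} and {10, 20, 30, 40} through gcc_sequence yields {10, 40, 90, 0}.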
clang++:
vmaskmovps 48(%rsp), %xmm0, %xmm2
vmaskmovps 32(%rsp), %xmm0, %xmm3
vmulps %xmm3, %xmm2, %xmm2
Notice that the register loaded from the lower memory address (%xmm3, from 32(%rsp)) is taken as the first source operand of vmulps, in AT&T operand order.
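IEEE 754 multiplication is commutative for non-NaN inputs, so both orderings should produce bit-identical products. The portable sketch below (no intrinsics; the names gcc_order, clang_order, same_bits and check_orders_agree are hypothetical) compares the two operand roles bit for bit. It deliberately leaves NaN inputs out: per Intel's documented NaN-handling rules, when both operands are NaN the result takes the NaN from the first source operand, so that is the case where the choice of src1 could become observable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

using Vec4 = std::array<float, 4>;

// Lane-wise product with explicit operand roles: dst = src1 * src2.
Vec4 mul(const Vec4& src1, const Vec4& src2) {
    Vec4 dst{};
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = src1[i] * src2[i];
    }
    return dst;
}

// g++ order: the lower-address value is src1 (listed second in AT&T syntax).
Vec4 gcc_order(const Vec4& lower, const Vec4& higher) {
    return mul(lower, higher);
}

// clang++ order: the lower-address value is src2 (listed first in AT&T syntax).
Vec4 clang_order(const Vec4& lower, const Vec4& higher) {
    return mul(higher, lower);
}

// Bitwise comparison, so +0.0f and -0.0f would count as different.
bool same_bits(const Vec4& a, const Vec4& b) {
    return std::memcmp(a.data(), b.data(), sizeof(float) * a.size()) == 0;
}

// Checks that both operand orders agree for the given non-NaN inputs.
void check_orders_agree(const Vec4& lower, const Vec4& higher) {
    assert(same_bits(gcc_order(lower, higher), clang_order(lower, higher)));
}
```

Calling check_orders_agree with ordinary values, including signed zeros, should never fire; only NaN payloads, which this sketch does not model, can distinguish the two register allocations.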