https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362
--- Comment #21 from Jon Daniel <jondaniel879 at gmail dot com> ---
The generated assembler output:

g++:

    vmaskmovps 40(%rsp), %xmm1, %xmm0
    vmaskmovps 56(%rsp), %xmm1, %xmm2
    vmulps %xmm2, %xmm0, %xmm0

Notice that the register loaded from the lower memory address (%xmm0, from
40(%rsp)) is listed as the second source operand of vmulps.

clang++:

    vmaskmovps 48(%rsp), %xmm0, %xmm2
    vmaskmovps 32(%rsp), %xmm0, %xmm3
    vmulps %xmm3, %xmm2, %xmm2

Notice that the register loaded from the lower memory address (%xmm3, from
32(%rsp)) is listed as the first source operand of vmulps.