https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284
--- Comment #7 from Huiba Li <huiba....@alibaba-inc.com> ---
> In that case you need to use "movq %1, %0" in the asm to actually copy the
> value, because the constraints don't guarantee it is the same register, it
> can very well be a different one.
> By using "0" or "+r" you require that the input and output use the same
> register and so don't need to copy anything.

Thanks for pointing it out. It gives me a deeper understanding of the
constraints.

BTW, is it expected that gcc produces code like this:

```
...
<+64>: movq %r15, %rax
<+67>: movq %rax, %r15
vpbroadcastd (%rax), %zmm17
...
```

Especially since gcc could use %rsi instead of either %r15 or %rax.
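
For reference, the three constraint forms discussed in the quoted reply can be sketched as below. This is only a minimal illustration, assuming x86-64 and GNU extended asm (GCC or Clang); the function names are hypothetical and are not taken from the code in this bug report.

```c
#include <stdint.h>

/* Separate "=r" output and "r" input: the compiler may pick two different
 * registers, so the asm body has to copy the value itself. */
static inline uint64_t copy_separate(uint64_t in)
{
    uint64_t out;
    __asm__("movq %1, %0" : "=r"(out) : "r"(in));
    return out;
}

/* Matching constraint "0": the input is required to live in the same
 * register as operand 0, so an empty asm body already "returns" it. */
static inline uint64_t copy_matching(uint64_t in)
{
    uint64_t out;
    __asm__("" : "=r"(out) : "0"(in));
    return out;
}

/* "+r": a single read-write operand, again one register for both roles. */
static inline uint64_t copy_readwrite(uint64_t v)
{
    __asm__("" : "+r"(v));
    return v;
}
```

All three return their argument unchanged, e.g. `copy_separate(42)` yields 42. Only the first form needs an instruction in the asm body, because only there are input and output allowed to be in different registers.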