https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284
Bug ID: 120284
Summary: inline assembly operand constraint not comply with
document
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: huiba@alibaba-inc.com
Target Milestone: ---
I'm writing a benchmark program and need gcc to assume that a variable
changes in each loop iteration, so that some optimizations are avoided and the
benchmark stays closer to a real-life scenario. I wrote it like this:
```
void test_foo(Obj* obj, const uint32_t* x) {
    for (int i = 0; i < N; ++i) {
        asm volatile("" : "=r"(x) : "r"(x));
        uint32_t result = obj->foo(x);
        asm volatile("" : : "r"(result));
    }
}
```
I get a segmentation fault with some implementations of foo() at -O3 on
x86_64, using gcc 13.3.0 and 14.2.0. The issue does not occur at -O2 or with
clang 18.1.3. I disassembled the test_foo() function, which has foo() inlined:
```
<+0>: endbr64
<+4>: pushq %rbp
<+5>: movq %rsp, %rbp
<+8>: pushq %r15
<+10>: pushq %r14
<+12>: pushq %r13
<+14>: pushq %r12
<+16>: pushq %rbx
<+17>: movq %rdi, %r12
<+20>: movq %rsi, %r15
<+23>: andq $-0x40, %rsp
<+27>: subq $0x8, %rsp
<+31>: movl $0x5f5e100, -0x48(%rsp) ; imm = 0x5F5E100
<+39>: vpmovsxbd 0x56a940(%rdi), %zmm1
<+49>: vmovdqa32 (%rdi), %zmm0
<+55>: nopw (%rax,%rax)
<+64>: movq %rax, %r15 # change it to movq %r15, %rax
<+67>: vpbroadcastd (%rax), %zmm17
<+73>: vpbroadcastd 0x4(%rax), %zmm16
<+80>: vpbroadcastd 0x8(%rax), %zmm15
<+87>: vpbroadcastd 0xc(%rax), %zmm14
<+94>: vpbroadcastd 0x10(%rax), %zmm13
<+101>: vpbroadcastd 0x14(%rax), %zmm12
<+108>: vpbroadcastd 0x18(%rax), %zmm11
<+115>: vpbroadcastd 0x1c(%rax), %zmm10
<+122>: vpbroadcastd 0x20(%rax), %zmm9
<+129>: vpbroadcastd 0x24(%r15), %zmm8
<+136>: vpbroadcastd 0x28(%r15), %zmm7
<+143>: vpbroadcastd 0x2c(%r15), %zmm6
...
<+4464>: decl -0x48(%rsp)
<+4468>: jne 0x1870 ; <+64>
<+4474>: vzeroupper
<+4477>: leaq -0x28(%rbp), %rsp
<+4481>: popq %rbx
<+4482>: popq %r12
<+4484>: popq %r13
<+4486>: popq %r14
<+4488>: popq %r15
<+4490>: popq %rbp
<+4491>: retq
```
I find that gcc is probably misusing a pair of registers in the instruction
at <+64>. If I manually swap the two registers, the program seems to run
correctly. If I change the first asm statement to
```asm volatile("" : "+r"(x));```, using the "+r" constraint instead of "=r"
and "r", the program also runs correctly. I think these two forms are
identical, as stated in the documentation:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Output-Operands
"When using ‘=’, do not assume the location contains the existing value on
entry to the asm, except when the operand is tied to an input;".
So there seems to be a bug in the frontend, which does not forward the
constraints correctly to the backend. The command line is
"g++-14 x.cpp -O3 -march=native". I'm using Ubuntu 24.04 on x86_64 (AMD EPYC
9T24), and the compilers were installed with apt.
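For comparison, the GCC documentation describes two ways to make an asm operand carry its input value into the output: the read-write modifier "+r", and an input with a matching constraint such as "0", which places the input in the same location as output operand 0. The following is a minimal sketch of both forms, not a statement about the root cause of this report. The body of `Obj::foo`, the helper names `launder_rw` and `launder_tied`, and the `run` function with its iteration count `n` are hypothetical stand-ins, since the real foo() and N are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the reporter's class; the real foo() is not shown.
struct Obj {
    std::uint32_t foo(const std::uint32_t* x) const { return x[0] + x[1]; }
};

// Read-write operand: the empty asm reads x and is assumed to modify it,
// using a single register for both.
inline const std::uint32_t* launder_rw(const std::uint32_t* x) {
    asm volatile("" : "+r"(x));
    return x;
}

// Output tied to an input through the matching constraint "0":
// the input must live in the same location as output operand 0.
inline const std::uint32_t* launder_tied(const std::uint32_t* x) {
    asm volatile("" : "=r"(x) : "0"(x));
    return x;
}

// Loop shaped like test_foo(); n replaces the N of the report (assumption).
inline std::uint32_t run(const Obj* obj, const std::uint32_t* x, int n) {
    assert(obj != nullptr && x != nullptr);
    std::uint32_t last = 0;
    for (int i = 0; i < n; ++i) {
        x = launder_tied(x);               // compiler must assume x changed
        std::uint32_t result = obj->foo(x);
        asm volatile("" : : "r"(result));  // keep result from being discarded
        last = result;
    }
    return last;
}
```

In both helpers the pointer value is unchanged at run time; only the compiler's knowledge of it is discarded, which is the effect the benchmark loop is after.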
BTW, I also noticed some sub-optimal code in the assembly:
(1) the move from %r15 to %rax at <+64> seems unnecessary, since the following
lines could just use %r15 (and why does it use both of them?);
(2) the move from %rsi to %r15 at <+20> seems unnecessary, since the following
lines could just use %rsi.