[Bug rtl-optimization/120284] inline assembly operand constraint not comply with document

2025-05-14 Thread huiba.lhb--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284

--- Comment #3 from 鲁七  ---
> Note the 0 there rather than r. r in the input means any register while 0
> means it needs to match the same register as the 0th operand which in this
> case is the output operand.

Thanks for your quick response. Using "0" does resolve the issue, just as "+r"
does.
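
For reference, the two forms that resolve the issue can be sketched roughly as below. This is only an illustration, not code from the report: the helper names `launder_rw` and `launder_tied` are hypothetical, and the empty asm template emits no instructions. In both forms the compiler must assume that x is read and then modified by the asm.

```cpp
#include <cstdint>

// Hypothetical helper (name not from the report).
// "+r": x is a read-write operand kept in a single register.
inline const uint32_t* launder_rw(const uint32_t* x) {
    asm volatile("" : "+r"(x));
    return x;
}

// Hypothetical helper (name not from the report).
// "=r" output plus a "0" input: the input is tied to operand 0,
// so it must live in the same register as the output.
inline const uint32_t* launder_tied(const uint32_t* x) {
    asm volatile("" : "=r"(x) : "0"(x));
    return x;
}
```

Because the template is empty, the register still holds the original pointer afterwards; the compiler just can no longer prove that.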

Actually, I don't need the input register to be the same as the output
register. All I need is for x to be both an input and an output of the asm
statement. So in theory, should it also be OK to write ```asm volatile("" :
"=r"(x) : "r"(x))```?

BTW, the "+r" form and "0" form both produce code like this:

...
<+64>:   movq   %r15, %rax
<+67>:   movq   %rax, %r15
vpbroadcastd (%rax), %zmm17
...

This also looks sub-optimal.

[Bug rtl-optimization/120284] inline assembly operand constraint not comply with document

2025-05-14 Thread huiba.lhb--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284

--- Comment #5 from Huiba Li  ---
> Marking x as an output without tying it to another register will have
> garbage in the variable after the inline-asm. That is explicitly mentioned.

Oh, I see. 

Thanks!
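
To illustrate the point in the quoted reply: with separate "=r" and "r" operands the compiler may pick two different registers, so the output only holds a meaningful value if the template itself writes it. A minimal sketch, assuming x86_64 with the default AT&T syntax (the helper name `copy_through_asm` is hypothetical, and other targets fall back to "+r"):

```cpp
#include <cstdint>

// Hypothetical helper (name not from the report).
inline const uint32_t* copy_through_asm(const uint32_t* in) {
    const uint32_t* out;
#if defined(__x86_64__)
    // Untied operands are fine here because the template really
    // writes operand 0: it copies the input register into the output.
    asm volatile("mov %1, %0" : "=r"(out) : "r"(in));
#else
    // Fallback for other targets (an assumption of this sketch):
    // use a single read-write operand instead.
    out = in;
    asm volatile("" : "+r"(out));
#endif
    return out;
}
```

With an empty template nothing performs that copy, which is why the untied "=r" output ends up holding whatever happened to be in its register.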

[Bug rtl-optimization/120284] New: inline assembly operand constraint not comply with document

2025-05-14 Thread huiba.lhb--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284

            Bug ID: 120284
           Summary: inline assembly operand constraint not comply with
                    document
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: huiba@alibaba-inc.com
  Target Milestone: ---

I'm writing a benchmark program, and I need gcc to assume that a variable
changes in each loop iteration, so as to prevent some optimizations and make
the benchmark closer to a real-life scenario. I wrote it like this:

```
void test_foo(Obj* obj, const uint32_t* x) {
  for (int i = 0; i < N; ++i) {
    asm volatile("" : "=r"(x) : "r"(x));
    uint32_t result = obj->foo(x);
    asm volatile("" : : "r"(result));
  }
}
```

I get a segmentation fault with some implementations of foo() at -O3 on
x86_64, using gcc 13.3.0 and 14.2.0. The issue doesn't occur with -O2 or with
clang 18.1.3. I disassembled the test_foo() function, which has foo() inlined:

```
<+0>:    endbr64
<+4>:    pushq  %rbp
<+5>:    movq   %rsp, %rbp
<+8>:    pushq  %r15
<+10>:   pushq  %r14
<+12>:   pushq  %r13
<+14>:   pushq  %r12
<+16>:   pushq  %rbx
<+17>:   movq   %rdi, %r12
<+20>:   movq   %rsi, %r15
<+23>:   andq   $-0x40, %rsp
<+27>:   subq   $0x8, %rsp
<+31>:   movl   $0x5f5e100, -0x48(%rsp) ; imm = 0x5F5E100
<+39>:   vpmovsxbd 0x56a940(%rdi), %zmm1
<+49>:   vmovdqa32 (%rdi), %zmm0
<+55>:   nopw   (%rax,%rax)

<+64>:   movq   %rax, %r15  # change it to movq   %r15, %rax

<+67>:   vpbroadcastd (%rax), %zmm17
<+73>:   vpbroadcastd 0x4(%rax), %zmm16
<+80>:   vpbroadcastd 0x8(%rax), %zmm15
<+87>:   vpbroadcastd 0xc(%rax), %zmm14
<+94>:   vpbroadcastd 0x10(%rax), %zmm13
<+101>:  vpbroadcastd 0x14(%rax), %zmm12
<+108>:  vpbroadcastd 0x18(%rax), %zmm11
<+115>:  vpbroadcastd 0x1c(%rax), %zmm10
<+122>:  vpbroadcastd 0x20(%rax), %zmm9
<+129>:  vpbroadcastd 0x24(%r15), %zmm8
<+136>:  vpbroadcastd 0x28(%r15), %zmm7
<+143>:  vpbroadcastd 0x2c(%r15), %zmm6

...

<+4464>: decl   -0x48(%rsp)
<+4468>: jne    0x1870 ; <+64>
<+4474>: vzeroupper
<+4477>: leaq   -0x28(%rbp), %rsp
<+4481>: popq   %rbx
<+4482>: popq   %r12
<+4484>: popq   %r13
<+4486>: popq   %r14
<+4488>: popq   %r15
<+4490>: popq   %rbp
<+4491>: retq
```

I suspect that gcc is misusing a pair of registers in the instruction located
at <+64>. When I manually swap the two registers, the program seems to run
correctly. If I change the first asm statement to ```asm volatile("" :
"+r"(x));```, using the "+r" constraint instead of "=r" and "r", the program
also runs correctly. I think these two forms are equivalent, according to the
documentation:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Output-Operands

"When using ‘=’, do not assume the location contains the existing value on
entry to the asm, except when the operand is tied to an input;".

So there seems to be a bug in the frontend, which doesn't forward the
constraints correctly to the backend. The command line is "g++-14 x.cpp -O3
-march=native". I'm using Ubuntu 24.04 on x86_64 (AMD EPYC 9T24), and the
compilers were installed with apt.
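
A self-contained variant of the loop using the "+r" form mentioned above might look like the sketch below. The `Obj` type, its `foo()` body, the value of `N`, and the `uint32_t` return value are placeholders assumed here for illustration; the real ones are not shown in this report.

```cpp
#include <cstdint>

// Placeholder type; the real Obj and foo() are not part of the report.
struct Obj {
    uint32_t foo(const uint32_t* x) const { return x[0] + x[1]; }
};

constexpr int N = 1000;  // placeholder iteration count

// Returns the last result so the sketch can be checked; the original
// function returns void.
uint32_t test_foo(Obj* obj, const uint32_t* x) {
    uint32_t last = 0;
    for (int i = 0; i < N; ++i) {
        // Read-write operand: gcc must assume x may change here,
        // while the register keeps the incoming pointer value.
        asm volatile("" : "+r"(x));
        uint32_t result = obj->foo(x);
        // Input-only operand: forces result to be computed.
        asm volatile("" : : "r"(result));
        last = result;
    }
    return last;
}
```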


BTW, I also found some sub-optimal code in the assembly:

(1) it seems unnecessary for <+64> to move from %r15 to %rax, as we can just
use %r15 in the following lines; (and why does it use both of them?)

(2) it seems unnecessary for <+20> to move from %rsi to %r15, as we can just
use %rsi in the following lines;
