On Thu, Jan 18, 2024 at 5:42 PM LIU Hao <[email protected]> wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of assembly in the unambiguous cases that fail to
> > be parsed today as a gas PR?
> > For example,
>
> Not really. Most of these are results from high-level languages. For example:
>
> # Expected: `movl shr(%rip), %eax`
> # Actual: error: invalid use of operator "shr"
> mov eax, DWORD PTR shr[rip]
>
> # Expected: `movl dword(%rip), %eax`
> # Actual: accepted as `movl 4(%rip), %eax`
> mov eax, DWORD ptr dword[rip]
GCC seems to print a symbol displacement, possibly with a modifier
(for a relocation), before the left bracket.
mov edx, DWORD PTR bx@GOT[eax]
mov edx, DWORD PTR bx[eax]
mov edx, DWORD PTR and[eax] # Error: invalid use of operator "and"
Technically, assemblers (gas and LLVM integrated assembler) can be
made to parse "bx" as a symbol, even if it matches a register name or
an operator name ("and").
However, a straightforward approach using one lookahead token cannot
disambiguate the following two cases.
mov edx, DWORD PTR fs:[eax] # segment override prefix
mov edx, DWORD PTR fs[eax] # symbol
So, we would need two lookahead tokens...
(https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550
needs more code to parse `fs:` correctly.)
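
To make the lookahead point concrete, here is a rough Python sketch (not the gas or LLVM code). It assumes we count lookahead from the position right after `PTR`, so the identifier is the first token and the `:` or `[` is the second. The `SEGMENT_REGS` table and the `tokenize`/`classify_after_ptr` helpers are hypothetical names made up for this illustration.

```python
import re

# Hypothetical table for illustration; a real assembler has its own register tables.
SEGMENT_REGS = {"cs", "ds", "es", "fs", "gs", "ss"}

# Identifiers (optionally with an @modifier), other word-like tokens, or single punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_.$][\w.$@]*|\w+|[^\s\w]")


def tokenize(text):
    """Split the operand text that follows 'PTR' into tokens, skipping whitespace."""
    return TOKEN_RE.findall(text)


def classify_after_ptr(operand):
    """Classify an operand using the first two tokens after 'PTR'."""
    toks = tokenize(operand)
    first = toks[0] if toks else ""
    second = toks[1] if len(toks) > 1 else ""
    if first == "[":
        return "bracketed"
    # The first token alone is not enough: "fs" may be a segment register or a symbol.
    if first.lower() in SEGMENT_REGS and second == ":":
        return "segment-override"
    if second == "[":
        # Treat whatever precedes '[' as a symbol, even if it spells a
        # register ("bx", "fs") or an operator ("and").
        return "symbol"
    return "other"
```

With this, `classify_after_ptr("fs:[eax]")` and `classify_after_ptr("fs[eax]")` take different branches only because the second token is inspected.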
It is also unfortunate that whether the displacement is an immediate
or not changes the behavior of brackets.
mov eax, DWORD PTR 0 # mov $0x0,%eax
mov eax, DWORD PTR [0] # mov 0x0,%eax
mov eax, DWORD PTR sym # mov 0x0,%eax with relocation
mov eax, DWORD PTR [sym] # mov 0x0,%eax with relocation
The above reveals yet another inconsistency. For a memory reference,
it seems that we should use [] but [sym] could be ambiguous if sym
matches a register name or operator name.
Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can improve...
extern int a[2];
int foo() { return a[1]+a[2]; }
GCC's PIC -masm=intel output
mov eax, DWORD PTR a[rip+8]
add eax, DWORD PTR a[rip+4]
The displacements (a+8 and a+4) involve a plus expression, yet `a` and
`8`/`4` are printed in two places: the symbol before the left bracket
and the offset inside the brackets.
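
To illustrate how the two pieces combine into one displacement, here is a small Python sketch that rewrites the `sym[base+off]` form into the AT&T `sym+off(%base)` form. It only handles this one shape (a symbol, a base register, an optional decimal offset); `OPERAND_RE` and `intel_to_att` are hypothetical names for this example, not anything from gas or GCC.

```python
import re

# Matches only `sym[base]` or `sym[base+off]`; anything else is rejected.
OPERAND_RE = re.compile(
    r"^(?P<sym>[A-Za-z_.$][\w.$]*)\[(?P<base>[a-z0-9]+)(?:\+(?P<off>\d+))?\]$"
)


def intel_to_att(operand):
    """Rewrite `sym[base+off]` as the AT&T-style `sym+off(%base)`."""
    m = OPERAND_RE.match(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported operand shape: {operand!r}")
    # The symbol printed before '[' and the offset printed inside the
    # brackets together form a single displacement expression.
    disp = m["sym"] if m["off"] is None else f"{m['sym']}+{m['off']}"
    return f"{disp}(%{m['base']})"
```

For example, `intel_to_att("a[rip+8]")` gives `a+8(%rip)`, which makes it visible that `a` and `8` belong to the same displacement.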
> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to
> `.intel_syntax noprefix`:
>
> $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
> {standard input}: Assembler messages:
> {standard input}:1: Error: invalid use of register
>
> $ as <<< '.intel_syntax noprefix; mov eax, DWORD PTR gs:0x48' -o a.o &&
> objdump -Mintel -d a.o
> ...
> 0000000000000000 <.text>:
> 0: 65 8b 04 25 48 00 00 mov eax,DWORD PTR gs:0x48
Confirmed by Jan.