On 12.11.2020 09:14, Jan Beulich wrote:
> On 11.11.2020 21:01, Andrew Cooper wrote:
>> On 11/11/2020 15:11, Jan Beulich wrote:
>>> On 11.11.2020 13:45, Andrew Cooper wrote:
>>>> Clang 9 and later don't handle the clobber of %r10 correctly in
>>>> _hypercall64_4(). See https://bugs.llvm.org/show_bug.cgi?id=48122
>>> Are you sure this is a bug?
>>
>> Yes.
>>
>>> With ...
>>>
>>>> #define _hypercall64_4(type, hcall, a1, a2, a3, a4) \
>>>> ({ \
>>>> - long res, tmp__; \
>>>> - register long _a4 asm ("r10") = ((long)(a4)); \
>>>> + long res, _a1 = (long)(a1), _a2 = (long)(a2), \
>>>> + _a3 = (long)(a3); \
>>>> + register long _a4 asm ("r10") = (long)(a4); \
>>>> asm volatile ( \
>>>> "call hypercall_page + %c[offset]" \
>>>> - : "=a" (res), "=D" (tmp__), "=S" (tmp__), "=d" (tmp__), \
>>>> - "=&r" (tmp__) ASM_CALL_CONSTRAINT \
>>> ... this we've requested "any register", while with ...
>>>
>>>> - : [offset] "i" (hcall * 32), \
>>>> - "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3)), \
>>>> - "4" (_a4) \
>>> ... this we've asked for that specific register to be initialized
>>> from r10 (and without telling the compiler that r10 is going to
>>> change).
>>
>> Consider applying that same reasoning to "1" instead of "4". In that
>> case, a1 would no longer be bound to %rdi.
>
> That's different: "=D" specifies the register, and "1" says "use
> the same register for this input". By contrast, as said, "=&r" says
> "use any register", with "4" saying "use the same register" and (_a4)
> specifying where the value is to come from.
>
>> The use of "4" explicitly binds the input and the output, which includes
>> requiring them to be the same register.
>>
>> Furthermore, LLVM tends to consider "not behaving in the same way as
>> GCC" a bug.
>
> That's a fair statement, but the description still needs
> rewording. Plus, of course, future gcc is free to change its
> behavior to match what is currently observed with clang.
>
> Consider the following example (on an arch where "f" is a
> floating point register and there are ways to copy directly
> between GPR and floating point registers):
>
> int i;
> register float f asm("f7") = <input>;
> asm("..." : "=r" (i) : "0" (f));
>
> In this case obviously f7 can't be used for i (as it doesn't
> match "r"). It's merely that the initial value of i is to come
> from f7. In fact for Arm64 this
>
> extern float flt;
>
> int test(void) {
>     int i;
>     register float f asm("s7") = flt;
>     asm("add %0,%0,5" : "=r" (i) : "0" (f));
>     return i;
> }
>
> behaves exactly as described:
>
> test:
> adrp x0, flt
> ldr s7, [x0, @lo12(flt)]
> fmov w0, s7
> add x0, x0, #5
> ret
>
> (Whether fmov is a sensible choice here is a different question;
> I'd have expected some fcvt*.)
Meanwhile I've realized that I need to resort neither to Arm nor to
floating point here, e.g.
int test2(int in) {
    int i;
    register int ri asm("ecx") = in;
    asm("nop %0" : "=r" (i) : "0" (ri));
    return i;
}
You'll find that the resulting code (at -O2; gcc 10.2.0) doesn't
use %ecx at all - %edi gets moved directly to %eax.
Jan