Re: [PATCH] xen/x86: Work around Clang code generation bug with asm parameters

Jan Beulich Fri, 27 Nov 2020 02:23:11 -0800

On 12.11.2020 09:14, Jan Beulich wrote:
> On 11.11.2020 21:01, Andrew Cooper wrote:
>> On 11/11/2020 15:11, Jan Beulich wrote:
>>> On 11.11.2020 13:45, Andrew Cooper wrote:
>>>> Clang 9 and later don't handle the clobber of %r10 correctly in
>>>> _hypercall64_4().  See https://bugs.llvm.org/show_bug.cgi?id=48122
>>> Are you sure this is a bug?
>>
>> Yes.
>>
>>>  With ...
>>>
>>>>  #define _hypercall64_4(type, hcall, a1, a2, a3, a4)                     \
>>>>      ({                                                                  \
>>>> -        long res, tmp__;                                                \
>>>> -        register long _a4 asm ("r10") = ((long)(a4));                   \
>>>> +        long res, _a1 = (long)(a1), _a2 = (long)(a2),                   \
>>>> +            _a3 = (long)(a3);                                           \
>>>> +        register long _a4 asm ("r10") = (long)(a4);                     \
>>>>          asm volatile (                                                  \
>>>>              "call hypercall_page + %c[offset]"                          \
>>>> -            : "=a" (res), "=D" (tmp__), "=S" (tmp__), "=d" (tmp__),     \
>>>> -              "=&r" (tmp__) ASM_CALL_CONSTRAINT                         \
>>> ... this we've requested "any register", while with ...
>>>
>>>> -            : [offset] "i" (hcall * 32),                                \
>>>> -              "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3)),     \
>>>> -              "4" (_a4)                                                 \
>>> ... this we've asked for that specific register to be initialized
>>> from r10 (and without telling the compiler that r10 is going to
>>> change).
>>
>> Consider applying that same reasoning to "1" instead of "4".  In that
>> case, a1 would no longer be bound to %rdi.
> 
> That's different: "=D" specifies the register, and "1" says "use
> the same register as input". Whereas, as said, "=&r" says "use
> any register" with "1" saying "use the same register" and (_a4)
> specifying where the value is to come from.
> 
>> The use of "4" explicitly binds the input and the output, which includes
>> requiring them to be the same register.
>>
>> Furthermore, LLVM tends to consider "not behaving in the same was as
>> GCC" a bug.
> 
> That's a fair statement, but then still the description wants
> re-wording. Plus of course future gcc is free to change their
> behavior to that currently observed with clang.
> 
> Consider the following example (on an arch where "f" is a
> floating point register and there are ways to copy directly
> between GPR and floating point registers:
> 
>    int i;
>    register float f asm("f7") = <input>;
>    asm("..." : "=r" (i) : "0" (f));
> 
> In this case obviously f7 can't be used for i (as it doesn't
> match "r"). It's merely that the initial value of i is to come
> from f7. In fact for Arm64 this
> 
> extern float flt;
> 
> int test(void) {
>       int i;
>       register float f asm("s7") = flt;
>       asm("add %0,%0,5" : "=r" (i) : "0" (f));
>       return i;
> }
> 
> behaves exactly as described:
> 
> test:
>         adrp    x0, flt
>         ldr     s7, [x0, @lo12(flt)]
>         fmov    w0, s7
>         add     x0, x0, #5
>         ret
> 
> (Whether fmov is a sensible choice here is a different question;
> I'd have expected some fcvt*.)


Meanwhile I've realized that I neither need to resort to Arm here,
nor to floating point, e.g.

int test2(int in) {
        int i;
        register int ri asm("ecx") = in;
        asm("nop %0" : "=r" (i) : "0" (ri));
        return i;
}

You'll find that the resulting code (at -O2; gcc 10.2.0) doesn't
use %ecx at all - %edi gets moved directly to %eax.

Jan

Re: [PATCH] xen/x86: Work around Clang code generation bug with asm parameters

Reply via email to