On 17/01/2024 9:37 am, Jan Beulich wrote:
> --- a/xen/arch/x86/ioport_emulate.c
> +++ b/xen/arch/x86/ioport_emulate.c
> @@ -8,11 +8,10 @@
>  #include <xen/sched.h>
>  #include <xen/dmi.h>
>  
> -unsigned int (*__read_mostly ioemul_handle_quirk)(
> -    uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs);
> +bool __ro_after_init ioemul_handle_quirk;
>  
> -static unsigned int cf_check ioemul_handle_proliant_quirk(
> -    u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs)
> +unsigned int ioemul_handle_proliant_quirk(
> +    uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>  {
>      static const char stub[] = {
>          0x9c,       /*    pushf           */

Something occurred to me.

diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
index 23cba842b22e..70f94febe255 100644
--- a/xen/arch/x86/ioport_emulate.c
+++ b/xen/arch/x86/ioport_emulate.c
@@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk;
 unsigned int ioemul_handle_proliant_quirk(
     uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
 {
-    static const char stub[] = {
+    const char stub[] = {
         0x9c,       /*    pushf           */
         0xfa,       /*    cli             */
         0xee,       /*    out %al, %dx    */

is an improvement, confirmed by bloat-o-meter:

add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8)
Function                                     old     new   delta
ioemul_handle_proliant_quirk                  58      59      +1
stub                                           9       -      -9

The reason is that we've got a 9 byte object that's decomposed into two
rip-relative accesses.  i.e. we've got more pointer than data in this case.

But this adjustment seems to tickle a GCC bug.  With that change in
place, GCC emits:

<ioemul_handle_proliant_quirk>:
       48 83 ec 10             sub    $0x10,%rsp
       ...
       48 83 c4 10             add    $0x10,%rsp
       c3                      retq

i.e. we get a stack frame (space at least, no initialisation) despite
the object having been converted entirely to instruction immediates.

Or in other words, there's a further 12 byte saving available when GCC
can be persuaded to not even emit the stack frame.

What is even more weird is that I see this GCC-10, and a build of gcc
master from last week, but not when I try to reproduce in
https://godbolt.org/z/PnachbznW so there's probably some other setting
used by Xen which tickles this bug.

~Andrew

Reply via email to