On 30/09/2024 4:15 pm, Jan Beulich wrote:
> To fold two 32-bit outputs from the asm()-s into a single 64-bit value
> the compiler needs to emit a zero-extension insn for the low half. Both
> RDMSR and RDTSC clear the upper halves of their output registers anyway,
> though. So despite that zero-extending insn (a simple MOV) being cheap,
> we can do better: Without one, by declaring the local variables as 64-
> bit ones.
>
> Signed-off-by: Jan Beulich <[email protected]>

Hmm.  This is generally better, but not universally so.

xen$ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
add/remove: 0/0 grow/shrink: 2/29 up/down: 24/-193 (-169)
Function                                     old     new   delta
pci_cfg_ok                                   335     351     +16
arch_ioreq_server_get_type_addr              251     259      +8
udelay                                        81      79      -2
tsc_get_info                                 258     256      -2
trace                                       1320    1318      -2
time_latch_stamps                             76      74      -2
set_legacy_ssbd.isra                         146     144      -2
read_hyperv_timer                             81      79      -2
read_counter                                  47      45      -2
read_clocks_slave                            135     133      -2
mcheck_mca_logout                           2059    2057      -2
hwp_write_request                            105     103      -2
get_s_time_fixed                              91      89      -2
get_s_time                                    77      75      -2
cpu_init                                     251     249      -2
amd_mcheck_init                              754     752      -2
_svm_cpu_up                                  467     465      -2
tsc_check_writability                        187     184      -3
_probe_mask_msr                              106      99      -7
time_calibration_tsc_rendezvous              541     533      -8
time_calibration_std_rendezvous              197     189      -8
probe_cpuid_faulting                         180     172      -8
hvm_save                                     343     335      -8
amd_check_erratum_1474                       140     132      -8
calibrate_tsc                                382     372     -10
wait_for_nmis                                134     119     -15
read_msr                                    1220    1204     -16
guest_rdmsr                                 1869    1853     -16
check_tsc_warp.constprop                     201     185     -16
hwp_init_msrs                                440     420     -20
amd_log_freq.cold                            519     499     -20
Total: Before=3895379, After=3895210, chg -0.00%

arch_ioreq_server_get_type_addr goes from:

    and    $0x40,%dh
    je ...

to

    shl    $0x20,%rdx
    or     %rax,%rdx
    bt     $0x2e,%rdx
    jae ...

and pci_cfg_ok gets a "movabs $0x400000000000,%rcx" and 64bit test,
where previously it used "and    $0x40,%dh"

So, the use of 64bit variables as opposed to 32bit does seem to break
GCC's ability to substitute %dh.

Oh well - it's minor either way.

Reviewed-by: Andrew Cooper <[email protected]>

Reply via email to