[Bug target/82260] [x86] Unnecessary use of 8-bit registers with -Os. slightly slower and larger code

peter at cordes dot ca Wed, 20 Sep 2017 08:36:55 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82260


--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Jakub Jelinek from comment #2)
> From pure instruction size POV, for the first 2 alternatives as can be seen
> say on:
> ...
> movb $0x15, %al
> movl $0x15, %eax
> movb $-0x78, %bl
> movl $-0x78, %ebx

There are ways to save code size when setting up constants.  If you already
have one constant in a register, you can get other nearby constants in 3 bytes
with LEA:

  xor  %edi, %edi         # you often need a zero for something
  lea -0x78(%rdi), %ebx   # 3 bytes vs. 5 for mov $imm32, %r32

Or use a 4-byte LEA with a 64-bit destination to replace a 7-byte mov $imm32, %r64.
Modern CPUs have pretty good LEA throughput (2 per clock on Intel SnB-family +
KNL and AMD K8/K10/BD-family/Zen), especially for 2-component LEA (base + disp,
no index).  Other CPUs manage 1 per clock, still with 1c latency.  With efficient
xor-zeroing support, the LEA can execute without any extra delay even if it
issues in the same cycle as the xor-zeroing.  If you use LEA relative to some
other constant instead, it's still just 1c of extra latency.

If gcc had a -Oz mode like clang does (optimize for size even more), you could
consider stuff like 3-byte push+pop (clobbering the top of the red zone).

  push $-0x78       # imm8 sign-extended to 64-bit 
  pop  %rbx

https://stackoverflow.com/questions/45105164/set-all-bits-in-cpu-register-to-1-efficiently
https://stackoverflow.com/questions/33825546/shortest-intel-x86-64-opcode-for-rax=1
