https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82260
--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Jakub Jelinek from comment #2)
> From pure instruction size POV, for the first 2 alternatives as can be seen
> say on:
> ...
>         movb    $0x15, %al
>         movl    $0x15, %eax
>         movb    $-0x78, %bl
>         movl    $-0x78, %ebx

There are ways to save code-size when setting up constants. If you already
have one constant in a register, you can get other nearby constants in 3 bytes
with LEA:

    xor   %edi, %edi          # you often need a zero for something
    lea   -0x78(%rdi), %ebx   # 3 bytes vs. 5 for mov $imm32, %r32

Or a 4-byte LEA with a 64-bit destination to replace a 7-byte
mov $imm32, %r64.

Modern CPUs have pretty good LEA throughput (2 per clock on Intel SnB-family +
KNL and AMD K8/K10/BD-family/Zen), especially for 2-component LEA (base + disp,
no index). 1 per clock on others, still 1c latency.

With efficient xor-zeroing support, the LEA can execute without any extra delay
even if it issues in the same cycle as the xor-zeroing. If using LEA relative
to some other constant, well it's still just 1c extra.

If gcc had a -Oz mode like clang does (optimize for size even more), you could
consider stuff like 3-byte push+pop (clobbering the top of the red zone):

    push  $-0x78     # imm8 sign-extended to 64-bit
    pop   %rbx

https://stackoverflow.com/questions/45105164/set-all-bits-in-cpu-register-to-1-efficiently
https://stackoverflow.com/questions/33825546/shortest-intel-x86-64-opcode-for-rax=1
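
To make the byte counts above easy to check, here is a small sketch that tabulates the x86-64 encodings of the instructions discussed. The byte strings are hand-assembled for illustration and are not taken from the bug report, so treat them as an assumption and confirm them against an assembler listing (for example objdump -d). The helper names `size` and `sign_extend_imm8` are hypothetical. The second helper shows why the two tricks are not quite interchangeable: LEA with a 32-bit destination zero-extends into the full register, while push imm8 / pop yields the 64-bit sign-extended value.

```python
# Hand-assembled x86-64 encodings (assumed, not from the bug report;
# verify with your assembler before relying on them).
ENCODINGS = {
    "movb $0x15, %al": bytes.fromhex("b015"),
    "movl $0x15, %eax": bytes.fromhex("b815000000"),
    "movb $-0x78, %bl": bytes.fromhex("b388"),
    "movl $-0x78, %ebx": bytes.fromhex("bb88ffffff"),
    "xor %edi, %edi": bytes.fromhex("31ff"),
    "lea -0x78(%rdi), %ebx": bytes.fromhex("8d5f88"),
    "lea -0x78(%rdi), %rbx": bytes.fromhex("488d5f88"),
    "movq $-0x78, %rbx": bytes.fromhex("48c7c388ffffff"),
    "push $-0x78": bytes.fromhex("6a88"),
    "pop %rbx": bytes.fromhex("5b"),
}


def size(insn):
    """Encoded length in bytes of one instruction from the table."""
    return len(ENCODINGS[insn])


def sign_extend_imm8(byte, bits=64):
    """Value of an 8-bit immediate/displacement after sign extension,
    returned as an unsigned integer of the given width."""
    value = byte - 0x100 if byte & 0x80 else byte
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    for insn, code in ENCODINGS.items():
        hexbytes = " ".join(f"{b:02x}" for b in code)
        print(f"{len(code)} bytes  {hexbytes:<22} {insn}")

    # LEA into %ebx: 32-bit result, upper half of %rbx is zeroed.
    print(hex(sign_extend_imm8(0x88, bits=32)))
    # push imm8 / pop %rbx: full 64-bit sign-extended value.
    print(hex(sign_extend_imm8(0x88, bits=64)))
```

Note that the 3-byte LEA only pays off when a known constant (such as the zero in %edi) is already live; otherwise the 2-byte xor has to be counted as well.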