https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82267
--- Comment #6 from Peter Cordes <peter at cordes dot ca> ---
(In reply to H.J. Lu from comment #2)
> > Are there still cases where -maddress-mode=long makes worse code?
>
> Yes, there are more places where -maddress-mode=long needs to zero-extend
> address to 64 bits where 0x67 prefix does for you.

So ideally, gcc should use 0x67 opportunistically where it saves a zero-extension instruction. Using 64-bit address size opportunistically wherever we're sure it's safe seems like a good idea, but I assume that's not easy to implement.

Can we teach -maddress-mode=long that a 0x67 prefix is a nearly-free way to zero-extend as part of an addressing mode, so it will use that instead of extra instructions?

> > SSSE3 and later instructions need 66 0F 3A/38 before the opcode, so an
> > address-size or REX prefix will cause a decode stall on Silvermont. With
>
> That is true.
>
> > Similarly, Bulldozer-family has a 3-prefix limit, but doesn't
> > count escape bytes, and VEX only counts as 0 or 1 (for 2/3 byte VEX).
>
> But 0x67 prefix is still better.

For tune=silvermont or knl, ideally we'd count prefixes and use an extra instruction when it avoids a decode bottleneck.

For tune=generic we should probably always use 0x67 when it saves an instruction. IDK about tune=bdver2. Probably not worth worrying about too much.

> Since the upper 32 bits of stack register are always zero for x32, we
> can encode %esp as %rsp to avoid 0x67 prefix in address if there is no
> index or base register.

Note that %rsp can't be an index register, so you only have to check whether it's the base register. The SIB encoding that would mean index=RSP actually means "no index". The ModRM encoding that would mean base=RSP instead means "there's a SIB byte". https://stackoverflow.com/a/46263495/224132

This means that `(%rsp)` is encodable, instead of `(%rsp, %rsp, scale)`. Any other register can be used as a base with no SIB byte (unfortunately for code-size with -fomit-frame-pointer).

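To make the ModRM/SIB special cases concrete, here's a rough sketch in C that hand-assembles `mov (base), %eax` in 64-bit mode. The function name `encode_load_eax` and its parameters are made up for illustration; this is not how gcc or gas is structured, just the encoding rules written out. It only handles a single base register with no index and no displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a gcc/binutils API): encode `mov (base), %eax`
 * for 64-bit mode.
 *   base   : hardware register number 0..15 (4 = rsp, 5 = rbp, 12 = r12, 13 = r13)
 *   addr32 : nonzero to use 32-bit address size, i.e. emit the 0x67 prefix
 * Writes at most 5 bytes to out and returns the number of bytes written. */
size_t encode_load_eax(uint8_t *out, int base, int addr32)
{
    size_t n = 0;
    int low = base & 7;

    assert(base >= 0 && base < 16);

    if (addr32)
        out[n++] = 0x67;         /* address-size prefix: (%esp) instead of (%rsp) */
    if (base >= 8)
        out[n++] = 0x41;         /* REX.B selects r8..r15 as the base */
    out[n++] = 0x8B;             /* opcode: MOV r32, r/m32 */

    if (low == 4) {
        /* rm=100 does not mean "base = rsp"; it means "a SIB byte follows". */
        out[n++] = 0x04;         /* mod=00 reg=000 (eax) rm=100 */
        out[n++] = 0x24;         /* SIB: scale=00 index=100 (no index) base=100 */
    } else if (low == 5) {
        /* mod=00 rm=101 means RIP-relative in 64-bit mode, so rbp/r13
         * need mod=01 with a zero disp8, but still no SIB byte. */
        out[n++] = 0x45;         /* mod=01 reg=000 (eax) rm=101 */
        out[n++] = 0x00;         /* disp8 = 0 */
    } else {
        out[n++] = (uint8_t)low; /* mod=00 reg=000 (eax) rm=base */
    }
    return n;
}
```

With this sketch, `mov (%rsp), %eax` comes out as 8B 04 24 and `mov (%esp), %eax` as 67 8B 04 24, so dropping the prefix saves one byte per stack access, while `mov (%rdi), %eax` is just 8B 07.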
Can this check be applied to %rbp in functions that use a frame pointer? That might be possible even though, for other registers, it's harder to decide whether they need to be zero- or sign-extended when we're not sure whether they hold "the pointer" or a signed integer pointer-difference. However, simple dereference addressing modes (one register, no displacement) can always use 64-bit address size when the register is known to be zero-extended.
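As a sanity check on why the one-register, no-displacement case is the easy one, here's a small model in C of how the effective address of `disp(base)` is formed with and without the 0x67 prefix. The function names are hypothetical and this only models the address arithmetic, not anything gcc actually does: with 32-bit address size the sum wraps at 32 bits and is then zero-extended, while with 64-bit address size the displacement is sign-extended and the sum doesn't wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(base) with the 0x67 prefix (32-bit address size):
 * only the low 32 bits of base are used, the sum wraps modulo 2^32, and the
 * result is zero-extended to 64 bits. */
uint64_t ea_addr32(uint64_t base, int32_t disp)
{
    return (uint32_t)((uint32_t)base + (uint32_t)disp);
}

/* Effective address of disp(base) without the prefix (64-bit address size):
 * disp is sign-extended and the full 64-bit sum is used. */
uint64_t ea_addr64(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}

/* Hypothetical check: for this particular base value and displacement,
 * would dropping the 0x67 prefix still address the same byte? */
int same_without_prefix(uint64_t base, int32_t disp)
{
    return ea_addr32(base, disp) == ea_addr64(base, disp);
}
```

For a zero-extended base and disp = 0 the two always agree, since both are just the register value. They can disagree as soon as a displacement is involved: base = 0x10 with disp = -0x20 gives 0xFFFFFFF0 with the prefix but 0xFFFFFFFFFFFFFFF0 without it, which is the kind of case where the compiler would have to know more about what the register holds.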