https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69576

            Bug ID: 69576
           Summary: tailcall could use a conditional branch on x86, but doesn't
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: i386-*, x86_64-*

In x86, both jmp and jcc can use either a rel8 or rel32 displacement.  Unless
I'm misunderstanding something, the rel32 displacement in a jcc can be
relocated at link time identically to the way the rel32 in a jmp can be.

void ext(void);
void foo(int x) {
  if (x > 10) ext();
}

compiles to (gcc 5.3 -O3 -mtune=haswell)

        cmpl    $10, %edi
        jg      .L4
        ret
.L4:
        jmp     ext

Is this a missed optimization, or is there some reason gcc must avoid
conditional branches for tail-calls that makes this not a bug?

This sequence is clearly better, if it's safe:

        cmpl    $10, %edi
        jg      ext
        ret

If targeting a CPU which statically predicts unknown forward branches as
not-taken, and you can statically predict the tail-call as strongly taken,
then it could make sense to use clang 3.7.1's sequence:

        cmpl    $11, %edi
        jl      .LBB0_1
        jmp     ext                     # TAILCALL
.LBB0_1:
        retq

According to Agner Fog's microarch guide, AMD CPUs use this static prediction
strategy, but Pentium M / Core2 assign a BTB entry and use whatever prediction
was in that entry already.  He doesn't specifically mention static prediction
for later Intel CPUs, but they're probably similar.  (So using clang's sequence
only helps on (some?) AMD CPUs, even if the call to ext() always happens.)

AFAICT, gcc's sequence has no advantages in any case.

Note that the code for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69569
demonstrates this bug as well, but is a separate issue.  It's pure coincidence
that I noticed this the day after that bug was filed.
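
For anyone who wants to check the behavior of the test case before looking at the assembly, here is a minimal single-file sketch. It is not part of the original report. It assumes that ext() is defined locally and marked noinline (a GCC/clang extension) so that the file links on its own, and the counter, calls_for() and selfcheck() are hypothetical names added for illustration. Because ext() is a local definition here rather than an external symbol, the generated code may differ from the listings above; to match the report more closely, keep only the declaration of ext() and compile foo() by itself.

```c
/* Reproducer sketch for the conditional tail call discussed in this report.
 * Assumption: ext() is defined here (noinline) instead of being external. */
#include <assert.h>

static int ext_calls;   /* hypothetical counter, not in the original report */

__attribute__((noinline)) void ext(void)
{
    ext_calls++;
}

__attribute__((noinline)) void foo(int x)
{
    if (x > 10)
        ext();          /* call in tail position: the candidate for "jg ext" */
}

/* Hypothetical helper: number of times foo(x) ends up calling ext(). */
int calls_for(int x)
{
    ext_calls = 0;
    foo(x);
    return ext_calls;
}

/* Boundary check: ext() is reached only when x > 10. */
void selfcheck(void)
{
    assert(calls_for(10) == 0);
    assert(calls_for(11) == 1);
}
```

The file has no main(), so it can be passed directly to the compiler with -S and the options from the report (-O3 -mtune=haswell) to inspect the code generated for foo().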