[Bug target/69576] New: tailcall could use a conditional branch on x86, but doesn't

peter at cordes dot ca Sun, 31 Jan 2016 00:15:55 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69576


            Bug ID: 69576
           Summary: tailcall could use a conditional branch on x86, but
                    doesn't
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: i386-*, x86_64-*

On x86, both jmp and jcc can use either a rel8 or a rel32 displacement.  Unless
I'm misunderstanding something, the rel32 displacement in a jcc can be
relocated at link time in exactly the same way as the rel32 in a jmp.


void ext(void);
void foo(int x) {
  if (x > 10) ext();
}

compiles to (gcc 5.3 -O3 -mtune=haswell)

        cmpl    $10, %edi
        jg      .L4
        ret
.L4:
        jmp     ext

Is this a missed optimization, or is there some reason gcc must avoid
conditional branches for tail-calls that makes this not a bug?  This sequence
is clearly better, if it's safe:

        cmpl    $10, %edi
        jg      ext
        ret
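
For anyone who wants to reproduce this, here is a minimal sketch of the test
case that also links and runs on its own.  The counter `ext_calls` and the body
of `ext()` are hypothetical stand-ins (the report only declares `ext`), and
`__attribute__((noinline))` is a GCC/clang extension used here so the call is
not simply inlined away.  Compiling with something like `gcc -O3 -S` should
show whether the tail call goes through a separate `jmp` or a direct `jcc`.

```c
/* Hypothetical counter so the effect of ext() is observable; not in the report. */
int ext_calls = 0;

/* Stand-in for the external function.  noinline (GCC/clang extension)
 * keeps the call in foo() a real call instead of letting it be inlined. */
__attribute__((noinline)) void ext(void) { ext_calls++; }

/* Same function as in the report: the call to ext() is in tail position,
 * because foo() does nothing after ext() returns. */
void foo(int x) {
  if (x > 10) ext();
}
```

In the original report `ext` lives in another translation unit, so treat this
single-file version as an approximation.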


If targeting a CPU which statically predicts unknown forward branches as
not-taken, and you can statically predict the tail-call as strongly taken, then
it could make sense to use clang 3.7.1's sequence:

        cmpl    $11, %edi
        jl      .LBB0_1
        jmp     ext                     # TAILCALL
.LBB0_1:
        retq

According to Agner Fog's microarch guide, AMD CPUs use this static prediction
strategy, but Pentium M / Core2 assign a BTB entry and use whatever prediction
was in that entry already.  He doesn't specifically mention static prediction
for later Intel CPUs, but they're probably similar.  So using clang's sequence
only helps on (some?) AMD CPUs, even if the call to ext() always happens.
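
As a sketch of the "statically predict the tail-call as strongly taken" case:
one way to give the compiler that hint is `__builtin_expect`, a GCC/clang
builtin.  The names `foo_likely`, `ext_hot` and `ext_hot_calls` are
hypothetical, and whether a given compiler version actually changes the branch
layout in response to the hint is an assumption that has to be checked in the
generated asm.

```c
/* Hypothetical counter and callee, used only to make the example runnable. */
int ext_hot_calls = 0;

__attribute__((noinline)) void ext_hot(void) { ext_hot_calls++; }

/* Same logic as foo(), but the condition is annotated as expected-true.
 * __builtin_expect(expr, 1) returns expr and only carries a prediction
 * hint; it does not change the result of the comparison. */
void foo_likely(int x) {
  if (__builtin_expect(x > 10, 1)) ext_hot();
}
```

The hint only affects code layout, never behaviour, so it is safe to compare
the asm output with and without it.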

AFAICT, gcc's sequence has no advantages in any case.

Note that the code for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69569
demonstrates this bug as well, but is a separate issue.  It's pure coincidence
that I noticed this the day after that bug was filed.
