https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85605

            Bug ID: 85605
           Summary: Potentially missing optimization under x64 and ARM:
                    seemingly unnecessary branch in codegen
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sergey.ignatchenko at ithare dot com
  Target Milestone: ---

Code:

==========

#include <stdint.h>
#include <type_traits>

template<class T,class T2>
inline bool cmp(T a, T2 b) {
  return a<0 ? true : T2(a) < b;
}

template<class T,class T2>
inline bool cmp2(T a, T2 b) {
  return (a<0) | (T2(a) < b);
}

bool f(int a, int b) {
    return cmp(int64_t(a), unsigned(b));
}

bool f2(int a, int b) {
    return cmp2(int64_t(a), unsigned(b));
}

====

Functions cmp and cmp2 seem to be equivalent (at least under the "as if" rule,
as side effects of reading and casting are non-observable). However, under
GCC/x64, cmp() generates code with a branch, while the seemingly equivalent
cmp2() manages to do without branching:

===============

f(int, int):
  testl %edi, %edi
  movl $1, %eax
  js .L1
  cmpl %edi, %esi
  seta %al
.L1:
  rep ret

f2(int, int):
  movl %edi, %edx
  shrl $31, %edx
  cmpl %edi, %esi
  seta %al
  orl %edx, %eax
  ret

===============
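
As a sanity check of the equivalence claim, here is a minimal sketch that
compares cmp() and cmp2() over a handful of boundary values of int. The
helper names count_mismatches() and self_check() are hypothetical and not
part of the original test case; the sample set is an assumption and is far
from exhaustive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

template<class T, class T2>
inline bool cmp(T a, T2 b) {
  return a < 0 ? true : T2(a) < b;
}

template<class T, class T2>
inline bool cmp2(T a, T2 b) {
  return (a < 0) | (T2(a) < b);
}

// Hypothetical helper (not in the report): counts the sampled input pairs
// on which cmp() and cmp2() return different results.
inline std::size_t count_mismatches() {
  const int samples[] = {
    std::numeric_limits<int>::min(), std::numeric_limits<int>::min() + 1,
    -2, -1, 0, 1, 2,
    std::numeric_limits<int>::max() - 1, std::numeric_limits<int>::max()
  };
  std::size_t mismatches = 0;
  for (int a : samples) {
    for (int b : samples) {
      // Same argument conversions as f() and f2() in the report.
      const bool r1 = cmp(std::int64_t(a), unsigned(b));
      const bool r2 = cmp2(std::int64_t(a), unsigned(b));
      if (r1 != r2) ++mismatches;
    }
  }
  return mismatches;
}

// Hypothetical helper: aborts if any sampled pair disagrees.
inline void self_check() {
  assert(count_mismatches() == 0);
}
```

When a is negative both forms yield true, and otherwise both reduce to
T2(a) < b, so no mismatch should be found for any input pair.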

f2() is expected to be significantly faster than f() in most usage
scenarios (*NB: if you feel it is necessary to create a case to illustrate the
detriment of branching, please let me know, but hopefully it is quite obvious*).
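
Should such a case be needed, a rough sketch of a timing harness could look
like the following. Everything except cmp, cmp2, f and f2 is hypothetical
(make_inputs, Timing, time_it, compare_variants), and the input range and
seed are arbitrary assumptions. Inputs get a random sign so that the sign
test in f() is hard to predict. No measured numbers are claimed here, and the
outcome may depend on compiler version, flags and inlining decisions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

template<class T, class T2>
inline bool cmp(T a, T2 b) { return a < 0 ? true : T2(a) < b; }

template<class T, class T2>
inline bool cmp2(T a, T2 b) { return (a < 0) | (T2(a) < b); }

bool f(int a, int b)  { return cmp(std::int64_t(a), unsigned(b)); }
bool f2(int a, int b) { return cmp2(std::int64_t(a), unsigned(b)); }

// Hypothetical helper: n pseudo-random ints of mixed sign.
std::vector<int> make_inputs(std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(-1000, 1000);
  std::vector<int> v(n);
  for (int& x : v) x = dist(gen);
  return v;
}

// Hypothetical result type: number of true results and elapsed time.
struct Timing {
  std::size_t hits;
  std::chrono::nanoseconds elapsed;
};

// Applies fn to each pair of adjacent inputs and times the whole loop.
template<class Fn>
Timing time_it(Fn fn, const std::vector<int>& in) {
  const auto t0 = std::chrono::steady_clock::now();
  std::size_t hits = 0;
  for (std::size_t i = 0; i + 1 < in.size(); ++i)
    hits += fn(in[i], in[i + 1]) ? 1 : 0;
  const auto t1 = std::chrono::steady_clock::now();
  return Timing{hits,
                std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0)};
}

// Runs both variants on the same inputs; they must agree on the hit count.
std::pair<Timing, Timing> compare_variants(const std::vector<int>& in) {
  const Timing a = time_it(f, in);
  const Timing b = time_it(f2, in);
  assert(a.hits == b.hits);
  return {a, b};
}
```

Calling compare_variants(make_inputs(n, seed)) for a large n and comparing the
two elapsed values would be one way to see whether the branch matters on a
given target.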

Per Godbolt, similar behavior is observed under both GCC/x64 and GCC/ARM;
however, Clang manages to do without branching for both f() and f2().

*Godbolt link*: https://godbolt.org/g/ktovvP
