https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85605
Bug ID: 85605
Summary: Potentially missing optimization under x64 and ARM: seemingly unnecessary branch in codegen
Product: gcc
Version: 7.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sergey.ignatchenko at ithare dot com
Target Milestone: ---

Code:
==========
#include <stdint.h>
#include <type_traits>

template<class T,class T2>
inline bool cmp(T a, T2 b) {
  return a<0 ? true : T2(a) < b;
}

template<class T,class T2>
inline bool cmp2(T a, T2 b) {
  return (a<0) | (T2(a) < b);
}

bool f(int a, int b) {
  return cmp(int64_t(a), unsigned(b));
}

bool f2(int a, int b) {
  return cmp2(int64_t(a), unsigned(b));
}
====

Functions cmp and cmp2 seem to be equivalent (at least under the "as if" rule, as the side effects of reading and casting are non-observable). However, under GCC/x64, cmp() generates code with a branch, while the seemingly equivalent cmp2() manages to do without branching:

===============
f(int, int):
        testl   %edi, %edi
        movl    $1, %eax
        js      .L1
        cmpl    %edi, %esi
        seta    %al
.L1:
        rep ret
f2(int, int):
        movl    %edi, %edx
        shrl    $31, %edx
        cmpl    %edi, %esi
        seta    %al
        orl     %edx, %eax
        ret
===============

And f2() is expected to be significantly faster than f() in most usage scenarios (*NB: if you feel it is necessary to create a case to illustrate the detriment of branching, please let me know, but hopefully it is quite obvious*).

Per Godbolt, similar behavior is observed under both GCC/x64 and GCC/ARM; however, Clang manages to do without branching for both f() and f2().

*Godbolt link*: https://godbolt.org/g/ktovvP
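
The claim that cmp and cmp2 are equivalent can be spot-checked with a small sketch like the one below. It copies cmp, cmp2, f and f2 from the report and adds a helper, agree_on_samples(), which is hypothetical (not part of the report) and only compares the two functions on a handful of boundary values. It is a sanity check on sampled inputs, not a proof of equivalence, and it says nothing about the generated code.

```cpp
#include <stdint.h>
#include <limits>

// cmp and cmp2 as given in the report.
template<class T, class T2>
inline bool cmp(T a, T2 b) {
  return a < 0 ? true : T2(a) < b;
}

template<class T, class T2>
inline bool cmp2(T a, T2 b) {
  return (a < 0) | (T2(a) < b);
}

bool f(int a, int b)  { return cmp(int64_t(a), unsigned(b)); }
bool f2(int a, int b) { return cmp2(int64_t(a), unsigned(b)); }

// Hypothetical helper (an assumption for illustration, not from the report):
// returns true if f and f2 agree on every pair drawn from a small set of
// boundary values. Sampled check only; it does not cover all inputs.
bool agree_on_samples() {
  const int samples[] = {
    std::numeric_limits<int>::min(), -2, -1, 0, 1, 2,
    std::numeric_limits<int>::max()
  };
  for (int a : samples) {
    for (int b : samples) {
      if (f(a, b) != f2(a, b)) {
        return false;
      }
    }
  }
  return true;
}
```

Calling agree_on_samples() from a test or from main() and checking that it returns true is enough to exercise the sketch; a negative a makes both functions return true regardless of b, which is the case the branch in f() handles.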