https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91400
Bug ID: 91400
Summary: __builtin_cpu_supports conjunction is optimized poorly
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vanyacpp at gmail dot com
Target Milestone: ---
Clang 8 optimizes both f() and g() to the same code:
bool f()
{
    return __builtin_cpu_supports("popcnt") && __builtin_cpu_supports("ssse3");
}

bool g()
{
    extern unsigned int cpu_model;
    return (cpu_model & 64) && (cpu_model & 4);
}
f()/g():
        mov     eax, dword ptr [rip + cpu_model]
        and     eax, 68
        cmp     eax, 68
        sete    al
        ret
GCC generates this code only for g(). For f(), GCC generates less optimal code:
f():
        mov     edx, DWORD PTR __cpu_model[rip+12]
        mov     eax, edx
        shr     eax, 6
        and     eax, 1
        and     edx, 4
        mov     edx, 0
        cmove   eax, edx
        ret
I believe it would be great if GCC could generate the same code for f() as
well.
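
For reference, the fold being asked for amounts to replacing two single-bit tests joined by && with one mask-and-compare. Below is a minimal sketch of that equivalence, assuming the bit values 64 and 4 from g() above. The names kPopcntBit, kSsse3Bit, has_both_separate and has_both_combined are hypothetical and used only for illustration; this is not how GCC or Clang implement the optimization internally.

```cpp
#include <cstdint>

// Hypothetical names; the bit values are the ones used in g() above.
constexpr std::uint32_t kPopcntBit = 64;
constexpr std::uint32_t kSsse3Bit = 4;

// Two separate single-bit tests joined with &&, as written in g().
constexpr bool has_both_separate(std::uint32_t features)
{
    return (features & kPopcntBit) && (features & kSsse3Bit);
}

// One combined mask-and-compare, mirroring the and/cmp/sete sequence.
constexpr bool has_both_combined(std::uint32_t features)
{
    return (features & (kPopcntBit | kSsse3Bit)) == (kPopcntBit | kSsse3Bit);
}

// Spot checks evaluated at compile time.
static_assert(has_both_separate(68) && has_both_combined(68), "both bits set");
static_assert(!has_both_separate(64) && !has_both_combined(64), "only bit 6 set");
static_assert(!has_both_separate(4) && !has_both_combined(4), "only bit 2 set");
```

The two forms agree for every input because each test checks a single bit, so "both bits are nonzero" is the same as "the masked value equals the full mask".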