https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77877
Bug ID: 77877
Summary: missed optimization in switch of modulus value
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: drepper.fsp+rhbz at gmail dot com
Target Milestone: ---
Compile this code with a recent trunk gcc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int zero;
int one;
int two;
extern unsigned compute_mod(unsigned);
void cnt(unsigned n) {
#ifdef HIDE
n = compute_mod(n);
#else
n %= 3;
#endif
switch (n) {
case 0: ++zero; break;
case 1: ++one; break;
case 2: ++two; break;
#ifdef OPT
default: __builtin_unreachable();
#endif
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On x86-64, without HIDE defined, the code compiles with -O3 to:
0: 89 f8 mov %edi,%eax
2: ba ab aa aa aa mov $0xaaaaaaab,%edx
7: f7 e2 mul %edx
9: d1 ea shr %edx
b: 8d 04 52 lea (%rdx,%rdx,2),%eax
e: 29 c7 sub %eax,%edi
10: 83 ff 01 cmp $0x1,%edi
13: 74 1b je 30 <_Z3cntj+0x30>
15: 83 ff 02 cmp $0x2,%edi
18: 75 0e jne 28 <_Z3cntj+0x28>
1a: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 21
<_Z3cntj+0x21>
21: c3 retq
22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
28: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 2f
<_Z3cntj+0x2f>
2f: c3 retq
30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37
<_Z3cntj+0x37>
37: c3 retq
This is good, the compiler knows there are only three possible values and does
not emit any code for a default case.
If I make sure the compiler doesn't know anything about the arithmetic
operation by calling a function but telling the compiler there is no other case
by using __builtin_unreachable() then the generated code is even better:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <_Z3cntj+0x9>
9: 83 f8 01 cmp $0x1,%eax
c: 74 22 je 30 <_Z3cntj+0x30>
e: 72 10 jb 20 <_Z3cntj+0x20>
10: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 17
<_Z3cntj+0x17>
17: 48 83 c4 08 add $0x8,%rsp
1b: c3 retq
1c: 0f 1f 40 00 nopl 0x0(%rax)
20: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 27
<_Z3cntj+0x27>
27: 48 83 c4 08 add $0x8,%rsp
2b: c3 retq
2c: 0f 1f 40 00 nopl 0x0(%rax)
30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37
<_Z3cntj+0x37>
37: 48 83 c4 08 add $0x8,%rsp
3b: c3 retq
There is only one compare instruction. This is how even the first case should
look like.
Even more interesting: just defining the OPT macro does not change anything.
So, there is currently no way to get the optimal behaviour. We certainly don't
want to artificially the function calls.