https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77877
Bug ID: 77877 Summary: missed optimization in switch of modulus value Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: drepper.fsp+rhbz at gmail dot com Target Milestone: --- Compile this code with a recent trunk gcc: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ int zero; int one; int two; extern unsigned compute_mod(unsigned); void cnt(unsigned n) { #ifdef HIDE n = compute_mod(n); #else n %= 3; #endif switch (n) { case 0: ++zero; break; case 1: ++one; break; case 2: ++two; break; #ifdef OPT default: __builtin_unreachable(); #endif } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ On x86-64, without HIDE defined, the code compiles with -O3 to: 0: 89 f8 mov %edi,%eax 2: ba ab aa aa aa mov $0xaaaaaaab,%edx 7: f7 e2 mul %edx 9: d1 ea shr %edx b: 8d 04 52 lea (%rdx,%rdx,2),%eax e: 29 c7 sub %eax,%edi 10: 83 ff 01 cmp $0x1,%edi 13: 74 1b je 30 <_Z3cntj+0x30> 15: 83 ff 02 cmp $0x2,%edi 18: 75 0e jne 28 <_Z3cntj+0x28> 1a: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 21 <_Z3cntj+0x21> 21: c3 retq 22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 28: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 2f <_Z3cntj+0x2f> 2f: c3 retq 30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37 <_Z3cntj+0x37> 37: c3 retq This is good, the compiler knows there are only three possible values and does not emit any code for a default case. If I make sure the compiler doesn't know anything about the arithmetic operation by calling a function but telling the compiler there is no other case by using __builtin_unreachable() then the generated code is even better: 0: 48 83 ec 08 sub $0x8,%rsp 4: e8 00 00 00 00 callq 9 <_Z3cntj+0x9> 9: 83 f8 01 cmp $0x1,%eax c: 74 22 je 30 <_Z3cntj+0x30> e: 72 10 jb 20 <_Z3cntj+0x20> 10: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 17 <_Z3cntj+0x17> 17: 48 83 c4 08 add $0x8,%rsp 1b: c3 retq 1c: 0f 1f 40 00 nopl 0x0(%rax) 20: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 27 <_Z3cntj+0x27> 27: 48 83 c4 08 add $0x8,%rsp 2b: c3 retq 2c: 0f 1f 40 00 nopl 0x0(%rax) 30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37 <_Z3cntj+0x37> 37: 48 83 c4 08 add $0x8,%rsp 3b: c3 retq There is only one compare instruction. This is how even the first case should look like. Even more interesting: just defining the OPT macro does not change anything. So, there is currently no way to get the optimal behaviour. We certainly don't want to artificially the function calls.