https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77877

            Bug ID: 77877
           Summary: missed optimization in switch of modulus value
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: drepper.fsp+rhbz at gmail dot com
  Target Milestone: ---

Compile this code with a recent trunk gcc:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int zero;
int one;
int two;

extern unsigned compute_mod(unsigned);

void cnt(unsigned n) {
#ifdef HIDE
  n = compute_mod(n);
#else
  n %= 3;
#endif
  switch (n) {
  case 0: ++zero; break;
  case 1: ++one; break;
  case 2: ++two; break;
#ifdef OPT
  default: __builtin_unreachable();
#endif
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On x86-64, without HIDE defined, the code compiles with -O3 to:

   0:   89 f8                   mov    %edi,%eax
   2:   ba ab aa aa aa          mov    $0xaaaaaaab,%edx
   7:   f7 e2                   mul    %edx
   9:   d1 ea                   shr    %edx
   b:   8d 04 52                lea    (%rdx,%rdx,2),%eax
   e:   29 c7                   sub    %eax,%edi
  10:   83 ff 01                cmp    $0x1,%edi
  13:   74 1b                   je     30 <_Z3cntj+0x30>
  15:   83 ff 02                cmp    $0x2,%edi
  18:   75 0e                   jne    28 <_Z3cntj+0x28>
  1a:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 21
<_Z3cntj+0x21>
  21:   c3                      retq   
  22:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  28:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 2f
<_Z3cntj+0x2f>
  2f:   c3                      retq   
  30:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 37
<_Z3cntj+0x37>
  37:   c3                      retq   


This is good, the compiler knows there are only three possible values and does
not emit any code for a default case.

If I make sure the compiler doesn't know anything about the arithmetic
operation by calling a function but telling the compiler there is no other case
by using __builtin_unreachable() then the generated code is even better:

   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   e8 00 00 00 00          callq  9 <_Z3cntj+0x9>
   9:   83 f8 01                cmp    $0x1,%eax
   c:   74 22                   je     30 <_Z3cntj+0x30>
   e:   72 10                   jb     20 <_Z3cntj+0x20>
  10:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 17
<_Z3cntj+0x17>
  17:   48 83 c4 08             add    $0x8,%rsp
  1b:   c3                      retq   
  1c:   0f 1f 40 00             nopl   0x0(%rax)
  20:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 27
<_Z3cntj+0x27>
  27:   48 83 c4 08             add    $0x8,%rsp
  2b:   c3                      retq   
  2c:   0f 1f 40 00             nopl   0x0(%rax)
  30:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 37
<_Z3cntj+0x37>
  37:   48 83 c4 08             add    $0x8,%rsp
  3b:   c3                      retq   

There is only one compare instruction.  This is how even the first case should
look like.

Even more interesting: just defining the OPT macro does not change anything. 
So, there is currently no way to get the optimal behaviour.  We certainly don't
want to artificially the function calls.

Reply via email to