https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

            Bug ID: 80520
           Summary: Performance regression from missing if-conversion
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---
            Target: x86_64-linux-gnu

Created attachment 41266
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41266&action=edit
Test case demonstrating the problem

The following test case from a CppCon 2016 talk benchmarking different
randomization constructs

  #include <random>

  void foo(std::mt19937 &gen)
  {
    for (int i = 0; i < 1000000000; ++i)
    {
      std::uniform_int_distribution<int> dist(0,99);
      volatile auto x = dist(gen);
    }
  }

runs much slower when compiled with gcc 8.0 (r247084) than with gcc 6.3:
  gcc 6.3.0: 3.9s
  gcc 8.0.0: 7.7s
(compiled as "g++ -O3" on x86_64-linux-gnu).
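
One way to reproduce this kind of measurement is a small timing wrapper
around the same loop. This is only a sketch: the helper name time_foo and
its iteration parameter are hypothetical and not part of the attached test
case, and the numbers above were presumably taken by timing the whole
program.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper (not from the attachment): runs the loop body of the
// test case `iterations` times and returns the elapsed wall-clock seconds.
double time_foo(std::mt19937 &gen, long iterations)
{
  const auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i)
  {
    std::uniform_int_distribution<int> dist(0, 99);
    volatile auto x = dist(gen);  // volatile keeps the result from being optimized away
    (void)x;
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_foo(gen, 1000000000) with a default-constructed std::mt19937
matches the iteration count of the test case; building the same source with
both compilers at -O3 should then show the difference.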

The benchmark is silly, but it indicates that the heuristics for the branch
optimizations could be improved.

The difference is that the .optimized dump generated by gcc 6.3.0 contains code
segments of the form

  _32 = __y_27 & 1;
  iftmp.1_33 = _32 != 0 ? 2567483615 : 0;
  _34 = _31 ^ iftmp.1_33;
  MEM[base: _97, offset: 0B] = _34;
  ivtmp.35_100 = ivtmp.35_101 + 8;
  if (_94 == ivtmp.35_100)
    goto <bb 6>;
  else
    goto <bb 5>;

where iftmp.1_33 is generated as a cmov, while the same code compiled by gcc
8.0.0 looks like

  _102 = __y_60 & 1;
  if (_102 != 0)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [49.50%]:
  _98 = _103 ^ 2567483615;
  MEM[base: _105, offset: 0B] = _98;
  ivtmp.33_91 = ivtmp.33_43 + 8;
  if (_47 == ivtmp.33_91)
    goto <bb 9>; [1.01%]
  else
    goto <bb 5>; [98.99%]

  <bb 8> [49.50%]:
  MEM[base: _105, offset: 0B] = _103;
  ivtmp.33_44 = ivtmp.33_43 + 8;
  if (ivtmp.33_44 == _47)
    goto <bb 9>; [1.01%]
  else
    goto <bb 5>; [98.99%]

and the CPU mispredicts the branch generated from "if (_102 != 0)".
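
The two shapes in the dumps above can be written out at the source level
roughly as follows. This is a simplified sketch, not the libstdc++ code: the
function names are hypothetical, 32-bit unsigned values are assumed for
brevity (the 8-byte stride in the dumps suggests a wider type in the real
code), and only the constant 2567483615 is taken from the dumps. Whether a
given compiler emits a cmov or a branch for any of these forms depends on
its version and options.

```cpp
#include <cstdint>

// Constant from the dumps above (2567483615 == 0x9908b0df).
constexpr std::uint32_t kMatrixA = 2567483615u;

// Select form, mirroring "_32 != 0 ? 2567483615 : 0" in the gcc 6.3.0 dump.
inline std::uint32_t twist_select(std::uint32_t x, std::uint32_t y)
{
  const std::uint32_t tmp = (y & 1u) != 0 ? kMatrixA : 0u;
  return x ^ tmp;
}

// Branch form, mirroring the two-block shape in the gcc 8.0.0 dump.
inline std::uint32_t twist_branch(std::uint32_t x, std::uint32_t y)
{
  if ((y & 1u) != 0)
    return x ^ kMatrixA;
  return x;
}

// Arithmetic-mask form: 0u - 1u wraps to all ones, 0u - 0u stays zero,
// so no conditional is needed at the source level.
inline std::uint32_t twist_mask(std::uint32_t x, std::uint32_t y)
{
  return x ^ (kMatrixA & (0u - (y & 1u)));
}
```

All three functions compute the same value; they differ only in how the
dependence on the low bit of y is expressed, which is the part the branch
predictor cannot learn when that bit is effectively random.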
