https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520
Bug ID: 80520
Summary: Performance regression from missing if-conversion
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: krister.walfridsson at gmail dot com
Target Milestone: ---
Target: x86_64-linux-gnu
Created attachment 41266
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41266&action=edit
Test case demonstrating the problem
The following test case from a CppCon 2016 talk benchmarking different
randomization constructs
  #include <random>

  void foo(std::mt19937 &gen)
  {
    for (int i = 0; i < 1000000000; ++i)
      {
        std::uniform_int_distribution<int> dist(0,99);
        volatile auto x = dist(gen);
      }
  }
runs much slower when compiled with gcc 8.0 (r247084) than with gcc 6.3:
  gcc 6.3.0: 3.9s
  gcc 8.0.0: 7.7s
(compiled as "g++ -O3" on x86_64-linux-gnu).
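For anyone who wants to reproduce the timings without the attachment, a driver along the following lines could be used. This is only a sketch and not the attached test case: the helper name time_draws and its iterations parameter are hypothetical, added so that short runs are possible, and the loop body is copied from foo() above.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper (not from the attachment): same loop body as foo(),
// with the iteration count as a parameter. Returns elapsed seconds.
double time_draws(std::mt19937 &gen, long iterations)
{
  const auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i)
    {
      std::uniform_int_distribution<int> dist(0, 99);
      volatile auto x = dist(gen);  // volatile keeps the draw from being optimized away
    }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling time_draws(gen, 1000000000) with a default-seeded std::mt19937 should exercise the same loop as foo(); the absolute numbers will of course depend on the machine.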
The benchmark is silly, but it indicates that the heuristics for the branch
optimizations could be improved.
The difference is that the .optimized dump generated by gcc 6.3.0 contains code
segments of the form
  _32 = __y_27 & 1;
  iftmp.1_33 = _32 != 0 ? 2567483615 : 0;
  _34 = _31 ^ iftmp.1_33;
  MEM[base: _97, offset: 0B] = _34;
  ivtmp.35_100 = ivtmp.35_101 + 8;
  if (_94 == ivtmp.35_100)
    goto <bb 6>;
  else
    goto <bb 5>;
where iftmp.1_33 is generated as a cmov, while the same code compiled by gcc
8.0.0 looks like
  _102 = __y_60 & 1;
  if (_102 != 0)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [49.50%]:
  _98 = _103 ^ 2567483615;
  MEM[base: _105, offset: 0B] = _98;
  ivtmp.33_91 = ivtmp.33_43 + 8;
  if (_47 == ivtmp.33_91)
    goto <bb 9>; [1.01%]
  else
    goto <bb 5>; [98.99%]

  <bb 8> [49.50%]:
  MEM[base: _105, offset: 0B] = _103;
  ivtmp.33_44 = ivtmp.33_43 + 8;
  if (ivtmp.33_44 == _47)
    goto <bb 9>; [1.01%]
  else
    goto <bb 5>; [98.99%]
and the CPU mispredicts the branch generated from "if (_102 != 0)".
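To make the two shapes easier to compare at the source level, here is a rough C++ rendering of both dumps. It is a sketch under assumptions: the function names, the parameter names y and v, and the constant name kXorMask are hypothetical, and the code is written from the dumps, not taken from libstdc++. The constant 2567483615 is 0x9908b0df. Whether a given compiler emits a cmov for the select form is not guaranteed, so the third variant shows a purely arithmetic formulation that contains no conditional at all.

```cpp
#include <cassert>
#include <cstdint>

// Constant from the dumps: 2567483615 == 0x9908b0df.
static const std::uint32_t kXorMask = 0x9908b0dfu;

// Shape of the gcc 8.0.0 dump: a data-dependent branch decides whether to XOR.
inline std::uint32_t twist_branchy(std::uint32_t y, std::uint32_t v)
{
  if ((y & 1u) != 0)
    return v ^ kXorMask;
  return v;
}

// Shape of the gcc 6.3.0 dump: select the mask first, then XOR unconditionally.
inline std::uint32_t twist_select(std::uint32_t y, std::uint32_t v)
{
  const std::uint32_t mask = (y & 1u) != 0 ? kXorMask : 0u;
  return v ^ mask;
}

// Arithmetic variant: 0 - (y & 1) is all-ones when the low bit is set, else 0.
inline std::uint32_t twist_arith(std::uint32_t y, std::uint32_t v)
{
  return v ^ (kXorMask & (0u - (y & 1u)));
}

// Checks that the three variants agree on a range of sample inputs.
void check_twist_variants()
{
  for (std::uint32_t y = 0; y < 1000u; ++y)
    {
      const std::uint32_t v = y * 2654435761u;  // arbitrary sample values
      assert(twist_branchy(y, v) == twist_select(y, v));
      assert(twist_branchy(y, v) == twist_arith(y, v));
    }
}
```

All three functions compute the same value; they differ only in whether the low bit of y steers control flow or data flow, which is the difference between the two dumps.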