https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520
Bug ID: 80520
Summary: Performance regression from missing if-conversion
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: krister.walfridsson at gmail dot com
Target Milestone: ---
Target: x86_64-linux-gnu

Created attachment 41266
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41266&action=edit
Test case demonstrating the problem

The following test case from a CppCon 2016 talk benchmarking different
randomization constructs

    #include <random>

    void foo(std::mt19937 &gen)
    {
      for (int i = 0; i < 1000000000; ++i) {
        std::uniform_int_distribution<int> dist(0,99);
        volatile auto x = dist(gen);
      }
    }

runs much slower when compiled with gcc 8.0 (r247084) compared to gcc 6.3:

    gcc 6.3.0: 3.9s
    gcc 8.0.0: 7.7s

(compiled as "g++ -O3" on x86_64-linux-gnu).

The benchmark is silly, but it indicates that the heuristics for the branch
optimizations could be improved....

The difference is that the .optimized dump generated by gcc 6.3.0 contains
code segments of the form

    _32 = __y_27 & 1;
    iftmp.1_33 = _32 != 0 ? 2567483615 : 0;
    _34 = _31 ^ iftmp.1_33;
    MEM[base: _97, offset: 0B] = _34;
    ivtmp.35_100 = ivtmp.35_101 + 8;
    if (_94 == ivtmp.35_100)
      goto <bb 6>;
    else
      goto <bb 5>;

where iftmp.1_33 is generated as a cmov, while the same code compiled by
gcc 8.0.0 looks like

    _102 = __y_60 & 1;
    if (_102 != 0)
      goto <bb 7>; [50.00%]
    else
      goto <bb 8>; [50.00%]

    <bb 7> [49.50%]:
    _98 = _103 ^ 2567483615;
    MEM[base: _105, offset: 0B] = _98;
    ivtmp.33_91 = ivtmp.33_43 + 8;
    if (_47 == ivtmp.33_91)
      goto <bb 9>; [1.01%]
    else
      goto <bb 5>; [98.99%]

    <bb 8> [49.50%]:
    MEM[base: _105, offset: 0B] = _103;
    ivtmp.33_44 = ivtmp.33_43 + 8;
    if (ivtmp.33_44 == _47)
      goto <bb 9>; [1.01%]
    else
      goto <bb 5>; [98.99%]

and the CPU mispredicts the branch generated from "if (_102 != 0)".
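For reference, the operation in the two dumps can be written at source level roughly as follows. This is only a sketch, not the libstdc++ implementation: the names kMatrixA, twist_select and twist_mask are made up for illustration, and x and y stand in for the GIMPLE temporaries (_31/_103 and __y_27/__y_60). The constant 2567483615 is 0x9908b0df, the matrix constant of std::mt19937. Whether a given compiler lowers the select form to a cmov or to a branch is exactly what this report is about, so nothing here guarantees either outcome.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name; the value is the 2567483615 seen in both dumps.
constexpr std::uint32_t kMatrixA = 0x9908b0dfu;

// Select form, mirroring the gcc 6.3.0 dump:
//   iftmp = (y & 1) != 0 ? 2567483615 : 0;  result = x ^ iftmp;
// The compiler is free to emit either a cmov or a conditional branch here.
constexpr std::uint32_t twist_select(std::uint32_t x, std::uint32_t y) {
    return x ^ ((y & 1u) != 0 ? kMatrixA : 0u);
}

// Mask form: 0 - (y & 1) is all ones when the low bit of y is set and
// zero otherwise, so the source expression itself has no conditional.
constexpr std::uint32_t twist_mask(std::uint32_t x, std::uint32_t y) {
    return x ^ ((0u - (y & 1u)) & kMatrixA);
}

// Compile-time checks that the constant and the two forms agree.
static_assert(kMatrixA == 2567483615u, "constant matches the dump");
static_assert(twist_select(0u, 1u) == twist_mask(0u, 1u), "low bit set");
static_assert(twist_select(7u, 2u) == twist_mask(7u, 2u), "low bit clear");
```

Comparing the output of "g++ -O3 -S" for the two functions across compiler versions is one way to see which form a given release turns into a branch. The mask form is shown only as a point of comparison, not as a proposed fix.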