https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #33 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
With the current compiler there is no performance difference between the
by-ref and by-val test cases, but if we turn off the if-convert
transformation we get a ~2X speed-up on an Intel(R) Xeon(R) CPU X5670 @
2.93GHz:

./t1.exe
Took 11.55 seconds total.
./t1.noifcvt.exe
Took 6.51 seconds total.

The test will be attached.

This is caused by skewed conditional branch probabilities for the loop:

    for (auto rhs_it = rbegin; rhs_it != rend; ++rhs_it) {
        tmp = x*(*rhs_it) + data[i] + carry;
        if (tmp >= imax) {
            carry = tmp >> numbits;
            tmp &= imax - 1;
        } else {
            carry = 0;
        }
        data[i++] = tmp;
    }

Only 2.5% of the conditional branches are not taken, since imax represents
MAX_INT32, and the profile estimation phase needs to be fixed to set up an
unlikely probability for integral comparisons with huge constants.

To cope with this issue we may implement Jakub's approach of designing an
oracle for if-conversion profitability, which simply computes region (loop)
costs for the if-converted and not-if-converted regions (the cost of all
acyclic paths). Using such an approach we can see that, with a fixed
profile, hammock predication is not profitable, and if vectorization is not
successful the loop must be restored to the original one.
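
To illustrate the kind of cost comparison such an oracle could make, here is
a minimal sketch. It is not GCC code: the struct, the function names and all
cost numbers are hypothetical, and the model (probability-weighted cost of
the two acyclic paths through a hammock versus the cost of executing both
arms plus the selects) is only an assumption about how the comparison might
be set up.

```cpp
#include <cassert>

// Hypothetical per-iteration costs for a two-armed hammock (illustrative
// units, not taken from any real GCC cost model).
struct HammockCosts {
    double then_cost;        // cost of the "then" arm
    double else_cost;        // cost of the "else" arm
    double branch_cost;      // cost of a correctly predicted branch
    double mispredict_cost;  // extra penalty for a mispredicted branch
    double select_cost;      // cost of the selects added by if-conversion
};

// Expected cost when the branch is kept: each acyclic path is weighted by
// its probability, plus the expected misprediction penalty.
double branchy_cost(const HammockCosts& c, double p_then, double p_mispredict)
{
    assert(p_then >= 0.0 && p_then <= 1.0);
    assert(p_mispredict >= 0.0 && p_mispredict <= 1.0);
    return c.branch_cost
         + p_then * c.then_cost
         + (1.0 - p_then) * c.else_cost
         + p_mispredict * c.mispredict_cost;
}

// Cost after if-conversion: both arms run on every iteration and the
// results are merged with selects; there is no branch to mispredict.
double if_converted_cost(const HammockCosts& c)
{
    return c.then_cost + c.else_cost + c.select_cost;
}

// The oracle: predicate the hammock only if that is expected to be cheaper.
bool if_conversion_profitable(const HammockCosts& c,
                              double p_then, double p_mispredict)
{
    return if_converted_cost(c) < branchy_cost(c, p_then, p_mispredict);
}
```

With made-up costs such as HammockCosts{2.0, 1.0, 1.0, 15.0, 2.0}, a branch
as skewed as the one reported here (one arm reached about 2.5% of the time,
so it is rarely mispredicted) comes out cheaper when left as a branch, while
an unpredictable 50/50 branch favours if-conversion. A real implementation
would take the costs from the target's cost hooks and the probabilities from
the fixed profile estimate.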