https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

--- Comment #33 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
With the current compiler there is no performance difference between the by-ref
and by-val test cases, but if we turn off the if-conversion transformation we
get a ~1.8X speed-up (11.55 s vs. 6.51 s)
on an Intel(R) Xeon(R) CPU X5670 @ 2.93GHz:

 ./t1.exe
Took 11.55 seconds total.
 ./t1.noifcvt.exe                            
Took 6.51 seconds total.

The test will be attached.
This is caused by skewed conditional branch probabilities in the loop:

    for (auto rhs_it = rbegin; rhs_it != rend; ++rhs_it) {
            tmp = x*(*rhs_it) + data[i] + carry;
            if (tmp >= imax) {
                    carry = tmp >> numbits;
                    tmp &= imax - 1;
            } else {
                    carry = 0;
            }
            data[i++] = tmp;
    }
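For reference, here is a self-contained version of this loop next to a hand-written branchless variant that mirrors, at source level, what if-conversion does to it. The limb layout (32-bit limbs held in uint64_t, numbits = 32, imax = 2^32), the function names and the container types are assumptions for illustration, not taken from the attached test. Under that layout the two arms agree whenever tmp < imax (tmp >> numbits is 0 and tmp & (imax - 1) is tmp), so evaluating both and selecting is always safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption (not from the attached test): 32-bit limbs stored in uint64_t,
// with x and every limb below 2^32. x * rhs + data[i] + carry then fits in
// 64 bits, because (2^32 - 1)^2 + 2 * (2^32 - 1) == 2^64 - 1.
constexpr unsigned numbits = 32;
constexpr std::uint64_t imax = std::uint64_t{1} << numbits;

// The loop as written: one conditional branch per iteration.
// Adds x * rhs (limbs consumed in reverse order) into data starting at
// index i and returns the outgoing carry. Function name is hypothetical.
std::uint64_t mul_add_branchy(std::vector<std::uint64_t>& data, std::size_t i,
                              const std::vector<std::uint64_t>& rhs,
                              std::uint64_t x) {
    assert(i + rhs.size() <= data.size());
    std::uint64_t carry = 0;
    for (auto rhs_it = rhs.rbegin(); rhs_it != rhs.rend(); ++rhs_it) {
        std::uint64_t tmp = x * (*rhs_it) + data[i] + carry;
        if (tmp >= imax) {
            carry = tmp >> numbits;
            tmp &= imax - 1;
        } else {
            carry = 0;
        }
        data[i++] = tmp;
    }
    return carry;
}

// Source-level model of the if-converted loop: both arms are evaluated and
// the results are chosen with selects. This only models the transformation;
// the compiler may still lower the ternaries to branches or conditional moves.
std::uint64_t mul_add_selected(std::vector<std::uint64_t>& data, std::size_t i,
                               const std::vector<std::uint64_t>& rhs,
                               std::uint64_t x) {
    assert(i + rhs.size() <= data.size());
    std::uint64_t carry = 0;
    for (auto rhs_it = rhs.rbegin(); rhs_it != rhs.rend(); ++rhs_it) {
        const std::uint64_t tmp = x * (*rhs_it) + data[i] + carry;
        const bool overflow = tmp >= imax;
        carry = overflow ? (tmp >> numbits) : 0;
        data[i++] = overflow ? (tmp & (imax - 1)) : tmp;
    }
    return carry;
}
```

Both functions must produce identical limbs and carry for any input; only the shape of the control flow differs.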

Only 2.5% of the conditional branches are not taken, since imax represents
MAX_INT32. The profile estimation phase needs to be fixed to set up an unlikely
probability for integral comparisons with huge constants.
To cope with this issue we could implement Jakub's approach and design an oracle
for if-conversion profitability that simply computes the region (loop) cost of
the if-converted and the non-if-converted versions (the cost of all acyclic
paths). With such an approach we can see that, with the fixed profile, hammock
predication is not profitable, and if vectorization does not succeed the loop
must be restored to its original form.
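A minimal sketch of such an oracle, assuming a simple model in which each acyclic path through the loop body has an estimated probability and a summed instruction cost (any branch or expected misprediction cost is folded into the path cost). The names Path, region_cost and if_conversion_profitable, and all cost numbers in the usage example, are hypothetical and not part of GCC.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One acyclic path through the loop body (hypothetical cost model).
struct Path {
    double probability;  // estimated probability that this path is executed
    double cost;         // summed cost of the instructions on the path
};

// Expected per-iteration cost of a region: every acyclic path weighted by
// its estimated probability. The probabilities must cover the region.
double region_cost(const std::vector<Path>& paths) {
    double total = 0.0;
    double prob_sum = 0.0;
    for (const Path& p : paths) {
        total += p.probability * p.cost;
        prob_sum += p.probability;
    }
    assert(std::fabs(prob_sum - 1.0) < 1e-9);
    return total;
}

// Oracle: keep the if-converted (predicated) region only if it is cheaper
// than the original region. The if-converted hammock has a single path that
// executes both arms plus the selects.
bool if_conversion_profitable(const std::vector<Path>& original,
                              const std::vector<Path>& if_converted) {
    return region_cost(if_converted) < region_cost(original);
}
```

With made-up costs for the loop above (common part 6, branch 1, then-arm 2, else-arm 1, two selects 2), a 97.5/2.5 profile gives 0.975 * 9 + 0.025 * 8 = 8.975 for the original region against 11 for the predicated one, so the oracle rejects if-conversion.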
