[Bug c/28417] suboptimal 'division by constant' optimization

vda dot linux at googlemail dot com Tue, 18 Jul 2006 01:40:32 -0700


------- Comment #1 from vda dot linux at googlemail dot com  2006-07-18 08:40 
-------
I didn't look too close at choose_multiplier(), yet.
If anybody will work upon it in order to make it better,
this is the routine which prints values 'K' and 'bits'
so that
    (A*K)>>bits == A/B
is true for given fixed B and all A's in range [0...max_A].
You may want to add similar code to choose_multiplier().


Mathematical explanation is in the comment.

/*
[below: 'div' is unsigned integer division]
['/' is real division with infinite precision]
[A,B,C... - integers, a,b,c... - reals]

Theory: we want to calculate A div B, fast.
B is const > 2 and is not a power of 2.

In fp: A/B = A*(L/B)/L. (If L is a large power of 2,
say 2^32, then it could be done really fast).
Let k := L/B, K := L div B + 1.
Then A/B = A*k/L.

Then this is true:

A div B <= A * K div L.

For small enough A: A div B = A * K div L.
Lets find first A for which it is not true.

Lets compare k/L and K/L. K/L is larger by a small value d:

d := K/L - k/L = (L div B + 1) / L - L/B/L =
= (L div B * B + B) / (L*B) - L/(L*B) =
= (L div B * B + B - L) / (L*B)

A*K/L is larger than A*k/L by A*d.

A*k/L is closest to 'overflowing into next integer'
when A = N*B-1. The 'deficit' with such A is exactly 1/B.
If A*d >= 1/B, then A*K/L will 'overflow'.

Thus bad_A >= 1/B / d = (1/B) * (L*B)/(L div B * B + B - L) =
= L/(L div B * B + B - L). Then you need to 'walk up' to next
A representable as N*B-1: bad_A2 = (bad_A div B) * B + B-1
This is the first A which will have wrong result
(i.e. for which (A*K div L) = (A div B)+1, not (A div B).

Practical use.

Suppose that A and B are 32-bit unsigned integers.

Unfortunately, there exist only two B values in 3..2^32-1 range
for which _any_ 32-bit unsigned A can be fast divided using L=2^32
(because bad_A=2^32 and any A is less than that):

B=641 K=6700417 d=1/2753074036736 bad_A=4294967296 A=unlimited
B=6700417 K=641 d=1/28778071884562432 bad_A=4294967296 A=unlimited

We need to use larger powers of 2 for L if we need to handle
many more B's.
*/

void fastdiv_params(unsigned B, unsigned max_A)
{
        unsigned K, d_LB, bits, max_bits;
        uint64_t L, KL, mask, bad_A, max_bad_A;

        if (!B || ((B-1)&B) == 0) { // B is a power of 2
//if(0)
                printf("B=%u: power of 2, division by shift\n", B);
                return;
        }
        L = (max_ull/max_unsigned - 1) * B;
        bits = 63;
        mask = 1ULL << 63;
        while( !(L & mask))
                bits--, mask >>= 1;
        L = mask;

        while ( (KL = L/B + 1) > max_unsigned)
                bits--, L >>= 1;
        K = KL;
        // There is not much point in multiplying by even number
        // and then shifting right. Reduce K & L to avoid it:
        while (!(K & 1) && bits > 32)
                bits--, L >>= 1, K = L/B + 1;

        d_LB = ((L/B) * B + B - L);
        bad_A = L / d_LB;
        bad_A = (bad_A / B) * B + B-1;
        if (bad_A <= max_A) {
                printf("B=%u,A<=%u: no suitable fastdiv params found\n", B,
max_A);
                return;
        }
        max_bits = bits;
        max_bad_A = bad_A;
        while(1) {
                uint64_t last_L = L;
                uint64_t last_bad_A = bad_A;
                unsigned last_K = K;
                bits--;
                L >>= 1;
                K = L/B + 1;
                d_LB = ((L/B) * B + B - L);
                bad_A = L / d_LB;
                bad_A = (bad_A / B) * B + B-1;
                if (bad_A <= max_A || bits < 32) {
                        // new params are bad, report last ones and bail out
//if(0)
                        printf("B=%u,A<=%u: L=%llx(bits=%u) K=%u  (bad_A=%llu,
bad_A=%llu at bits=%u)\n",
                                B, max_A, last_L, bits+1, last_K, last_bad_A, 
max_bad_A, max_bits);
                        return;
                }
        }
}


-- 

vda dot linux at googlemail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vda dot linux at googlemail
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

[Bug c/28417] suboptimal 'division by constant' optimization

Reply via email to