------- Comment #1 from vda dot linux at googlemail dot com 2006-07-18 08:40 ------- I didn't look too close at choose_multiplier(), yet. If anybody will work upon it in order to make it better, this is the routine which prints values 'K' and 'bits' so that (A*K)>>bits == A/B is true for given fixed B and all A's in range [0...max_A]. You may want to add similar code to choose_multiplier().
Mathematical explanation is in the comment. /* [below: 'div' is unsigned integer division] ['/' is real division with infinite precision] [A,B,C... - integers, a,b,c... - reals] Theory: we want to calculate A div B, fast. B is const > 2 and is not a power of 2. In fp: A/B = A*(L/B)/L. (If L is a large power of 2, say 2^32, then it could be done really fast). Let k := L/B, K := L div B + 1. Then A/B = A*k/L. Then this is true: A div B <= A * K div L. For small enough A: A div B = A * K div L. Lets find first A for which it is not true. Lets compare k/L and K/L. K/L is larger by a small value d: d := K/L - k/L = (L div B + 1) / L - L/B/L = = (L div B * B + B) / (L*B) - L/(L*B) = = (L div B * B + B - L) / (L*B) A*K/L is larger than A*k/L by A*d. A*k/L is closest to 'overflowing into next integer' when A = N*B-1. The 'deficit' with such A is exactly 1/B. If A*d >= 1/B, then A*K/L will 'overflow'. Thus bad_A >= 1/B / d = (1/B) * (L*B)/(L div B * B + B - L) = = L/(L div B * B + B - L). Then you need to 'walk up' to next A representable as N*B-1: bad_A2 = (bad_A div B) * B + B-1 This is the first A which will have wrong result (i.e. for which (A*K div L) = (A div B)+1, not (A div B). Practical use. Suppose that A and B are 32-bit unsigned integers. Unfortunately, there exist only two B values in 3..2^32-1 range for which _any_ 32-bit unsigned A can be fast divided using L=2^32 (because bad_A=2^32 and any A is less than that): B=641 K=6700417 d=1/2753074036736 bad_A=4294967296 A=unlimited B=6700417 K=641 d=1/28778071884562432 bad_A=4294967296 A=unlimited We need to use larger powers of 2 for L if we need to handle many more B's. */ void fastdiv_params(unsigned B, unsigned max_A) { unsigned K, d_LB, bits, max_bits; uint64_t L, KL, mask, bad_A, max_bad_A; if (!B || ((B-1)&B) == 0) { // B is a power of 2 //if(0) printf("B=%u: power of 2, division by shift\n", B); return; } L = (max_ull/max_unsigned - 1) * B; bits = 63; mask = 1ULL << 63; while( !(L & mask)) bits--, mask >>= 1; L = mask; while ( (KL = L/B + 1) > max_unsigned) bits--, L >>= 1; K = KL; // There is not much point in multiplying by even number // and then shifting right. Reduce K & L to avoid it: while (!(K & 1) && bits > 32) bits--, L >>= 1, K = L/B + 1; d_LB = ((L/B) * B + B - L); bad_A = L / d_LB; bad_A = (bad_A / B) * B + B-1; if (bad_A <= max_A) { printf("B=%u,A<=%u: no suitable fastdiv params found\n", B, max_A); return; } max_bits = bits; max_bad_A = bad_A; while(1) { uint64_t last_L = L; uint64_t last_bad_A = bad_A; unsigned last_K = K; bits--; L >>= 1; K = L/B + 1; d_LB = ((L/B) * B + B - L); bad_A = L / d_LB; bad_A = (bad_A / B) * B + B-1; if (bad_A <= max_A || bits < 32) { // new params are bad, report last ones and bail out //if(0) printf("B=%u,A<=%u: L=%llx(bits=%u) K=%u (bad_A=%llu, bad_A=%llu at bits=%u)\n", B, max_A, last_L, bits+1, last_K, last_bad_A, max_bad_A, max_bits); return; } } } -- vda dot linux at googlemail dot com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |vda dot linux at googlemail | |dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417