Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

Nicolas ARGYROU Mon, 19 Sep 2011 13:48:10 -0700


Hello,


I noticed other shells have the same bug, and perhaps you would like to use
this code in other GNU projects (like making a library call or an executable).


The best, I think, is to transfer the copyright to bash maintainers. You can 
now copyright
and license it the way you want:

template<typename T>
inline T ipow(register /* unsigned */ T x, register /* unsigned */ T y)
{
    if (x == 0 && y != 0) return 0;
    // 1: ipow(x,y) = x ** y = Product [i=0; i<=log2(y)] (x ** 
(((y>>i)&1)*2**i))
    // 2: x**(2**i) = x**(2**(i-1)) * x**(2**(i-1))
    register T xy = 1;
    for(; y; y>>=1, x *= x)
        if (y & 1)
            xy *= x;
    return xy;
}


If you don't wish to use this algoritm, then there is another way of getting 
around the bug:
for exemple, on 64 bit systems, when performing x ** y, if y > 64 then the 
result
will be too large anyway to stand in 64 bits. It is true when you suppose x = 2,
and (even more) when x > 2. In this case you may then throw an overflow error.
In other cases, y < 64 is a manageable number for the loop algorithm.


But, I still prefer the O(log(y)) algorithm, which always computes the exact 
value (modulus
2**64) in less numbers of steps on average.


This is, in detail, how the algorithm works: 
When, for example, computing 3 ** 3 = 3 * 3 * 3 = 27,
you can write it also with a binary power: 3 ** (1*2**1 + 1*2**0),
and because, x ** (a+b) = x**a * x**b,
you may develop it: 3 ** (2**1) * 3 ** (2**0)
This gives the first equation in the comments of the code:
    // 1: ipow(x,y) = x ** y = Product [i=0; i<=log2(y)] (x ** 
(((y>>i)&1)*2**i))

The second thing to notice is that we need to compute: 3**2**i,
but as 2**i = 2*2**(i-1) = 2**(i-1) + 2**(i-1),
then 3**2**i = 3**(2**(i-1) + 2**(i-1)), 
since, x ** (a+b) = x**a * x**b,
then 3**2**i = 3**(2**(i-1) + 2**(i-1)) = 3**(2**(i-1)) * 3**(2**(i-1)).
This gives the second equation in the comments of the code:

    // 2: x**(2**i) = x**(2**(i-1)) * x**(2**(i-1))
and a reccurence to compute x**2**(i+1) when we know x**2**i, from x**2**0 = x.

We just then have to roll though the bits of y, computing x = x * x each step, 
and
multiplying xy (the temporary result) by one of the squares of x when the 
corresponding bit
of y is set.
As we always do multiplications modulus 2**64, it gives the result modulus 
2**64.
We only need to use 3 registers: one to roll y, one to hold the squares of x, 
and one to hold
the result.

Writing the same algoritm with assembler language can save a few instructions 
per loop,
because gcc doesn't catch that it can use the out-going bit of y to directly 
jump over xy *= x;
if not set. But it won't be portable then.


Best regards,
  Nicolas Argyrou

Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

Reply via email to