On 5/18/17 9:38 AM, Edward Cree wrote:
On 18/05/17 15:49, Edward Cree wrote:
Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm), where am and bm are the masks
Σ = am + bm + a + b /* sum with all xs set to 1 */
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.

> https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d

I played with it quite a bit trying to break it and have to
agree that the above algorithm works.
At least for add and sub I think it's solid.
It still feels a bit magical, since it gave better results on my
test vectors than I could have envisioned.

In your .py I'd only change __str__(self) to print them in (mask, value)
order, the order they're passed to the constructor, to make it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.

And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b /* maximum possible result */
β = a - b - bm /* minimum possible result */
χ = α ^ β
μ = χ | am | bm /* mask of result */
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.
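The same in Python, as a sketch (tnum_sub is a hypothetical name;
64-bit wraparound assumed as before):

    M64 = (1 << 64) - 1

    def tnum_sub(a, am, b, bm):
        alpha = (a + am - b) & M64  # maximum possible result
        beta = (a - b - bm) & M64   # minimum possible result
        chi = alpha ^ beta          # bits where max and min disagree
        mu = chi | am | bm
        return ((a - b) & ~mu & M64, mu)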

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = ((a ^ b) & ~(am | bm); am | bm) /* masking keeps value & mask == 0 */

as are shifts by a constant (just shift 0s into both number and mask).
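These transcribe directly; a sketch under the same assumptions as the
addition sketch above (hypothetical names, 64-bit values):

    M64 = (1 << 64) - 1

    def tnum_and(a, am, b, bm):
        alpha, beta = a | am, b | bm  # each operand with all xs set to 1
        v = a & b
        return (v, alpha & beta & ~v)

    def tnum_or(a, am, b, bm):
        v = a | b
        return (v, (am | bm) & ~v)

    def tnum_xor(a, am, b, bm):
        mu = am | bm
        return ((a ^ b) & ~mu, mu)  # clear unknown bits of the value

    def tnum_lshift(a, am, k):  # shift by a constant k
        return ((a << k) & M64, (am << k) & M64)  # 0s shift in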

Multiplication by a constant can be done by decomposing into shifts
 and adds, but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.
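A sketch in Python (tnum_mul_const is a hypothetical name; tnum_add is
the addition sketch from above, repeated so this runs standalone):

    M64 = (1 << 64) - 1

    def tnum_add(a, am, b, bm):  # the addition algo from above
        sigma = (am + bm + a + b) & M64
        chi = sigma ^ ((a + b) & M64)
        mu = chi | am | bm
        return ((a + b) & ~mu & M64, mu)

    def tnum_mul_const(a, am, k):
        pi = (a * k) & M64      # π = a * k
        gamma = (am * k) & M64  # γ = am * k
        return tnum_add(pi, 0, 0, gamma)  # (π; 0) + (0; γ)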

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
     110x
+  xx0x
= xxxxx0x
or
      x0x
    x01
+  x01
= xxxxx0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.
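That optimised version, sketched in Python under the same assumptions
(hypothetical names; tnum_add repeated from above so this runs standalone):

    M64 = (1 << 64) - 1

    def tnum_add(a, am, b, bm):  # the addition algo from above
        sigma = (am + bm + a + b) & M64
        chi = sigma ^ ((a + b) & M64)
        mu = chi | am | bm
        return ((a + b) & ~mu & M64, mu)

    def tnum_mul(a, am, b, bm):
        # (a;am) * (known bits of b), via the constant algo:
        # (a*b; 0) + (0; am*b)
        acc = tnum_add((a * b) & M64, 0, 0, (am * b) & M64)
        # for each unknown bit of b, the component is (a;am) << i
        # with every nonzero bit forced to unknown:
        for i in range(64):
            if (bm >> i) & 1:
                part = ((a | am) << i) & M64
                acc = tnum_add(acc[0], acc[1], 0, part)
        return acc

    # e.g. tnum_mul(0b1100, 0b0001, 0b001, 0b100) == (0, 0b1111101),
    # i.e. 110x * x01 = xxxxx0x, matching the worked example above.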

This mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
In all practical cases llvm will likely emit shifts or a sequence
of adds and shifts, so multiplies by a crazy non-optimizable constant
or by a variable are rare, and the end result is likely going to be
outside of the packet boundary, so it will be rejected anyway and
precise alignment tracking doesn't matter much.
What I love about the whole thing is that it works for access into
packets, access into map values, and in the future any other
variable-length access.

Don't even ask about division; that scrambles bits so hard that the [...]

yeah, screw div and mod. We have the option to disable div/mod altogether
under some new 'prog_flags', since it has this ugly 'div by 0'
exception path. We don't even have a 'signed division' instruction, and
llvm errors out like:
    errs() << "Unsupport signed division for DAG: ";
    errs() << "Please convert to unsigned div/mod.\n";
and no one has complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it, the verifier will be smarter about
alignment tracking than any compiler I know :)
