On Thu, 2016-12-08 at 00:35 +0000, Duyck, Alexander H wrote:

> Well there ends up being a few aspects to it.  First we don't need the
> precision of a full 64b inverse multiplication, that is why we can get
> away with multiply by 85 and shift.  The assumption is we should never
> see a buffer larger than 64K for a TSO frame.  That being the case we
> can do the same thing without having to use a 64b value which isn't an
> option on 32b architectures.
> 
> So basically what it comes down to is dealing with the "optimized for
> size" kernel option, and 32b architectures not being able to do this.
> Arguably both are corner cases but better to deal with them than take
> a performance hit we don't have to.

ok ok ;)

Too bad the 65536 value is accepted (is it?), otherwise taking the
size as an unsigned short:

unsigned int foo(unsigned short size)
{
        return size / 0x3000;
}

-> generates the same kind of instructions, with maybe better
precision.

foo:
        movzwl  %di, %eax
        imull   $43691, %eax, %eax
        shrl    $29, %eax
        ret
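
For anyone who wants to convince themselves, here is a rough
brute-force sketch comparing the plain division with the
multiply-and-shift form the compiler emitted. The helper names
(div_12k, mul_shift_12k, count_mismatches) are made up for this
mail and are not from the driver. It assumes 32-bit arithmetic, and
it takes the size as a 32-bit value so that 65536 can be tried too.

```c
#include <stdint.h>

/* Plain division by 12K (0x3000), same arithmetic as foo() above. */
static uint32_t div_12k(uint32_t size)
{
	return size / 0x3000;
}

/*
 * Multiply-and-shift form matching the imull/shrl pair above.
 * 43691 is 2^17 / 3 rounded up, and 2^29 = 2^17 * 2^12, so this
 * approximates size / (3 * 4096).
 */
static uint32_t mul_shift_12k(uint32_t size)
{
	return (size * UINT32_C(43691)) >> 29;
}

/* Count the sizes in [0, limit] where the two forms disagree. */
static uint32_t count_mismatches(uint32_t limit)
{
	uint32_t size, bad = 0;

	for (size = 0; size <= limit; size++)
		if (div_12k(size) != mul_shift_12k(size))
			bad++;
	return bad;
}
```

Calling count_mismatches(65535) covers every unsigned short value.
With a 32-bit argument, count_mismatches(65536) covers the 64K case
as well, since 65536 * 43691 still fits in 32 bits.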




