John-Mark Gurney:

> So, as I was working on FreeBSD's implementation of gmac.c, I noticed
> that I was able to get a significant speed up by using a mask instead
> of an if branch in ghash_gfmul in gmac.c from OpenBSD...
> 
> Add a mask var and replace the code between the comments
> "update Z" and "update V" w/:
>                 mask = !!(x[i >> 3] & (1 << (~i & 7)));
>                 mask = ~(mask - 1);
> 
>                 z[0] ^= v[0] & mask;
>                 z[1] ^= v[1] & mask;
>                 z[2] ^= v[2] & mask;
>                 z[3] ^= v[3] & mask;
> 
> And you should see a nice performance increase...
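
If you want to try the quoted change in isolation, here is a minimal, self-contained sketch of the whole bit-by-bit multiply with the masked Z update dropped in. It assumes the layout the quoted loop implies: the 128-bit value is held as four 32-bit words with v[0] most significant, and bit 0 of a block is the MSB of byte 0, as GCM defines it. The function and helper names are hypothetical, and this is not the OpenBSD or FreeBSD source. The V update is left as in the original loop.

```c
#include <stdint.h>

/* Big-endian load/store of one 32-bit word. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* product = x * y in GF(2^128), GCM bit order (hypothetical name). */
void gfmul_masked(const uint8_t x[16], const uint8_t y[16], uint8_t product[16])
{
    uint32_t v[4], z[4] = { 0, 0, 0, 0 };
    uint32_t mask, mul;
    int i;

    for (i = 0; i < 4; i++)
        v[i] = load_be32(y + 4 * i);

    for (i = 0; i < 128; i++) {
        /* update Z: mask is all ones if bit i of x is set, else zero */
        mask = !!(x[i >> 3] & (1 << (~i & 7)));
        mask = ~(mask - 1);

        z[0] ^= v[0] & mask;
        z[1] ^= v[1] & mask;
        z[2] ^= v[2] & mask;
        z[3] ^= v[3] & mask;

        /* update V: V = V * x, i.e. shift right one bit, reduce by 0xe1 */
        mul = v[3] & 1;
        v[3] = (v[2] << 31) | (v[3] >> 1);
        v[2] = (v[1] << 31) | (v[2] >> 1);
        v[1] = (v[0] << 31) | (v[1] >> 1);
        v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
    }

    for (i = 0; i < 4; i++)
        store_be32(product + 4 * i, z[i]);
}
```

In this bit order the multiplicative identity is the block 80 00 ... 00, which makes a quick sanity check: multiplying it by any H returns H.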

I tried this on a Soekris net6501-50 and the performance increase
was around 1.3%.  (I set up an ESP transport association with
AES-128-GMAC and pushed UDP traffic with tcpbench over it.)

A look at the generated amd64 assembly code shows that the change
indeed removes a branch.  What's pretty shocking is that this code

    mul = v[3] & 1;
    ...
    v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);

is turned into an actual imul instruction by GCC.  I used the same
masking approach to get rid of the multiplication, but the improvement
was minuscule (<1%).
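
The same masking idea applied to the V update can be sketched as follows: the reduction constant 0xe1000000 is ANDed with a mask derived from the bit that is shifted out, instead of being multiplied by it. Both forms are shown side by side. The function names are hypothetical, and whether a particular compiler emits imul for the first form depends on the compiler and flags.

```c
#include <stdint.h>

/*
 * Both functions compute V = V * x in GF(2^128), GCM bit order: shift the
 * four 32-bit words (v[0] most significant) right by one bit and, if a 1
 * fell off the end, XOR in R = 0xe1 || 0^120.
 */

/* Original form: the conditional XOR is written as a multiplication. */
void shift_v_mul(uint32_t v[4])
{
    uint32_t mul = v[3] & 1;

    v[3] = (v[2] << 31) | (v[3] >> 1);
    v[2] = (v[1] << 31) | (v[2] >> 1);
    v[1] = (v[0] << 31) | (v[1] >> 1);
    v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
}

/* Masked form: mul == 1 gives an all-ones mask, mul == 0 gives zero. */
void shift_v_mask(uint32_t v[4])
{
    uint32_t mul = v[3] & 1;
    uint32_t mask = ~(mul - 1);

    v[3] = (v[2] << 31) | (v[3] >> 1);
    v[2] = (v[1] << 31) | (v[2] >> 1);
    v[1] = (v[0] << 31) | (v[1] >> 1);
    v[0] = (v[0] >> 1) ^ (0xe1000000 & mask);
}
```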

> I also have an implementation of ghash that does a 4 bit lookup table
> version with the table split between cache lines in p4 at:
> https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.c&REV=4
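
For comparison, a generic 4-bit table multiply (Shoup-style: precompute n * H for all 16 nibble values, then process x four bits at a time with Horner's rule) might look roughly like the sketch below. It is not the gfmult.c linked above and does not attempt the cache-line split mentioned there; the type and function names are hypothetical. Note that every table index is taken from x, the value being multiplied.

```c
#include <stdint.h>

/* Reduction of the 4 bits shifted out of Z, pre-shifted into the top
 * 16 bits (R = 0xe1 || 0^120 in GCM bit order). */
static const uint64_t last4[16] = {
    0x0000, 0x1c20, 0x3840, 0x2460, 0x7080, 0x6ca0, 0x48c0, 0x54e0,
    0xe100, 0xfd20, 0xd940, 0xc560, 0x9180, 0x8da0, 0xa9c0, 0xb5e0
};

/* Hypothetical table type: (hh[n], hl[n]) holds n * H for each nibble n. */
struct ghash4_table {
    uint64_t hh[16];
    uint64_t hl[16];
};

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

void ghash4_init(struct ghash4_table *t, const uint8_t h[16])
{
    uint64_t vh = load_be64(h), vl = load_be64(h + 8);
    int i, j;

    t->hh[0] = t->hl[0] = 0;
    t->hh[8] = vh;                      /* nibble 1000b: H */
    t->hl[8] = vl;
    for (i = 4; i > 0; i >>= 1) {       /* 0100b, 0010b, 0001b: H*x, H*x^2, H*x^3 */
        uint64_t mask = 0 - (vl & 1);
        vl = (vh << 63) | (vl >> 1);
        vh = (vh >> 1) ^ (0xe100000000000000ULL & mask);
        t->hh[i] = vh;
        t->hl[i] = vl;
    }
    for (i = 2; i <= 8; i <<= 1)        /* the rest follow by linearity */
        for (j = 1; j < i; j++) {
            t->hh[i + j] = t->hh[i] ^ t->hh[j];
            t->hl[i + j] = t->hl[i] ^ t->hl[j];
        }
}

/* Z = Z * x^4: shift right by 4 bits and fold the dropped bits back in. */
static void shift4(uint64_t *zh, uint64_t *zl)
{
    unsigned rem = (unsigned)(*zl & 0xf);

    *zl = (*zh << 60) | (*zl >> 4);
    *zh = (*zh >> 4) ^ (last4[rem] << 48);
}

void ghash4_mul(const struct ghash4_table *t, const uint8_t x[16], uint8_t out[16])
{
    uint64_t zh = 0, zl = 0;
    int i;

    /* Horner's rule over the 32 nibbles of x, last nibble first.
     * Every table index below is derived from x. */
    for (i = 15; i >= 0; i--) {
        unsigned nlo = x[i] & 0xf, nhi = x[i] >> 4;

        shift4(&zh, &zl);
        zh ^= t->hh[nlo];
        zl ^= t->hl[nlo];
        shift4(&zh, &zl);
        zh ^= t->hh[nhi];
        zl ^= t->hl[nhi];
    }
    store_be64(out, zh);
    store_be64(out + 8, zl);
}
```

The table depends only on H, so it would be built once per key and reused for every block.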

I'll have to look at this, but haven't there been increasing
misgivings about table implementations for GHASH because of timing
attacks?
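
The worry with table lookups is that the address of each load depends on intermediate GHASH state, which is derived from the key, so cache behavior can leak information. One generic mitigation, sketched below, is to read all 16 entries on every lookup and keep the wanted one with a mask, the same idea as the masked Z update quoted above. The helper name is hypothetical and it is not taken from any implementation discussed here. C gives no guarantee that the compiler keeps this branch-free, so the generated assembly would still need checking, and it costs 16 loads per lookup instead of one.

```c
#include <stdint.h>

/*
 * Return table[idx] (idx must be < 16) without an idx-dependent memory
 * access: every entry is read, and only the matching one survives the mask.
 */
uint64_t ct_lookup16(const uint64_t table[16], unsigned idx)
{
    uint64_t r = 0;
    unsigned i;

    for (i = 0; i < 16; i++) {
        /* diff == 0 exactly when i == idx */
        uint64_t diff = (uint64_t)(i ^ idx);
        /* diff - 1 has its top bit set only when diff == 0 */
        uint64_t mask = 0 - ((diff - 1) >> 63);

        r |= table[i] & mask;
    }
    return r;
}
```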

-- 
Christian "naddy" Weisgerber                          na...@mips.inka.de
