Christian Weisgerber wrote this message on Tue, Oct 07, 2014 at 23:08 +0200:
> John-Mark Gurney:
> 
> > So, as I was working on FreeBSD's implementation of gmac.c, I noticed
> > that I was able to get a significant speed up by using a mask instead
> > of an if branch in ghash_gfmul in gmac.c from OpenBSD...
> > 
> > Add a mask var and replace the code between the comments
> > "update Z" and "update V" w/:
> >                 mask = !!(x[i >> 3] & (1 << (~i & 7)));
> >                 mask = ~(mask - 1);
> > 
> >                 z[0] ^= v[0] & mask;
> >                 z[1] ^= v[1] & mask;
> >                 z[2] ^= v[2] & mask;
> >                 z[3] ^= v[3] & mask;
> > 
> > And you should see a nice performance increase...
> 
> I tried this on a Soekris net6501-50 and the performance increase
> was around 1.3%.  (I set up an ESP transport association with
> AES-128-GMAC and pushed UDP traffic with tcpbench over it.)

Yeh, I benchmarked the raw algo in userland, not part of IPsec.  I
forget the resulting perf increase, but it was well more than 10-20%.

> A look at the generated amd64 assembly code shows that the change
> indeed removes a branch.  What's pretty shocking is that this code
> 
>     mul = v[3] & 1;
>     ...
>     v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
> 
> is turned into an actual imul instruction by GCC.  I used the same
> masking approach to get rid of the multiplication, but the improvement
> was minuscule (<1%).

Hmm. interesting...  In my code I switched both to using the and
operator...

I also have code which switches the registers to 64bits so that on
amd64, we make better uses of registers, and on i386, the compilers
are good at breaking down the 64bit registers to 32bit w/o extra
work...

> > I also have an implementation of ghash that does a 4 bit lookup table
> > version with the table split between cache lines in p4 at:
> > https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.c&REV=4
> 
> I'll have to look at this, but haven't there been increasing
> misgivings about table implementations for GHASH because of timing
> attacks?

Well, the code avoids one issue but introduces another issue...  Each
table lookup accesses all four lines (assuming 64 byte cache lines), so
there isn't a problem there...  The issue intrroduded is that the first
64 bits of a cache line are faster to access than the remaining bits...

For more info, look at the NSS bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=868948

Though considering that the AES implementation that FreeBSD is still
using is the standard table based AES, there are also timing issues
there too...  Yes, no excuse for opening up additional windows...

I have thought about introducing an option of slow but secure and
fast but insecure against local attackers...  Since we are already
on the fast but insecure against local attackers path, I figured I'll
take the extra performance...

As with all things, there is a trade off between speed and security...
As most IPsec gateways don't have a local user trying to target the
key, this is less of an issue...  And even if they did, it'd probably
be easier to use a local root exploit to get the key than try to mount
a timing attack to extract a key that might change in an hour or a day...

-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

Reply via email to