John-Mark Gurney:

> So, as I was working on FreeBSD's implementation of gmac.c, I noticed
> that I was able to get a significant speed up by using a mask instead
> of an if branch in ghash_gfmul in gmac.c from OpenBSD...
>
> Add a mask var and replace the code between the comments
> "update Z" and "update V" w/:
>                 mask = !!(x[i >> 3] & (1 << (~i & 7)));
>                 mask = ~(mask - 1);
>
>                 z[0] ^= v[0] & mask;
>                 z[1] ^= v[1] & mask;
>                 z[2] ^= v[2] & mask;
>                 z[3] ^= v[3] & mask;
>
> And you should see a nice performance increase...

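For comparison, here is a minimal sketch of the branching and masked forms of the "update Z" step side by side. It is not the actual gmac.c code: the function names (update_v, gfmul_branch, gfmul_mask, gfmul_variants_agree) are hypothetical, and y is assumed to be already in host byte order, whereas the real ghash_gfmul converts from big-endian first.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BITS 128

/* "update V": shift the 128-bit value right by one bit and reduce. */
static void
update_v(uint32_t v[4])
{
	uint32_t mul = v[3] & 1;

	v[3] = (v[2] << 31) | (v[3] >> 1);
	v[2] = (v[1] << 31) | (v[2] >> 1);
	v[1] = (v[0] << 31) | (v[1] >> 1);
	v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
}

/* Branching form: test each bit of x and conditionally fold v into z. */
static void
gfmul_branch(const uint8_t x[16], const uint32_t y[4], uint32_t z[4])
{
	uint32_t v[4];
	int i;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(z[0]));
	for (i = 0; i < BLOCK_BITS; i++) {
		if (x[i >> 3] & (1 << (~i & 7))) {
			z[0] ^= v[0];
			z[1] ^= v[1];
			z[2] ^= v[2];
			z[3] ^= v[3];
		}
		update_v(v);
	}
}

/* Masked form: expand the bit to all-zeros or all-ones and AND it in. */
static void
gfmul_mask(const uint8_t x[16], const uint32_t y[4], uint32_t z[4])
{
	uint32_t v[4], mask;
	int i;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(z[0]));
	for (i = 0; i < BLOCK_BITS; i++) {
		mask = !!(x[i >> 3] & (1 << (~i & 7)));	/* 0 or 1 */
		mask = ~(mask - 1);			/* 0 or 0xffffffff */

		z[0] ^= v[0] & mask;
		z[1] ^= v[1] & mask;
		z[2] ^= v[2] & mask;
		z[3] ^= v[3] & mask;
		update_v(v);
	}
}

/* Returns 1 if both forms compute the same product for x and y. */
static int
gfmul_variants_agree(const uint8_t x[16], const uint32_t y[4])
{
	uint32_t zb[4], zm[4];

	gfmul_branch(x, y, zb);
	gfmul_mask(x, y, zm);
	return memcmp(zb, zm, sizeof(zb)) == 0;
}
```

As a sanity check, an x whose first byte is 0x80 and whose remaining bytes are zero has only bit 0 set, so both forms should return y unchanged.
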
I tried this on a Soekris net6501-50 and the performance increase
was around 1.3%.  (I set up an ESP transport association with
AES-128-GMAC and pushed UDP traffic with tcpbench over it.)

A look at the generated amd64 assembly code shows that the change
indeed removes a branch.

What's pretty shocking is that this code

	mul = v[3] & 1;
	...
	v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);

is turned into an actual imul instruction by GCC.  I used the same
masking approach to get rid of the multiplication, but the
improvement was minuscule (<1%).

> I also have an implementation of ghash that does a 4 bit lookup table
> version with the table split between cache lines in p4 at:
> https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.c&REV=4

I'll have to look at this, but haven't there been increasing
misgivings about table implementations for GHASH because of timing
attacks?

-- 
Christian "naddy" Weisgerber                          na...@mips.inka.de