On Sun, Feb 27, 2022, at 22:00, Colin Percival wrote: > Ooh, nasty. Which version of the scrypt code are you using? In > particular, the latest version has a self-test which *should* > catch issues like this.
The self-test does catch the issue. I just have not been running it in my application (nor in its inexistent test suite). > We should probably make it operate on uint32_t rather than bytes, > since the data in question is used as arrays of uint32_t elsewhere? That might work indeed. But the loop on bytes is not as bad as it looks with a recent compiler. It will end up compiled unrolled with SIMD instructions: 0x0000000000062e10 <+112>: movdqu (%r12),%xmm2 0x0000000000062e16 <+118>: movdqu (%rbx),%xmm0 0x0000000000062e1a <+122>: pxor %xmm2,%xmm0 0x0000000000062e1e <+126>: movups %xmm0,(%rbx) 0x0000000000062e21 <+129>: movdqu 0x10(%rbx),%xmm0 0x0000000000062e26 <+134>: movdqu 0x10(%r12),%xmm3 0x0000000000062e2d <+141>: pxor %xmm3,%xmm0 0x0000000000062e31 <+145>: movups %xmm0,0x10(%rbx) 0x0000000000062e35 <+149>: movdqu 0x20(%rbx),%xmm0 0x0000000000062e3a <+154>: movdqu 0x20(%r12),%xmm4 0x0000000000062e41 <+161>: pxor %xmm4,%xmm0 0x0000000000062e45 <+165>: movups %xmm0,0x20(%rbx) 0x0000000000062e49 <+169>: movdqu 0x30(%rbx),%xmm0 0x0000000000062e4e <+174>: movdqu 0x30(%r12),%xmm5 0x0000000000062e55 <+181>: pxor %xmm5,%xmm0 0x0000000000062e59 <+185>: movups %xmm0,0x30(%rbx) One potentially simple improvement would be to have the buffers 16-byte aligned (and make sure the compiler understand that it is the case). Then you would have movdqa load/stores.
