I notice that the store crosses a cache line boundary on an ARMv5 CPU
with 32-byte cache lines.

I see that the xorin8 function on line 104 of
https://fossies.org/linux/tor/src/ext/keccak-tiny/keccak-tiny-unrolled.c

assumes that the 'dst' pointer has 8-byte alignment, but the
gdb output only shows 4-byte alignment, which matches
the data structure definition for keccak_state in

https://fossies.org/linux/tor/src/ext/keccak-tiny/keccak-tiny.h

I would suggest adding

__attribute__((aligned(8)))

to the structure definition to force 8-byte alignment, which would
make the code more portable and avoid undefined behavior
(casting a pointer to a type with a stricter alignment requirement
than the pointed-to object actually has).
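
As a rough sketch of what I mean (the structs below are made-up
stand-ins, not the real keccak_state layout, and the helper name is
hypothetical), the attribute raises the alignment of the whole
structure, so a byte array at offset 0 becomes safe to access
through a uint64_t pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a state struct, NOT the real keccak_state.
 * With only uint8_t and size_t members, its alignment is that of
 * size_t, which is 4 bytes on a 32-bit ARM target. */
struct demo_state_plain {
    uint8_t a[200];
    size_t rate;
};

/* Same layout, but the GCC/Clang attribute forces the whole struct
 * (and therefore the array at offset 0) onto an 8-byte boundary. */
struct demo_state_aligned {
    uint8_t a[200];
    size_t rate;
} __attribute__((aligned(8)));

/* Compile-time check that the attribute had the intended effect. */
_Static_assert(_Alignof(struct demo_state_aligned) >= 8,
               "state must be 8-byte aligned for 64-bit accesses");

/* Hypothetical helper: returns 1 if p sits on an 8-byte boundary. */
int is_8byte_aligned(const void *p)
{
    return ((uintptr_t)p % 8) == 0;
}
```

Calling is_8byte_aligned() on the 'a' member of a demo_state_aligned
object should always return 1, while for demo_state_plain the result
depends on the target ABI and on where the object happens to land.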

I don't think this is actually supposed to be undefined behavior
for an ARMv5 CPU, as long as the destination for the 'strd' instruction
has at least 4-byte alignment, but since gcc never creates this
instruction sequence for valid code, a hardware erratum may have
gone unnoticed for a long time.

        Arnd
