On 15/05/2019 10:47, Will Deacon wrote:
> On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
>> On 12/04/2019 10:52, Will Deacon wrote:
>>> I'm waiting for Robin to come back with numbers for a C implementation.
>>> Robin -- did you get anywhere with that?
>>
>> Still not what I would call finished, but where I've got so far (besides an
>> increasingly elaborate test rig) is as below - it still wants some unrolling
>> in the middle to really fly (and actual testing on BE), but the worst-case
>> performance already equals or just beats this asm version on Cortex-A53 with
>> GCC 7 (by virtue of being alignment-insensitive and branchless except for
>> the loop). Unfortunately, the advantage of C code being instrumentable does
>> also come around to bite me...
>
> Is there any interest from anybody in spinning a proper patch out of this?
> Shaokun?

FWIW I've learned how to fix the KASAN thing now, so I'll try giving
this some more love while I've got other outstanding optimisation stuff
to look at anyway.
Robin.