On Fri, Nov 06, 2020 at 05:39:38PM +0100, Ard Biesheuvel wrote:
> Based on lessons learnt from optimizing the 32-bit version of this driver,
> we can simplify the arm64 version considerably, by reordering the final
> two stores when the last block is not a multiple of 64 bytes. This removes
> the need to use permutation instructions to calculate the elements that are
> c