(+ Catalin)

On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas <[email protected]> wrote:
>
> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
> > It turns out that the IP checksumming code is still exercised often,
> > even though one might expect that modern NICs with checksum offload
> > have no use for it. However, as Lingyan points out, there are
> > combinations of features where the network stack may still fall back
> > to software checksumming, and so it makes sense to provide an
> > optimized implementation in software as well.
> >
> > So provide an implementation of do_csum() in scalar assembler, which,
> > unlike C, gives direct access to the carry flag, making the code run
> > substantially faster. The routine uses overlapping 64-byte loads for
> > all input sizes > 64 bytes, in order to reduce the number of branches
> > and improve performance on cores with deep pipelines.
> >
> > On Cortex-A57, this implementation is on par with Lingyan's NEON
> > implementation, and roughly 7x as fast as the generic C code.
> >
> > Cc: "huanglingyan (A)" <[email protected]>
> > Signed-off-by: Ard Biesheuvel <[email protected]>
...
>
> Acked-by: Ilias Apalodimas <[email protected]>
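To make the carry-flag point concrete, here is a portable C sketch of the one's-complement sum behind the IP checksum (RFC 1071). It is not the code from the patch: the names `add_with_end_around_carry`, `fold64` and `csum16_sketch` are hypothetical, the buffer is summed as big-endian 64-bit words to keep the result byte-order independent, and the overlapping-load trick is not attempted. The relevant part is the extra compare-and-add that C needs to recover each carry-out, which an assembler version can presumably avoid by using add-with-carry instructions directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit one's-complement addition. Portable C has no
 * access to the carry flag, so the carry-out is recovered with a compare:
 * after an unsigned wraparound the result is smaller than x. */
static uint64_t add_with_end_around_carry(uint64_t sum, uint64_t x)
{
    sum += x;
    sum += (sum < x);   /* fold the carry-out back into bit 0 */
    return sum;
}

/* Hypothetical helper: fold a 64-bit one's-complement sum down to 16 bits.
 * Valid because 2^16 - 1 divides 2^64 - 1. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 16 bits */
    return (uint16_t)sum;
}

/* Load 8 bytes as a big-endian word, independent of host endianness. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical sketch: 16-bit one's-complement sum of buf (not complemented),
 * treating the data as big-endian 16-bit words as in RFC 1071. */
uint16_t csum16_sketch(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint64_t sum = 0;

    while (len >= 8) {
        sum = add_with_end_around_carry(sum, load_be64(p));
        p += 8;
        len -= 8;
    }
    if (len > 0) {
        uint8_t tail[8] = {0};   /* zero-pad the final partial word */
        memcpy(tail, p, len);
        sum = add_with_end_around_carry(sum, load_be64(tail));
    }
    return fold64(sum);
}
```

For the RFC 1071 example bytes `00 01 f2 03 f4 f5 f6 f7`, `csum16_sketch` returns `0xddf2`, the same value as summing the four 16-bit words one at a time. Working on 64-bit words reduces the number of additions fourfold, but each one still pays for the compare that reconstructs the carry.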
Full patch here:
https://lore.kernel.org/linux-arm-kernel/[email protected]/

This was a follow-up to some discussions about Lingyan's NEON code, CC'ed to netdev@ so people could chime in as to whether we need accelerated checksumming code in the first place.
