Hi Will,

On 2019/8/16 0:46, Will Deacon wrote:
> On Thu, May 16, 2019 at 11:14:35AM +0800, Zhangshaokun wrote:
>> On 2019/5/15 17:47, Will Deacon wrote:
>>> On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
>>>> On 12/04/2019 10:52, Will Deacon wrote:
>>>>> I'm waiting for Robin to come back with numbers for a C implementation.
>>>>>
>>>>> Robin -- did you get anywhere with that?
>>>>
>>>> Still not what I would call finished, but where I've got so far (besides an
>>>> increasingly elaborate test rig) is as below - it still wants some unrolling
>>>> in the middle to really fly (and actual testing on BE), but the worst-case
>>>> performance already equals or just beats this asm version on Cortex-A53 with
>>>> GCC 7 (by virtue of being alignment-insensitive and branchless except for
>>>> the loop). Unfortunately, the advantage of C code being instrumentable does
>>>> also come around to bite me...
>>>
>>> Is there any interest from anybody in spinning a proper patch out of this?
>>> Shaokun?
>>
>> HiSilicon's Kunpeng920(Hi1620) benefits from do_csum optimization, if Ard and
>> Robin are ok, Lingyan or I can try to do it.
>> Of course, if any guy posts the patch, we are happy to test it.
>> Any will be ok.
>
> I don't mind who posts it, but Robin is super busy with SMMU stuff at the
> moment so it probably makes more sense for you or Lingyan to do it.

Thanks for restarting this topic; Lingyan or I will do it soon.

Thanks,
Shaokun