On 5/17/20 9:13 AM, Lukas Wunner wrote: > On Sat, May 16, 2020 at 07:02:25PM -0700, David Miller wrote: >>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces >>> of silicon, except the former has an SPI interface, while the later has a >>> parallel bus interface. Thus far, Linux has two separate drivers for each >>> and they are diverging considerably. >>> >>> This series unifies them into a single driver with small SPI and parallel >>> bus specific parts. The approach here is to first separate out the SPI >>> specific parts into a separate file, then add parallel bus accessors in >>> another separate file and then finally remove the old parallel bus driver. >>> The reason for replacing the old parallel bus driver is because the SPI >>> bus driver is much higher quality. >> >> What strikes me in these changes is all of the new indirect jumps in >> the fast paths of TX and RX packet processing. It's just too much for >> my eyes. :-) >> >> Especially in the presence of Spectre mitigations, these costs are >> quite non-trivial. >> >> Seriously, I would recommend that instead of having these small >> indirect helpers, just inline the differences into two instances of >> the RX interrupt and the TX handler. > > I agree.
I do not. > However in terms of performance there's a bigger problem: > > Previously ks8851.c (SPI driver) had 8-bit and 32-bit register accessors. > The present series drops them and performs a 32-bit access as two 16-bit > accesses and an 8-bit access as one 16-bit access because that's what > ks8851_mll.c (16-bit parallel bus driver) does. That has a real, > measurable performance impact because in the case of 8-bit accesses, > another 8 bits need to be transferred over the SPI bus, and in the case > of 32-bit accesses, *two* SPI transfers need to be performed. > > The 8-bit and 32-bit accesses happen in ks8851_rx_pkts(), i.e. in the > RX hotpath. I've provided numbers for the performance impact and even > a patch to solve them but it was dismissed and not included in the > present series: > > https://lore.kernel.org/netdev/20200420140700.6632hztejwcgj...@wunner.de/ > > The reason given for the dismissal was that I had performed the measurements > on 4.19 which is allegedly "long dead" (in Andrew Lunn's words). > However I can assure you that performing two SPI transfers has not > magically become as fast as performing one SPI transfer since 4.19. > So the argument is nonsense. I invested time and even obtained the SPI variant of the card to perform actual comparative measurements on linux-next both on the SPI and parallel variant with iperf, both for latency and throughput, and I do not observe this problem. A month ago, I even provided you a branch with all the patches and the DT patch for RPi3 (the platform you claim to use for these tests, so I used the same) so you can perform the same test as I did, with the same hardware and the same software. So it should have been trivial to reproduce the tests I did and their results. > Nevertheless I was going to repeat the performance measurements on a > recent kernel but haven't gotten around to that yet because the > measurements need to be performed with CONFIG_PREEMPT_RT_FULL to > be reliable (a vanilla kernel is too jittery), so I have to create > a new branch with RT patches on the test machine, which is fairly > involved and time consuming. > > I think it's fair that the two drivers are unified, but the performance > for the SPI variant shouldn't be unnecessarily diminished in the process. Could it be that your problem is related to this huge out-of-tree patch you use then ? -- Best regards, Marek Vasut