On Sat, May 16, 2020 at 07:02:25PM -0700, David Miller wrote:
> > The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
> > of silicon, except the former has an SPI interface, while the later has a
> > parallel bus interface. Thus far, Linux has two separate drivers for each
> > and they are diverging considerably.
> > 
> > This series unifies them into a single driver with small SPI and parallel
> > bus specific parts. The approach here is to first separate out the SPI
> > specific parts into a separate file, then add parallel bus accessors in
> > another separate file and then finally remove the old parallel bus driver.
> > The reason for replacing the old parallel bus driver is because the SPI
> > bus driver is much higher quality.
> 
> What strikes me in these changes is all of the new indirect jumps in
> the fast paths of TX and RX packet processing.  It's just too much for
> my eyes. :-)
> 
> Especially in the presence of Spectre mitigations, these costs are
> quite non-trivial.
> 
> Seriously, I would recommend that instead of having these small
> indirect helpers, just inline the differences into two instances of
> the RX interrupt and the TX handler.

I agree.

However in terms of performance there's a bigger problem:

Previously ks8851.c (SPI driver) had 8-bit and 32-bit register accessors.
The present series drops them and performs a 32-bit access as two 16-bit
accesses and an 8-bit access as one 16-bit access because that's what
ks8851_mll.c (16-bit parallel bus driver) does.  That has a real,
measurable performance impact because in the case of 8-bit accesses,
another 8 bits need to be transferred over the SPI bus, and in the case
of 32-bit accesses, *two* SPI transfers need to be performed.

The 8-bit and 32-bit accesses happen in ks8851_rx_pkts(), i.e. in the
RX hotpath.  I've provided numbers for the performance impact and even
a patch to solve them but it was dismissed and not included in the
present series:

https://lore.kernel.org/netdev/20200420140700.6632hztejwcgj...@wunner.de/

The reason given for the dismissal was that I had performed the measurements
on 4.19 which is allegedly "long dead" (in Andrew Lunn's words).
However I can assure you that performing two SPI transfers has not
magically become as fast as performing one SPI transfer since 4.19.
So the argument is nonsense.

Nevertheless I was going to repeat the performance measurements on a
recent kernel but haven't gotten around to that yet because the
measurements need to be performed with CONFIG_PREEMPT_RT_FULL to
be reliable (a vanilla kernel is too jittery), so I have to create
a new branch with RT patches on the test machine, which is fairly
involved and time consuming.

I think it's fair that the two drivers are unified, but the performance
for the SPI variant shouldn't be unnecessarily diminished in the process.

Thanks,

Lukas

Reply via email to