From: Eric Dumazet
> Sent: 03 July 2015 17:39
> On Fri, 2015-07-03 at 16:18 +0000, David Laight wrote:
>
> > Even on x86 aligning the ethernet receive data on a 4n+2
> > boundary is likely to give marginally better performance
> > than aligning on a 4n boundary.
>
> You are coming late to the party.
I've been to many parties at many different times....
Going back many years, Sun's original SBus DMA part generated a lot
of single SBus transfers for 4n+2 aligned buffers, so it was necessary
to do a 'realignment' copy. The later DMA+ (and definitely the DMA2)
part did SBus burst transfers even when the buffer was 4n+2 aligned,
so with the later parts you could correctly align the buffer.
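A realignment copy of that kind might look roughly like the sketch
below. It is not Sun's driver code: it assumes the DMA engine wrote the
frame into a 4-byte aligned buffer, and the names realign_copy and
REALIGN_PAD are made up for illustration. Copying the frame to a 4n+2
address puts the IP header, which follows the 14-byte Ethernet header,
on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14   /* untagged Ethernet header length */
#define REALIGN_PAD 2 /* hypothetical: 2 + 14 = 16, a multiple of 4 */

/* Hypothetical helper: copy a frame that the DMA engine wrote to a
 * 4-byte aligned buffer into dst, starting REALIGN_PAD bytes in, so the
 * frame begins on a 4n+2 address and the IP header on a 4n address.
 * dst must be 4-byte aligned with room for len + REALIGN_PAD bytes.
 * Returns a pointer to the start of the copied frame. */
unsigned char *realign_copy(unsigned char *dst, const unsigned char *dma_buf,
                            size_t len)
{
    assert(((uintptr_t)dst & 3) == 0);
    memcpy(dst + REALIGN_PAD, dma_buf, len);
    return dst + REALIGN_PAD;
}
```

The cost is one extra memcpy of the whole frame per packet, which is
why it was worth dropping once the DMA part could burst to 4n+2
addresses.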
> Intel guys decided to change NET_IP_ALIGN to 0 (it was 2 in the past)
...
> x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
>
> x86 architectures can handle unaligned accesses in hardware, and it has
> been shown that unaligned DMA accesses can be expensive on Nehalem
> architectures. As such we should overwrite NET_IP_ALIGN to resolve
> this issue.
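The arithmetic behind that trade-off, sketched under the assumption
that the receive buffer itself starts cache-line aligned and the driver
reserves NET_IP_ALIGN bytes before the Ethernet header: with 2 the IP
header lands at offset 16 (4-byte aligned) but the DMA starts 2 bytes
into a cache line; with 0 the DMA is cache-line aligned but the IP
header sits at offset 14. The helper names are hypothetical, not kernel
functions.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14 /* untagged Ethernet header: 2 MACs + ethertype */

/* Hypothetical helper: offset of the IP header from the start of the
 * receive buffer when net_ip_align bytes are reserved before the frame. */
size_t ip_header_offset(size_t net_ip_align)
{
    return net_ip_align + ETH_HLEN;
}

/* IP header misalignment (0..3) relative to a 4-byte boundary, assuming
 * the buffer itself starts 4-byte aligned. */
size_t ip_header_misalign(size_t net_ip_align)
{
    return ip_header_offset(net_ip_align) & 3;
}

/* Offset of the DMA start within a cache line, assuming the buffer
 * starts cache-line aligned and the NIC writes from the frame's first
 * byte. cache_line must be a power of two. */
size_t dma_start_misalign(size_t net_ip_align, size_t cache_line)
{
    assert(cache_line != 0 && (cache_line & (cache_line - 1)) == 0);
    return net_ip_align & (cache_line - 1);
}
```

So NET_IP_ALIGN = 2 gives ip_header_misalign() == 0 and
dma_start_misalign() == 2, while NET_IP_ALIGN = 0 gives the reverse
(2 and 0).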
My 2 cents:
I'd have thought it would depend on the nature of the 'DMA' requests
generated by the hardware, so it is ethernet hardware dependent.
The above may be correct for PCI masters, especially those that do
paired 16-bit accesses for every 32-bit word.
If the hardware generated cache-line aligned PCI bursts, I wouldn't
have thought it would matter.
I doubt it is valid for PCIe transfers, where the ethernet frame
will be split into (probably) 128-byte TLPs. Even if the frame starts on
a 64n+2 boundary, the splits will be on 64-byte boundaries, since the
first and last 32-bit words of each TLP have separate byte enables.
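To make the TLP argument concrete, here is a rough model of how a
frame might be carved into write TLPs, assuming (as above) a 128-byte
maximum payload with split points on 64-byte boundaries. struct tlp and
split_tlps() are hypothetical; real devices may pick split points
differently. The byte enables show that only the first and last 32-bit
words of each TLP are affected by the 4n+2 offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tlp {
    uint64_t addr;     /* address of first payload byte */
    size_t   len;      /* payload bytes */
    unsigned first_be; /* First DW byte enables (bit i = byte i) */
    unsigned last_be;  /* Last DW byte enables, 0 for a 1-DW TLP */
};

/* Hypothetical model: split a write of len bytes at addr into TLPs of
 * at most max_payload bytes, every split point boundary-aligned.
 * boundary must be a power of two no larger than max_payload.
 * Returns the number of TLPs stored in out (at most max_out). */
size_t split_tlps(uint64_t addr, size_t len, size_t max_payload,
                  size_t boundary, struct tlp *out, size_t max_out)
{
    uint64_t end = addr + len;
    size_t n = 0;

    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    assert(boundary <= max_payload);

    while (addr < end && n < max_out) {
        /* Furthest boundary-aligned point within max_payload bytes. */
        uint64_t stop = (addr + max_payload) & ~(uint64_t)(boundary - 1);
        if (stop > end)
            stop = end;

        unsigned s = (unsigned)(addr & 3);       /* first byte in first DW */
        unsigned e = (unsigned)((stop - 1) & 3); /* last byte in last DW */
        struct tlp *t = &out[n++];
        t->addr = addr;
        t->len = (size_t)(stop - addr);
        if ((addr >> 2) == ((stop - 1) >> 2)) {
            /* Single-DW TLP: Last DW BE must be 0. */
            t->first_be = (0xFu >> (3 - e)) & (0xFu << s) & 0xFu;
            t->last_be = 0;
        } else {
            t->first_be = (0xFu << s) & 0xFu;
            t->last_be = 0xFu >> (3 - e);
        }
        addr = stop;
    }
    return n;
}
```

Under this model a 1514-byte frame at a 64n+2 address becomes a
126-byte first TLP (First DW BE 0xC), ten 128-byte TLPs, and a
108-byte tail.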
So I'd expect to see a cache-line read-modify-write (RMW) for the first
and last cache lines. That may, or may not, be slower than misaligned
accesses for the entire frame (1 clock data delay per access?).
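A quick way to count how many cache lines a write covers only partly,
and so might need an RMW, is sketched below assuming 64-byte lines.
Whether the host really does an RMW for such lines depends on the
chipset; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache lines touched by a write of len bytes starting at addr. */
size_t lines_touched(uint64_t addr, size_t len, size_t line)
{
    assert(line != 0);
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / line - addr / line + 1);
}

/* How many of those lines are only partly written (0, 1 or 2); each
 * one is a candidate for a read-modify-write by the host. */
size_t partial_lines(uint64_t addr, size_t len, size_t line)
{
    assert(line != 0);
    if (len == 0)
        return 0;
    int head = (addr % line) != 0;         /* write starts mid-line */
    int tail = ((addr + len) % line) != 0; /* write ends mid-line */
    if (addr / line == (addr + len - 1) / line)
        return (size_t)(head || tail);     /* fits in a single line */
    return (size_t)(head + tail);
}
```

For a 1514-byte frame, a 64n+2 start gives two partial lines (head
and tail) out of 24, while a 64n start leaves only the tail partial.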
Of course, modern NICs will write 2 bytes of 'crap' before the frame.
Rounding up the transfer to the end of a cache line might also help
(especially if only a few extra words are needed).
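A sketch of that 'pad, then round up' layout, assuming the NIC starts
the write on a cache-line boundary, prepends 2 filler bytes, and only
rounds up when the extra is at most some threshold. dma_write_len(),
round_up() and the max_extra policy are hypothetical, not taken from
any real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
size_t round_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical policy: bytes the NIC writes for a frame of frame_len
 * bytes preceded by pad filler bytes, starting on a cache-line boundary.
 * If finishing the last cache line costs at most max_extra bytes, round
 * up so no line is partly written; otherwise write exactly
 * pad + frame_len. */
size_t dma_write_len(size_t pad, size_t frame_len, size_t line,
                     size_t max_extra)
{
    size_t exact = pad + frame_len;
    size_t rounded = round_up(exact, line);
    return (rounded - exact <= max_extra) ? rounded : exact;
}
```

With 2 pad bytes and 64-byte lines, a 1514-byte frame needs 20 extra
bytes (1516 up to 1536), while a 60-byte frame needs only 2 (62 up to
64).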
David