On Tue, Sep 8, 2020 at 10:59 AM Maxim Mikityanskiy <maxi...@nvidia.com> wrote: > > On 2020-09-04 18:27, Willem de Bruijn wrote: > > On Thu, Sep 3, 2020 at 11:00 PM Saeed Mahameed <sae...@nvidia.com> wrote: > >> > >> From: Maxim Mikityanskiy <maxi...@mellanox.com> > >> > >> A huge function mlx5e_sq_xmit was split into several to achieve multiple > >> goals: > >> > >> 1. Reuse the code in IPoIB. > >> > >> 2. Better intergrate with TLS, IPSEC, GENEVE and checksum offloads. Now > >> it's possible to reserve space in the WQ before running eseg-based > >> offloads, so: > >> > >> 2.1. It's not needed to copy cseg and eseg after mlx5e_fill_sq_frag_edge > >> anymore. > >> > >> 2.2. mlx5e_txqsq_get_next_pi will be used instead of the legacy > >> mlx5e_fill_sq_frag_edge for better code maintainability and reuse. > >> > >> 3. Prepare for the upcoming TX MPWQE for SKBs. It will intervene after > >> mlx5e_sq_calc_wqe_attr to check if it's possible to use MPWQE, and the > >> code flow will split into two paths: MPWQE and non-MPWQE. > >> > >> Two high-level functions are provided to send packets: > >> > >> * mlx5e_xmit is called by the networking stack, runs offloads and sends > >> the packet. In one of the following patches, MPWQE support will be added > >> to this flow. > >> > >> * mlx5e_sq_xmit_simple is called by the TLS offload, runs only the > >> checksum offload and sends the packet. > >> > >> This change has no performance impact in TCP single stream test and > >> XDP_TX single stream test. > >> > >> UDP pktgen (burst 32), single stream: > >> Packet rate: 17.55 Mpps -> 19.23 Mpps > >> Instructions per packet: 420 -> 360 > >> Cycles per packet: 165 -> 142 > >> > >> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) > >> NIC: Mellanox ConnectX-6 Dx > >> > >> To get this performance gain, manual optimizations of function inlining > >> were performed. It's important to have mlx5e_sq_xmit_wqe inline, > >> otherwise the packet rate will be 1 Mpps less in UDP pktgen test. > >> __always_inline is required, because gcc uninlines it when it's called > >> from two places (mlx5e_xmit and mlx5e_sq_xmit_simple). > >> > >> Signed-off-by: Maxim Mikityanskiy <maxi...@mellanox.com> > >> Signed-off-by: Saeed Mahameed <sae...@nvidia.com> > >> --- > >> .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 63 +-- > >> .../mellanox/mlx5/core/en_accel/en_accel.h | 5 + > >> .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 6 +- > >> .../net/ethernet/mellanox/mlx5/core/en_tx.c | 391 ++++++++++-------- > >> 4 files changed, 243 insertions(+), 222 deletions(-) > > > > This combines a lot of changes. Including supposed noops, but with > > subtle changes, like converting to struct initializers. > > Struct initializers are mostly used in the new code. I can split out the > only converted occurrence. > > > Probably deserves to be broken up a bit more. > > > > For instance, a pure noop patch that moves > > mlx5e_txwqe_build_eseg_csum, > > OK. Not sure I really need to move it though.
Even better. In general, I don't really care how this patch is simplified, but as is it is long and combines code moves, refactors that are supposedly a noop and new functionality. I imagine that there must be some strategy to break it up into sensible manageable chunks.