Resent to netdev@ without htm formatting, sorry.
On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <eduma...@google.com> wrote: > > > > On Mon, Dec 3, 2018 at 10:48 PM Cong Wang <xiyou.wangc...@gmail.com> wrote: >> >> On Mon, Dec 3, 2018 at 10:34 PM Eric Dumazet <eduma...@google.com> wrote: >> > >> > On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangc...@gmail.com> wrote: >> > > >> > > When an ethernet frame is padded to meet the minimum ethernet frame >> > > size, the padding octets are not covered by the hardware checksum. >> > > Fortunately the padding octets are ususally zero's, which don't affect >> > > checksum. However, we have a switch which pads non-zero octets, this >> > > causes kernel hardware checksum fault repeatedly. >> > > >> > > Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and >> > > CHECKSUM_COMPLETE are friends"), >> > > skb checksum was forced to be CHECKSUM_NONE when padding is detected. >> > > After it, we need to keep skb->csum updated, like what we do for RXFCS. >> > > However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP >> > > headers, it is not worthy the effort as the packets are so small that >> > > CHECKSUM_COMPLETE can't save anything. >> > > >> > > I tested this patch with RXFCS on and off, it works fine without any >> > > warning in both cases. >> > > >> > > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are >> > > friends"), >> > > Cc: Saeed Mahameed <sae...@mellanox.com> >> > > Cc: Eric Dumazet <eduma...@google.com> >> > > Cc: Tariq Toukan <tar...@mellanox.com> >> > > Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com> >> > > --- >> > > .../net/ethernet/mellanox/mlx5/core/en_rx.c | 22 ++++++++++++++++++- >> > > 1 file changed, 21 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c >> > > b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c >> > > index 624eed345b5d..1c153b8091da 100644 >> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c >> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c >> > > @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int >> > > network_depth, __be16 proto) >> > > ((struct ipv6hdr >> > > *)ip_p)->nexthdr; >> > > } >> > > >> > > +static bool is_short_frame(struct sk_buff *skb, bool has_fcs) >> > > +{ >> > > + u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len; >> > > + >> > > + return frame_len <= ETH_ZLEN; >> > > +} >> > > + >> > > static inline void mlx5e_handle_csum(struct net_device *netdev, >> > > struct mlx5_cqe64 *cqe, >> > > struct mlx5e_rq *rq, >> > > @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct >> > > net_device *netdev, >> > > goto csum_unnecessary; >> > > >> > > if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) { >> > > + bool has_fcs = !!(netdev->features & NETIF_F_RXFCS); >> > > + >> > > if (unlikely(get_ip_proto(skb, network_depth, proto) == >> > > IPPROTO_SCTP)) >> > > goto csum_unnecessary; >> > > >> > > + /* CQE csum doesn't cover padding octets in short >> > > ethernet >> > > + * frames. And the pad field is appended prior to >> > > calculating >> > > + * and appending the FCS field. >> > > + * >> > > + * Detecting these padded frames requires to verify and >> > > parse >> > > + * IP headers, so we simply force all those small frames >> > > to be >> > > + * CHECKSUM_NONE even if they are not padded. >> > > + */ >> > > + if (unlikely(is_short_frame(skb, has_fcs))) >> > > + goto csum_none; >> > >> > Should not this go to csum_unnecessary instead ? >> >> I don't see why we don't even want to validate the protocol checksum >> here. >> >> Any reason you are suggesting so? >> >> >> > >> > Probably not a big deal, but small UDP frames might hit this code path, >> > so ethtool -S would show a lot of csum_none which could confuse mlx5 >> > owners. >> >> Why it is confusing? We intentionally bypass hardware checksum >> and let protocol layer validate it. > > > The hardware has probably validated the L3 & L4 checksum just fine. > > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any) > have no impact on the csum that has been verified by the NIC. > > >> >> >> > >> > BTW, >> > It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead >> > of >> > CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY >> > would be slightly faster, by avoiding various csum_partial() costs >> > when headers are parsed. >> >> Sure, it is certainly faster if you don't want to validate L4 checksum. > > > Sorry I do not get your point. > >> >> The only question is why we don't either validate hardware checksum >> or L4 checksum? >> > > CHECKSUM_UNNECESSARY means the NIC has fully validated the checksums, > including L4 one. > > For example, mlx4 drivers first checks if CHECKSUM_UNNECESSARY status can be > given, > and only fallback on CHECKSUM_COMPLETE for the rare cases this > CHECKSUM_UNNECESSARY > status was not given by the NIC. > >