Hi Jiri,

On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote:
> On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote:
> > It seems to be caused by the following:
> >
> > 1. __ipgre_rcv() calls skb_pop_mac_header() which
> >    sets skb->mac_header to the skb->network_header.
> >
> > 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
> >    skb_reset_network_header(). This updates skb->network_header to
> >    just after the end of the GRE header.
> >
> >    This is 28 bytes after the old skb->network_header
> >    as there is a 20 byte IPv4 header followed by an
> >    8 byte GRE header.
> >
> > 3. Later, dev_gro_receive() calls skb_reset_mac_len().
> >    This calculates skb->mac_len based on skb->network_header and
> >    skb->mac_header. I.e. 28 bytes.
>
> Right. Thanks for tracking this down!
>
> > I think this may be possible to address by calling
> > skb_reset_network_header() instead of skb_pop_mac_header()
> > in __ipgre_rcv().
>
> We can't do that. The interface type is ARPHRD_IPGRE and not
> ARPHRD_NONE, so the current behavior makes pretty good sense. See
> e.g. commit 0e3da5bb8da45.
>
> We have two options here:
>
> 1. As for metadata tunnels all the info is in metadata_dst and we
>    don't need the IP/GRE header for anything, we can make the ipgre
>    interface ARPHRD_NONE in metadata based mode.
>
> 2. We can fix this up in ovs after receiving the packet from
>    ARPHRD_IPGRE interface.
>
> I think the first option is the correct one. We already don't assign
> dev->header_ops in metadata mode. I'll prepare a patch.

I agree that 1. seems to be the better approach.

> > Its possible that I've overlooked something but as things stand I think
> > things look like this:
> >
> > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > * ovs_flow_key_extract() calls key_extract() which amongst other things
> >   sets up the skb->mac_header and skb->mac_len of the skb.
> > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> >   in the case of TEB
> > * Actions update the above mentioned skb fields as appropriate.
>
> Okay, that actually eases things somewhat.
>
> > So it seems to me that it should be safe to rely on skb->protocol
> > in the receive path. Or more specifically, in netdev_port_receive().
> >
> > If mac_len is also able to be used then I think fine. But it seems to me
> > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > could be done early on, say in netdev_port_receive(). But it seems that
> > would involve duplicating some of what is already occurring in
> > key_extract().
>
> I'd actually prefer doing this earlier, netdev_port_receive looks like
> the right place. Just set mac_len there (or whatever) and let
> key_extract do the rest of the work, not depending on dev->type in
> there.
>
> My point about recirculation was not actually valid, as I missed you're
> doing this in ovs_flow_key_extract and not in key_extract. Still,
> I think the special handling of particular interface types belongs to
> the tx processing on those interfaces, not to the common code.

Sure, if that is your preference I think it should be simple enough to
implement. I agree that netdev_port_receive() looks like a good place for
this.