On 6/7/2019 9:20 PM, Davide Caratti wrote: > On Wed, 2019-06-05 at 18:42 -0700, Cong Wang wrote: >> On Tue, Jun 4, 2019 at 11:19 AM Eli Britstein <el...@mellanox.com> wrote: > hello Cong and Eli, > > and thanks for all the thoughts. > >>> On 6/4/2019 8:55 PM, Cong Wang wrote: >>>> On Sat, Jun 1, 2019 at 9:22 PM Eli Britstein <el...@mellanox.com> wrote: >>>>> I think that's because QinQ, or VLAN is not an encapsulation. There is >>>>> no outer/inner packets, and if you want to mangle fields in the packet >>>>> you can do it and the result is well-defined. >>>> Sort of, perhaps VLAN tags are too short to be called as an >>>> encapsulation, my point is that it still needs some endpoints to push >>>> or pop the tags, in a similar way we do encap/decap. >>>> >>>> >>>>> BTW, the motivation for my fix was a use case were 2 VGT VMs >>>>> communicating by OVS failed. Since OVS sees the same VLAN tag, it >>>>> doesn't add explicit VLAN pop/push actions (i.e pop, mangle, push). If >>>>> you force explicit pop/mangle/push you will break such applications. >>>> From what you said, it seems act_csum is in the middle of packet >>>> receive/transmit path. So, which is the one pops the VLAN tags in >>>> this scenario? If the VM's are the endpoints, why not use act_csum >>>> there? >>> In a switchdev mode, we can passthru the VFs to VMs, and have their >>> representors in the host, enabling us to manipulate the HW eswitch >>> without knowledge of the VMs. >>> >>> To simplify it, consider the following setup: >>> >>> v1a <-> v1b and v2a <-> v2b are veth pairs. >>> >>> Now, we configure v1a.20 and v2a.20 as VLAN devices over v1a/v2a >>> respectively (and put the "a" devs in separate namespaces). >>> >>> The TC rules are on the "b" devs, for example: >>> >>> tc filter add dev v1b ... action pedit ... action csum ... action >>> redirect dev v2b >>> >>> Now, ping from v1a.20 to v1b.20. The namespaces transmit/receive tagged >>> packets, and are not aware of the packet manipulation (and the required >>> act_csum). >> This is what I said, v1b is not the endpoint which pops the vlan tag, >> v1b.20 is. So, why not simply move at least the csum action to >> v1b.20? With that, you can still filter and redirect packets on v1b, >> you still even modify it too, just defer the checksum fixup to the >> endpoint. >> >> And to be fair, if this case is a valid concern, so is VXLAN case, >> just replace v1a.20 and v2a.20 with VXLAN tunnels. If you modify >> the inner header, you have to fixup the checksum in the outer >> UDP header. > at least with single "accelerated" vlan tags, I think that users of > tc_skb_protocol() that explicitly access the IP header should be fixed the > same way that Eli did to 'csum'. > > (note that 'pedit' is *not* among them, and that's why Eli only needed to > fix 'csum' in his setup :) ) > > Now I'm no more sure about what to do with the QinQ case, whether we > should: > > a) fix missing skb_may_pull() in csum, and fix (a couple of) other actions > in the same way, I'm for that. > > or > > b) rework act_csum when it wants to read the IP header, in a way that only > ip header is mangled only when there are only zero or a single > "accelerated" tag, and do the same for (a couple of) other actions.
What is the conceptually difference between single VLAN and QinQ? Or, even between "accelerated" to non-accelerated? Furthermore, doing that you would reduce functionality already exists. Don't do it. > > The pop/push approach built in the action ( option a) ) fixes my > environment - but like Cong says, it only would work with VLANs, and not > with other encapsulations. Probably it really makes sense to see if it's > possible to extend act_vlan in a way that it reconstructs the packet after > an iteration of 'vlan pop'. This might not be easy, because even the above > command assumes that inner and outer tag have the same ethertype (which is > not generally true for QinQ). Again, I think VLANs and encapsulations are not the same. Actions for tagged packets are well defined, the same as they are on native packets. In encapsulations you would *HAVE* to do make decap/encap in order to act on inner packet. > > But until then, we should assume that read/writes of the IP header are not > feasible for QinQ traffic, just like it's not possible for tunneled > packets, and adjust configuration accordingly. > > WDYT? >