On Tue, Oct 27, 2020 at 21:00, Vladimir Oltean <olte...@gmail.com> wrote:
> On Tue, Oct 27, 2020 at 07:25:16PM +0100, Tobias Waldekranz wrote:
>> > 1) trunk user ports, with team/bonding controlling it
>> > 2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
>> > 3) trunk CPU ports.
> [...]
>> I think that (2) and (3) are essentially the same problem, i.e. creating
>> LAGs out of DSA links, be they switch-to-switch or switch-to-cpu
>> connections. I think you are correct that the CPU port can not be a
>> LAG/trunk, but I believe that limitation only applies to TO_CPU packets.
>
> Which would still be ok? They are called "slow protocol PDUs" for a reason.

Oh yes, completely agree. That was the point I was trying to make :)

>> In order for this to work on transmit, we need to add forward offloading
>> to the bridge so that we can, for example, send one FORWARD from the CPU
>> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> That surely sounds like an interesting (and tough to implement)
> optimization to increase the throughput, but why would it be _needed_
> for things to work? What's wrong with 4 FROM_CPU packets?

We have internal patches that do this, and I can confirm that it is
tough :) I really would like to figure out a way to solve this, that
would also be acceptable upstream. I have some ideas, it is on my TODO.

In a single-chip system I agree that it is not needed, the CPU can do
the load-balancing in software. But in order to have the hardware do
load-balancing on a switch-to-switch LAG, you need to send a FORWARD.
FROM_CPUs would just follow whatever is in the device mapping table. You
essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
would make up 100% of traffic.

Other than that there are some things that, while strictly speaking
possible to do without FORWARDs, become much easier to deal with:

- Multicast routing. This is one case where performance _really_
  suffers from having to skb_clone() to each recipient.

- Bridging between virtual interfaces and DSA ports. Typical example
  is an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the
  switch can not perform SA learning, which means that once you bridge
  traffic from the VPN out to a DSA port, the return traffic will be
  classified as unknown unicast by the switch and be flooded
  everywhere.
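
To make the SA learning point concrete, here is a rough toy model in
Python. It is only a sketch under simplifying assumptions: the ToySwitch
class, its method names, the port names and the "vpn"/"h1" addresses are
all made up for illustration, and nothing here reflects how a real chip
or driver is programmed. It only shows why the return traffic gets
flooded when the switch never sees the source address on a FROM_CPU, and
why a FORWARD avoids that.

```python
# Toy model of a learning switch. Every name here is hypothetical and
# exists only for illustration; this is not a real driver or chip API.

class ToySwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # MAC address -> port it was learned on

    def ingress(self, src, dst, in_port):
        """Normal ingress: learn the source, then look up the destination."""
        # SA learning: remember which port the source address lives behind.
        self.fdb[src] = in_port
        if dst in self.fdb:
            return [self.fdb[dst]]
        # Unknown unicast: flood to every port except the ingress port.
        return [p for p in self.ports if p != in_port]

    def from_cpu(self, src, dst, out_port):
        """FROM_CPU-style injection: the CPU names the egress port and
        the switch learns nothing about src."""
        return [out_port]

    def forward(self, src, dst):
        """FORWARD-style injection: handled like ordinary ingress on the
        CPU port, so src is learned there."""
        return self.ingress(src, dst, "cpu")


def reply_ports(use_forward):
    """Send vpn -> h1 from the CPU, then return where h1's reply goes."""
    sw = ToySwitch(["cpu", "swp1", "swp2", "swp3", "swp4"])
    if use_forward:
        sw.forward("vpn", "h1")
    else:
        sw.from_cpu("vpn", "h1", "swp1")
    # h1 sits behind swp1 and answers the VPN endpoint.
    return sw.ingress("h1", "vpn", "swp1")


if __name__ == "__main__":
    print("reply after FROM_CPU:", reply_ports(use_forward=False))
    print("reply after FORWARD: ", reply_ports(use_forward=True))
```

In this model the reply after a FROM_CPU goes out on every port except
swp1, because "vpn" was never learned, while after a FORWARD it goes to
the CPU port only.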