On Tue, Oct 27, 2020 at 09:53:45PM +0100, Tobias Waldekranz wrote:
> So all FROM_CPU packets to a given device/port will always travel
> through the same set of ports.
Ah, this is the part that didn't click for me.

For the simple case where you have a multi-CPU port system:

                           Switch 0
               +--------------------------------+
DSA master #1  |CPU port #1                     |
+------+       +------+                +-------+
| eth0 | ----> |      | -----\   ----- | sw0p0 | ------>
+------+       +------+       \  /     +-------+
               |               \/               |
DSA master #2  |CPU port #2    /\               |
+------+       +------+       /  \     +-------+
| eth1 | ----> |      | -----/    \----| sw0p1 | ------>
+------+       +------+                +-------+
               |                                |
               +--------------------------------+

you can have Linux load-balance TX across the CPU ports, even when
many streams are all delivered to the same egress port (sw0p0).

But if you have a cascade:

                           Switch 0                                 Switch 1
               +--------------------------------+      +--------------------------------+
DSA master #1  |CPU port #1        DSA link #1  |      |DSA link #1                     |
+------+       +------+                +-------+       +------+                +-------+
| eth0 | ----> |      | -----\   ----- |       | ----> |      | -----\   ----- | sw1p0 | ------>
+------+       +------+       \  /     +-------+       +------+       \  /     +-------+
               |               \/               |      |               \/               |
DSA master #2  |CPU port #2    /\  DSA link #2  |      |DSA link #2    /\               |
+------+       +------+       /  \     +-------+       +------+       /  \     +-------+
| eth1 | ----> |      | -----/    \----|       | ----> |      | -----/    \----| sw1p1 | ------>
+------+       +------+                +-------+       +------+                +-------+
               |                                |      |                                |
               +--------------------------------+      +--------------------------------+

then you have no good way to spread the same load (many streams all
delivered to the same egress port, sw1p0) between DSA link #1 and DSA
link #2. DSA link #1 will get congested, while DSA link #2 will remain
unused.

And this all happens because, for FROM_CPU packets, the hardware is
configured in mv88e6xxx_devmap_setup to deliver all packets with a
non-local switch ID towards the same "routing" port, right? Whereas for
FORWARD frames, the destination port for a non-local switch ID is not
established by mv88e6xxx_devmap_setup, but by an FDB lookup of
{DMAC, VID}. In the cascaded case above, that FDB lookup is the only
way your hardware could select the LAG as the destination; the hash
code would then be computed from the packet, and the appropriate
egress port within the LAG selected.
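To spell that distinction out in code, here is a rough sketch - I'm
paraphrasing my understanding of the Marvell tag format from memory
rather than copying from net/dsa/tag_dsa.c, and the FRAME_MODE_* and
build_*_tag names are made up for illustration:

#include <linux/types.h>

/* The frame mode lives in the top two bits of the first tag byte. */
#define FRAME_MODE_FROM_CPU	0x40	/* routed via the static devmap */
#define FRAME_MODE_FORWARD	0xc0	/* routed via {DMAC, VID} FDB lookup */

/* FROM_CPU: the tag carries the *destination* switch and port, and
 * every cross-chip hop follows the single routing port programmed by
 * mv88e6xxx_devmap_setup - hence no load balancing across DSA links.
 */
static void build_from_cpu_tag(u8 *dsa_header, u8 dst_sw, u8 dst_port)
{
	dsa_header[0] = FRAME_MODE_FROM_CPU | (dst_sw & 0x1f);
	dsa_header[1] = (dst_port & 0x1f) << 3;
	dsa_header[2] = 0;
	dsa_header[3] = 0;
}

/* FORWARD: the tag carries the *source* switch and port, and each
 * switch resolves the egress port itself by FDB lookup - which may
 * resolve to a LAG, whose hash then spreads flows across DSA links.
 */
static void build_forward_tag(u8 *dsa_header, u8 src_sw, u8 src_port)
{
	dsa_header[0] = FRAME_MODE_FORWARD | (src_sw & 0x1f);
	dsa_header[1] = (src_port & 0x1f) << 3;
	dsa_header[2] = 0;
	dsa_header[3] = 0;
}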
So, to answer my own question:

Q: What does using FORWARD frames to offload TX flooding from the
   bridge have to do with a LAG between 2 switches?
A: Nothing, they would just both need FORWARD frames to be used.

> > Why don't you attempt to solve this more generically somehow? Your
> > switch is not the only one that can't perform source address
> > learning for injected traffic, there are tons more, some are not
> > even DSA. We can't have everybody roll their own solution.
>
> Who said anything about rolling my solution? I'm going for a generic
> solution where a netdev can announce to the bridge it is being added
> to that it can offload forwarding of packets for all ports belonging
> to the same switchdev device. Most probably modeled after how the
> macvlan offloading stuff is done.

The fact that I have no idea how the macvlan offloading is done does
not really help me, but my failure to understand you here doesn't
appear to stem from that.

"a netdev can announce to the bridge it is being added to that it can
offload forwarding of packets for all ports belonging to the same
switchdev device"

What do you mean? skb->offload_fwd_mark? Or are you still talking
about its TX-side equivalent, which is what we've been discussing in
these past few mails? If so, I'm confused by you calling it "offload
forwarding of packets"; I was expecting a description more along the
lines of "offload flooding of packets coming from the host" or
something like that.

> In the case of mv88e6xxx that would kill two birds with one stone -
> great! In other cases you might have to have the DSA subsystem
> listen to new neighbors appearing on the bridge and sync those to
> hardware or something. Hopefully someone working with that kind of
> hardware can solve that problem.

If by "neighbors" you mean that you bridge a DSA swp0 with an e1000
eth0, then that is not going to be enough. The CPU port of swp0 will
need to learn not eth0's MAC address, but in fact the MAC addresses of
all stations that might be connected to eth0. There might even be a
network switch connected to eth0, not just a directly connected link
partner. So there are potentially many MAC addresses to be learnt, and
all of them are unknown off-hand.

I admit I haven't actually looked at implementing this, but I would
expect that the local (master) FDB of the bridge, which would get
populated on the RX side of the "foreign interface" through software
learning, needs to be offloaded in its entirety towards all switchdev
ports, via a new switchdev "host FDB" object or something of that kind
(where a "host FDB" entry offloaded on a port would mean "see this
{DMAC, VID} pair? send it to the CPU"). With your FORWARD frames life
hack you can eschew all of that, good for you. I was just speculatively
hoping you might be interested in tackling it the hard way.

Anyway, this discussion has started mixing up basic stuff (like
resolving your source address learning issue on the CPU port when
bridged with a foreign interface) with advanced / optimization stuff
(LAG, offloading the flooding of packets from the host), the only
commonality appearing to be a need for FORWARD frames. Can you even
believe we are still commenting on a series about something as mundane
as link aggregation on DSA user ports? At least I can't. I'll go off
and start reviewing your patches, before we manage to lose everybody
along the way.
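P.S. In case it makes the "host FDB" idea above less hand-wavy, here
is the rough shape I have in mind. Everything here is hypothetical:
SWITCHDEV_OBJ_ID_HOST_FDB and switchdev_obj_host_fdb do not exist,
they are made-up names for illustration only.

#include <linux/if_ether.h>
#include <net/switchdev.h>

/* Hypothetical switchdev object, obj.id = SWITCHDEV_OBJ_ID_HOST_FDB
 * (made up). An entry offloaded on a port would mean: "see this
 * {DMAC, VID} pair? send it to the CPU".
 */
struct switchdev_obj_host_fdb {
	struct switchdev_obj obj;
	unsigned char addr[ETH_ALEN];	/* learnt in software on the RX
					 * side of the foreign interface
					 */
	u16 vid;
};

A DSA driver like mv88e6xxx could presumably react to such an object
by loading a static ATU entry whose destination is the CPU port, so
that traffic from the switch ports towards stations behind eth0 keeps
flowing even though nothing is ever learnt from injected frames.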