Re: [PATCH v3 net-next 2/4] net: dsa: permit cross-chip bridging between all trees in the system

Vladimir Oltean Fri, 08 May 2020 05:55:26 -0700

Hi Florian,

Thank you so much for the review!

On Fri, 8 May 2020 at 06:16, Florian Fainelli <[email protected]> wrote:
>
>
>
> On 5/3/2020 3:12 PM, Vladimir Oltean wrote:
> > From: Vladimir Oltean <[email protected]>
> >
> > One way of utilizing DSA is by cascading switches which do not all have
> > compatible taggers. Consider the following real-life topology:
> >
> >       +---------------------------------------------------------------+
> >       | LS1028A                                                       |
> >       |               +------------------------------+                |
> >       |               |      DSA master for Felix    |                |
> >       |               | (internal ENETC port 2: eno2)|                |
> >       |  +------------+------------------------------+-------------+  |
> >       |  | Felix embedded L2 switch                                |  |
> >       |  |                                                         |  |
> >       |  | +--------------+   +--------------+   +--------------+  |  |
> >       |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
> >       |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
> >       |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
> >       +--+-+--------------+---+--------------+---+--------------+--+--+
> >
> > +-----------------------+ +-----------------------+ +-----------------------+
> > |   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
> > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
> > |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
> > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
> >
> > The above can be described in the device tree as follows (obviously not
> > complete):
> >
> > mscc_felix {
> >       dsa,member = <0 0>;
> >       ports {
> >               port@4 {
> >                       ethernet = <&enetc_port2>;
> >               };
> >       };
> > };
> >
> > sja1105_switch1 {
> >       dsa,member = <1 1>;
> >       ports {
> >               port@4 {
> >                       ethernet = <&mscc_felix_port1>;
> >               };
> >       };
> > };
> >
> > sja1105_switch2 {
> >       dsa,member = <2 2>;
> >       ports {
> >               port@4 {
> >                       ethernet = <&mscc_felix_port2>;
> >               };
> >       };
> > };
> >
> > sja1105_switch3 {
> >       dsa,member = <3 3>;
> >       ports {
> >               port@4 {
> >                       ethernet = <&mscc_felix_port3>;
> >               };
> >       };
> > };
> >
> > Basically we instantiate one DSA switch tree for every hardware switch
> > in the system, but we still give them globally unique switch IDs (will
> > come back to that later). Having 3 disjoint switch trees makes the
> > tagger drivers "just work", because net devices are registered for the
> > 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
> > port. So packets received on the ENETC port are stripped of their
> > stacked DSA tags one by one.
> >
> > Currently, hardware bridging between ports on the same sja1105 chip is
> > possible, but switching between sja1105 ports on different chips is
> > handled by the software bridge. This is fine, but we can do better.
> >
> > In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
> > In other words, a sja1105 switch can correctly parse and route a packet
> > containing a dsa_8021q tag. So if we could enable hardware bridging on
> > the Felix DSA master ports, cross-chip bridging could be completely
> > offloaded.
> >
> > Such a system would be used as follows:
> >
> > ip link add dev br0 type bridge && ip link set dev br0 up
> > for port in sw0p0 sw0p1 sw0p2 sw0p3 \
> >           sw1p0 sw1p1 sw1p2 sw1p3 \
> >           sw2p0 sw2p1 sw2p2 sw2p3; do
> >       ip link set dev $port master br0
> > done
> >
> > The above makes switching between ports on the same row be performed in
> > hardware, and between ports on different rows in software. Now assume
> > the Felix switch ports are called swp0, swp1, swp2. By running the
> > following extra commands:
> >
> > ip link add dev br1 type bridge && ip link set dev br1 up
> > for port in swp0 swp1 swp2; do
> >       ip link set dev $port master br1
> > done
> >
> > the CPU no longer sees packets which traverse sja1105 switch boundaries
> > and can be forwarded directly by Felix. The br1 bridge would not be used
> > for any sort of traffic termination.
>
> Is there anything that prevents br1 from terminating traffic though
> (just curious)?
>

Well, one obvious limitation is that, to support termination on br1,
the bridge rx_handler would have to steal packets from the DSA
software RX processing path. We just need the upstream switch to
forward packets in hardware between ports that are DSA masters, so the
choice was to at least permit that.
So given that we now have a dummy rx_handler on br1, it _cannot_
terminate any traffic.

For the particular hardware layout presented above, the choice was to
let the user bridge the Felix ports. Functionally it is optional
(sja1105 ports are still bridged both ways), but the data paths are
different:
- if br1 doesn't exist, then a packet that needs to go from sw1p0 to
sw2p0 is bridged in software by br0 (because Felix is not bridged, all
of its traffic goes to the CPU, then the rules on br0 kick in, and
this reinjects the packet to sw2p0, which calls dev_queue_xmit to
Felix port 2, which calls dev_queue_xmit to the one and only ENETC
master).
- if br1 exists and we want to forward packets along the same route
(sw1p0 -> sw2p0), then br0 only defines the forwarding domain to which
packets are allowed to go. There would initially be one duplicate,
when Felix floods the first packet to the CPU _and_ to its other port
(the DSA master of sw2p0), because the packet sent to the stack will
still get software-bridged and re-enqueued just as in the case above.
But for further packets, Felix will no longer flood packets to the
CPU, but just to the other switch. On that end, the other switch will
look at the dsa_8021q tag and decide which ports are allowed to see
the packet and which aren't (these are the "crosschip links" that
depend on which ports are part of br0).
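
To make the two cases above concrete, here is a toy shell sketch (not
driver code) that classifies where a frame between two sja1105 ports
gets forwarded. The function name fwd_path and its arguments are made
up for illustration, both ports are assumed to be in br0, and only the
steady state is modelled (i.e. after the first flooded packet):

```shell
#!/bin/sh
# Toy model of the two data paths described above; not driver code.
# fwd_path SRC DST BR1_EXISTS
#   SRC, DST:   sja1105 port names such as sw1p0 (assumed to be in br0)
#   BR1_EXISTS: "yes" if the Felix DSA master ports are bridged in br1
fwd_path() {
	src_chip=${1%p*}	# sw1p0 -> sw1
	dst_chip=${2%p*}

	if [ "$src_chip" = "$dst_chip" ]; then
		# Same chip: the sja1105 bridges locally in hardware.
		echo "hardware (sja1105)"
	elif [ "$3" = "yes" ]; then
		# Different chips, masters bridged: Felix forwards between
		# its ports and the CPU is bypassed.
		echo "hardware (felix)"
	else
		# Different chips, masters not bridged: all traffic goes to
		# the CPU and br0 forwards it in software.
		echo "software (br0)"
	fi
}

fwd_path sw1p0 sw1p3 no
fwd_path sw1p0 sw2p0 no
fwd_path sw1p0 sw2p0 yes
```

The only input that changes the cross-chip result is whether br1
exists, which is the point: br0 membership stays the same in both
cases.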

So to answer your question, we never need to terminate traffic on br1
because it only serves as a helper for br0 (accelerating its
forwarding path).
The alternative would have been to build some sort of sja1105
awareness into Felix. The question, of course, is when can the Felix
driver automatically decide that its DSA masters can be bridged
together? And if we take an "automatic decision" route, is it sane
that Felix ports 1 and 2 are forwarding packets autonomously between
them, even though there is no Linux bridge that asked for that?

On the other hand, we may imagine a few situations where things might
look different.
Let's say Felix had 4 ports, but sja1105 switches were hanging off of
only 3 of its ports. The 4th interface goes straight to a copper port.
If sw1p0 wants to talk to the copper port of Felix, how can we model
that, and what are the chances of it working in hardware?
Spoiler alert, it won't work purely in hardware, because the copper
port would see the unpopped dsa_8021q headers coming from sw1p0.
But we can still put sw1p0 and Felix port 4 in the same br0 interface,
and packets from sw1p0 would go to the CPU, where a new packet would
be forwarded to Felix port 4 without the dsa_8021q tag of sw1p0.
So bridging a Felix standalone (not DSA master) interface with a
sja1105 interface could work under some circumstances (through
software bridging), but that is non-ideal, so as long as the DSA
master switch doesn't have any understanding of the DSA headers it's
transporting, it's simply easier to not do that :) and design boards
where there's a sja1105 switch hanging off of every used Felix port.
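
The copper port problem can be illustrated with another toy shell
sketch (again not driver code). A frame is just a string here, the
"dsa_8021q:" prefix stands in for the tag that sw1p0's chip adds
towards its DSA master, and both function names are made up for
illustration:

```shell
#!/bin/sh
# Toy model, not driver code. A frame is a string; the "dsa_8021q:"
# prefix stands for the tag added towards the DSA master.

# Felix forwarding in hardware: it does not understand the tag, so the
# frame leaves the copper port unchanged.
felix_hw_forward() {
	echo "$1"
}

# Software path: DSA RX processing pops the tag before br0 forwards
# a fresh frame to the Felix copper port.
cpu_sw_forward() {
	echo "${1#dsa_8021q:}"
}

frame="dsa_8021q:payload"
felix_hw_forward "$frame"	# tag still present on the wire
cpu_sw_forward "$frame"		# tag removed
```

Only the second path hands the link partner on the copper port a frame
it can make sense of.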

But what if we build a super-Felix switch in the future, that
understands the DSA tags of the switches cascaded beneath it? Let's
treat this "super-Felix" in the generic case where it's not a DSA
device. Currently DSA only means that it has an Ethernet connection
towards the system, so its I/O is performed indirectly. But
"super-Felix" can be a pure switchdev device just as well, we need to
think about this situation in a generic way.
The point is just that "super-Felix" has awareness of the DSA tags of
switches beneath it. It can listen for "change upper" events for
bridging, and it can detect when its standalone copper port 4 gets
added to the same bridge as one such downstream switch that it can
understand.
So in that case, the "super-Felix" switch can do some magic in the
background: it can permit hardware bridging of its copper port 4 with
a downstream sja1105 hanging off of its port 0. Based on the topology
described in the device tree, packets sent to the sja1105 would
contain a DSA tag, and packets sent to the copper port wouldn't. From
a user perspective, things would "just work".

I know the data flow sja1105 <-> super-Felix copper port 4 that I just
described is different from what this patch set is providing. With
current Felix, this data flow is not even possible in hardware. But I
would like to look forward and imagine, with that super-Felix, if br1
would still be necessary for the simple case where we're bridging
sw1p0 with sw2p0. I think it would still be necessary, because there's
still no "natural" place for super-Felix to listen on "change upper"
events of the DSA net devices below it. That's the dilemma I'm having,
but it looks like br1 between masters is still the way to go, and that
model won't change regardless of whether the parent switchdev driver
is DSA-aware or not.

I would like to get some more feedback on this.

> >
> > For this to work, we need to give drivers an opportunity to listen for
> > bridging events on DSA trees other than their own, and pass that other
> > tree index as argument. I have made the assumption, for the moment, that
> > the other existing DSA notifiers don't need to be broadcast to other
> > trees. That assumption might turn out to be incorrect. But in the
> > meantime, introduce a dsa_broadcast function, similar in purpose to
> > dsa_port_notify, which is used only by the bridging notifiers.
> >
> > Signed-off-by: Vladimir Oltean <[email protected]>
> > ---
> Reviewed-by: Florian Fainelli <[email protected]>
> --
> Florian

Thanks,
-Vladimir
