On Thu, 28 Jun 2018 06:50:32 +0300, Or Gerlitz wrote:
> On Thu, Jun 28, 2018 at 2:08 AM, Jakub Kicinski
> <jakub.kicin...@netronome.com> wrote:
> > On Wed, 27 Jun 2018 23:07:29 +0300, Or Gerlitz wrote:
> >> On Wed, Jun 27, 2018 at 1:31 AM, Jakub Kicinski
> >> <jakub.kicin...@netronome.com> wrote:
> >> > On Tue, 26 Jun 2018 17:57:08 +0300, Or Gerlitz wrote:
> >>
> >> >> 2. re the egress side of things. Some NIC HWs can't just use LAG
> >> >> as the egress port destination of an ACL (tc rule) and the HW rule
> >> >> needs to be duplicated to both HW ports. So... in that case, you
> >> >> see the HW driver doing the duplication (:() or we can somehow
> >> >> make it happen from user-space?
> >>
> >> > It's the TC core that does the duplication. Drivers which don't need
> >> > the duplication (e.g. mlxsw) will not register a new callback for each
> >> > port on which shared block is bound. They will keep one list of rules,
> >> > and a list of ports that those rules apply to.
> >>
> >> [snip]
> >>
> >> > Drivers which need duplication (multiplication) (all NICs?) have to
> >> > register a new callback for each port bound to a shared block. And TC
> >> > will call those drivers as many times as they have callbacks registered
> >> > == as many times as they have ports bound to the block. Each time
> >> > callback is invoked the driver will figure out the ingress port based
> >> > on the cb_priv and use <ingress, cookie> as the key in its rule table
> >> > (or have a separate rule table per ingress port).
> >>
> >> [snip snip]
> >>
> >> > I may be wrong, but I think you split the rules tables per port for mlx5
> >> >
> >>
> >> correct, currently I have a rule table per physical port.
> >>
> >> > So again you just register a callback every time shared block is bound,
> >> > and then TC core will send add/remove rule commands down to the driver,
> >> > relaying existing rules as well if needed.
> >>
> >> Let's see, the NIC uplink rep port devices were bound (say) by ovs to
> >> a shared-block because they are the lower devices (hate the slavish jargon)
> >> of a bond device.
> >>
> >> Next, the TC stack will invoke the callback over these ports, when ingress
> >> rule is added on the bond.
> >>
> >> But we are talking about an ingress rule set on a non-uplink rep (VF rep) port,
> >> where bonding is the egress of the rule. I guess the callback which you
> >> probably refer to (you hinted there below) is the egdev one, correct? you are
> >> suggesting that bonding will do egdev registration... I am a bit confused.
> >
> > Ah, you really meant egress. We don't have this problem, but yes, I
>
> so how does it work for you -- the rule is:
>
> <ingress=vfrep netdev, egress=bond netdev>
>
> so from here, your driver logic does what in order
> to allow offloading into the lagged uplinks? can you
> point to the code please..

static int nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
...
	if (tun_type) {
		/* Verify the egress netdev matches the tunnel type. */
		if (!nfp_fl_netdev_is_tunnel_type(out_dev, tun_type))
			return -EOPNOTSUPP;

		if (*tun_out_cnt)
			return -EOPNOTSUPP;
		(*tun_out_cnt)++;

		output->flags = cpu_to_be16(tmp_flags |
					    NFP_FL_OUT_FLAGS_USE_TUN);
		output->port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
	} else if (netif_is_lag_master(out_dev) &&
		   priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
		int gid;

		output->flags = cpu_to_be16(tmp_flags);
		gid = nfp_flower_lag_get_output_id(app, out_dev);
		if (gid < 0)
			return gid;
		output->port = cpu_to_be32(NFP_FL_LAG_OUT | gid);
	} else {
		/* Set action output parameters. */
		output->flags = cpu_to_be16(tmp_flags);

		/* Only offload if egress ports are on the same device as the
		 * ingress port.
		 */
		if (!switchdev_port_same_parent_id(in_dev, out_dev))
			return -EOPNOTSUPP;
		if (!nfp_netdev_is_nfp_repr(out_dev))
			return -EOPNOTSUPP;

		output->port = cpu_to_be32(nfp_repr_get_port_id(out_dev));
		if (!output->port)
			return -EOPNOTSUPP;
	}

> the bond BTW doesn't have the same switchdev id as
> the vfrep in case you keep different switchdev id's
> for the uplink reps under bonding -- do you unite them?
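
To make the branch order in the nfp_fl_output() excerpt above easier to follow, here is a rough user-space model of it. This is only a sketch: struct model_dev, enum model_out and model_fl_output() are hypothetical names invented for illustration, the kernel helpers are replaced by plain fields, and the tunnel output counter is left out. What it tries to show is that the LAG-master branch is evaluated before the parent-ID comparison, so in this model a bond egress never reaches the same-parent check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a net_device; not a kernel type. */
struct model_dev {
	bool is_tunnel;       /* stands in for nfp_fl_netdev_is_tunnel_type() */
	bool is_lag_master;   /* stands in for netif_is_lag_master() */
	bool is_repr;         /* stands in for nfp_netdev_is_nfp_repr() */
	int parent_id;        /* stands in for the switchdev parent ID */
	int lag_gid;          /* stands in for nfp_flower_lag_get_output_id() */
	unsigned int port_id; /* stands in for nfp_repr_get_port_id() */
};

enum model_out { MODEL_OUT_TUN, MODEL_OUT_LAG, MODEL_OUT_REPR };

/*
 * Follows the branch order of the excerpt: tunnel first, then LAG master,
 * then a plain representor on the same switch. Returns 0 and sets *kind
 * on success, or a negative errno value.
 */
int model_fl_output(const struct model_dev *in_dev,
		    const struct model_dev *out_dev,
		    bool tun_type, bool lag_feat, enum model_out *kind)
{
	if (tun_type) {
		/* The egress device must match the tunnel type. */
		if (!out_dev->is_tunnel)
			return -EOPNOTSUPP;
		*kind = MODEL_OUT_TUN;
	} else if (out_dev->is_lag_master && lag_feat) {
		/* A bond egress is resolved to a group ID; no parent-ID check. */
		if (out_dev->lag_gid < 0)
			return out_dev->lag_gid;
		*kind = MODEL_OUT_LAG;
	} else {
		/* Plain port: must share the parent ID with the ingress port. */
		if (in_dev->parent_id != out_dev->parent_id)
			return -EOPNOTSUPP;
		if (!out_dev->is_repr)
			return -EOPNOTSUPP;
		if (!out_dev->port_id)
			return -EOPNOTSUPP;
		*kind = MODEL_OUT_REPR;
	}
	return 0;
}
```

With a VF representor as in_dev and a bond with a different parent_id as out_dev, the model returns 0 with MODEL_OUT_LAG when lag_feat is set, and -EOPNOTSUPP when it is not, because the request then falls through to the parent-ID comparison.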