On 10/08/2017 12:42 PM, Nikolay Aleksandrov wrote: > On 08/10/17 12:39, Nikolay Aleksandrov wrote: >> On 08/10/17 08:23, Yotam Gigi wrote: >>> On 10/05/2017 03:09 PM, Nikolay Aleksandrov wrote: >>>> On 05/10/17 13:36, Jiri Pirko wrote: >>>>> From: Yotam Gigi <yot...@mellanox.com> >>>>> >>>>> Every bridge port is in one of four mcast router port states: >>>>> - MDB_RTR_TYPE_PERM - the port is set by the user to be an mrouter port >>>>> regardless of IGMP queries. >>>>> - MDB_RTR_TYPE_DISABLED - the port is set by the user to not be an >>>>> mrouter >>>>> port regardless of IGMP queries. >>>>> - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router >>>>> learning state, but currently it is not an mrouter port as no IGMP >>>>> query >>>>> has been received by it for the last multicast_querier_interval. >>>>> - MDB_RTR_TYPE_TEMP_QUERY - the port is set by the user to be in mcast >>>>> router learning state, and currently it is an mrouter port due to an >>>>> IGMP query that has been received by it during the passed >>>>> multicast_querier_interval. >>>> I think you got the last two partially mixed up, MDB_RTR_TYPE_TEMP marks >>>> the port as a router >>>> regardless if there were any igmp queries, while TYPE_TEMP_QUERY means >>>> it's in learning >>>> state. It is the timer (armed vs not) that defines if currently the port >>>> is a router >>>> when one of the TEMP/TEMP_QUERY are set. In the _TEMP case it is always >>>> armed as it >>>> is refreshed by user or igmp queries which was the point of that mode. >>>> So this means in br_multicast_router() just check for the timer_pending or >>>> perm mode. >>> >>> As much as I tried to make this clear, it seems like I failed :) >>> >>> The 4 states I described are currently the "bridged port" states, not the >>> "bridge device" state. A bridged port has these 4 states, all can be set by >>> the >>> user, while the bridge device only uses 3 of these states. This patch makes >>> the >>> bridge device use the 4 states too. I thought it makes sense. >> (disclaimer: this is all about bridge ports, not bridge device) >> Right, I'll try to explain again: _TEMP always marks the port as a mcast >> router, >> it does not put it into just learning state waiting for an igmp query and it >> can >> be refreshed by either a query or the user again setting the port in _TEMP. >> While _TEMP_QUERY puts the port in learning state waiting for a query to >> become >> a router, and _TEMP downgrades to _TEMP_QUERY if it expires. >> >> Does that make it clearer ? >> >> so for _TEMP you say: >>>>> - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router >>>>> learning state, but currently it is not an mrouter port as no IGMP >>>>> query >>>>> has been received by it for the last multicast_querier_interval. >> which is not the case, it is always a router when that mode is set on a port. >> Same for _TEMP_QUERY. > Err, sorry by same I meant it is not correct, not that the _TEMP definition > is the same. > Need to get coffee :-)
Ho, I see that I was clear but not right :) I had a look at the code and seems like you are right - for some reason I thought that _TEMP is learning-active and _TEMP_QUERY is learning-inactive, and the ports change states when according to IGMP queries. > >>> The first paragraph describes the current states of a bridged port, and the >>> second one explains the difference between bridged port and bridge device. I >>> will (try to) make it clearer if we agree on resending this patch. >>> >>> Is it clearer now? Yes it is. >>> >>> >>>> In the port code you have the following transitions: >>>> _TEMP -> TEMP_QUERY (on timer fire or user-set val, port becomes learning >>>> only) >>>> _TEMP -> _TEMP (noop on user refresh or igmp query, timer refreshes) >>>> _TEMP_QUERY -> _TEMP_QUERY (on igmp query the timer is armed, port >>>> becomes router) >>>> >>>> you never have _TEMP_QUERY -> _TEMP, which you're using here to denote the >>>> timer >>>> getting armed and the bridge becoming a router. >>> >>> I am not sure I got this one. I do address that: when an IGMP query is >>> recieved >>> and the current state is _TEMP_QUERY, I arm the timer and set the state to >>> _TEMP. I marked that place on the patch, so you can see below. >>> >> Exactly, there is no such transition for the ports. I tried to say that >> you're using >> the router type to distinguish between when a query is received and it is >> just learning. >> I get that you need to do so, but that deviates from how ports are handled, >> thus I >> suggested to use the timer state instead and drop the _TEMP for bridge >> device altogether. >> If it's possible then the patch will be much simpler and you will not need >> the hacks >> to hide the state from user-space which is the part I really don't like. Yeah, I agree. Thanks for the feedback and sorry for the late answer :) >> >>>>> The bridge device (brX) itself can also be configured by the user to be >>>>> either fixed, disabled or learning mrouter port states, but currently >>>>> there >>>>> is no distinction between the MDB_RTR_TYPE_TEMP_QUERY and >>>>> MDB_RTR_TYPE_TEMP >>>>> in the bridge internal state. Due to that, when an IGMP query is received, >>>>> it is not straightforward to tell whether it changes the bridge device >>>>> mrouter port status or not. >>>> But before this patch the bridge device could not get that set. >>>> >>>>> Further patches in this patch-set will introduce notifications upon the >>>>> bridge device mrouter port state. In order to prevent resending bridge >>>>> mrouter notification when it is not needed, such distinction is necessary. >>>>> >>>> Granted the bridge device hasn't got a way to clearly distinguish the >>>> transitions >>>> without the chance for a race and if using the timer one could get an >>>> unnecessary >>> Is there a race condition? >>> >> With this state - no, if you try to use the timer state though there might be >> an extra notification as I've explained below, that's why I asked if it >> would be >> a problem. I think it is worth looking into using the timer state because it >> will >> remove a lot of the hacks for hiding the state needed in this patch. >> >>>> notification but that seems like a corner case when the timer fires >>>> exactly at the >>>> same time as the igmp query is received. Can't it be handled by just >>>> checking if >>>> the new state is different in the notification receiver ? >>>> If it can't and is a problem then I'd prefer to add a new boolean to >>>> denote that >>>> router on/off transition rather than doing this. >>>> >>>>> Hence, add the distinction between MDB_RTR_TYPE_TEMP and >>>>> MDB_RTR_TYPE_TEMP_QUERY states for the bridge device, similarly to any >>>>> other bridge port. >>>>> >>>> This does not add proper MDB_RTR_TYPE_TEMP support for the bridge device >>>> but seems to abuse it to distinguish the timer state, and changes >>>> the meaning of MDB_RTR_TYPE_TEMP. Can't you just use the timer instead ? >>>> I think it will simplify the set and avoid all of this. >>>> >>>>> In order to not break the current kernel-user API, don't propagate the new >>>>> state to the user and use it only in the bridge internal state. Thus, if >>>>> the user reads (either via sysfs or netlink) the bridge device mrouter >>>>> state, he will get the MDB_RTR_TYPE_TEMP_QUERY state even if the current >>>>> bridge state is MDB_RTR_TYPE_TEMP. >>>>> >>>>> Signed-off-by: Yotam Gigi <yot...@mellanox.com> >>>>> Reviewed-by: Nogah Frankel <nog...@mellanox.com> >>>>> Signed-off-by: Jiri Pirko <j...@mellanox.com> >>>>> --- >>>>> net/bridge/br_multicast.c | 25 +++++++++++++++++++++---- >>>>> net/bridge/br_netlink.c | 3 ++- >>>>> net/bridge/br_private.h | 13 ++++++++++--- >>>>> net/bridge/br_sysfs_br.c | 3 ++- >>>>> 4 files changed, 35 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c >>>>> index 8dc5c8d..b86307b 100644 >>>>> --- a/net/bridge/br_multicast.c >>>>> +++ b/net/bridge/br_multicast.c >>>>> @@ -861,6 +861,17 @@ static void br_multicast_router_expired(unsigned >>>>> long data) >>>>> >>>>> static void br_multicast_local_router_expired(unsigned long data) >>>>> { >>>>> + struct net_bridge *br = (struct net_bridge *) data; >>>>> + >>>>> + spin_lock(&br->multicast_lock); >>>>> + if (br->multicast_router == MDB_RTR_TYPE_DISABLED || >>>>> + br->multicast_router == MDB_RTR_TYPE_PERM || >>>>> + timer_pending(&br->multicast_router_timer)) >>>>> + goto out; >>>>> + >>>>> + br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY; >>>>> +out: >>>>> + spin_unlock(&br->multicast_lock); >>>>> } >>>>> >>>>> static void br_multicast_querier_expired(struct net_bridge *br, >>>>> @@ -1364,9 +1375,12 @@ static void br_multicast_mark_router(struct >>>>> net_bridge *br, >>>>> unsigned long now = jiffies; >>>>> >>>>> if (!port) { >>>>> - if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY) >>>>> + if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY || >>>>> + br->multicast_router == MDB_RTR_TYPE_TEMP) { >>>>> mod_timer(&br->multicast_router_timer, >>>>> now + br->multicast_querier_interval); >>>>> + br->multicast_router = MDB_RTR_TYPE_TEMP; >>>>> + } >>> >>> These are the transitions: >>> _TEMP -> _TEMP_QUERY >>> _TEMP_QUERY -> _TEMP_QUERY >> You clearly always set multicast_router to _TEMP, where did you see a >> transition >> _TEMP_QUERY -> _TEMP_QUERY or _TEMP -> _TEMP_QUERY ? >> All transitions are -> _TEMP, because you use it to differentiate between >> learning >> state and when a query was received. Maybe we have different definition for >> a transition :-) >> >>> In both transitions the timer is armed. >>> >>> >>>>> return; >>>>> } >>>>> >>>>> @@ -1952,7 +1966,7 @@ void br_multicast_init(struct net_bridge *br) >>>>> >>>>> spin_lock_init(&br->multicast_lock); >>>>> setup_timer(&br->multicast_router_timer, >>>>> - br_multicast_local_router_expired, 0); >>>>> + br_multicast_local_router_expired, (unsigned long)br); >>>>> setup_timer(&br->ip4_other_query.timer, >>>>> br_ip4_multicast_querier_expired, (unsigned long)br); >>>>> setup_timer(&br->ip4_own_query.timer, br_ip4_multicast_query_expired, >>>>> @@ -2043,11 +2057,14 @@ int br_multicast_set_router(struct net_bridge >>>>> *br, unsigned long val) >>>>> case MDB_RTR_TYPE_DISABLED: >>>>> case MDB_RTR_TYPE_PERM: >>>>> del_timer(&br->multicast_router_timer); >>>>> - /* fall through */ >>>>> - case MDB_RTR_TYPE_TEMP_QUERY: >>>>> br->multicast_router = val; >>>>> err = 0; >>>>> break; >>>>> + case MDB_RTR_TYPE_TEMP_QUERY: >>>>> + if (br->multicast_router != MDB_RTR_TYPE_TEMP) >>>>> + br->multicast_router = val; >>>>> + err = 0; >>>>> + break; >>>>> } >>>>> >>>>> spin_unlock_bh(&br->multicast_lock); >>>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >>>>> index dea88a2..cee5016 100644 >>>>> --- a/net/bridge/br_netlink.c >>>>> +++ b/net/bridge/br_netlink.c >>>>> @@ -1357,7 +1357,8 @@ static int br_fill_info(struct sk_buff *skb, const >>>>> struct net_device *brdev) >>>>> return -EMSGSIZE; >>>>> #endif >>>>> #ifdef CONFIG_BRIDGE_IGMP_SNOOPING >>>>> - if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER, br->multicast_router) || >>>>> + if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER, >>>>> + br_multicast_router_translate(br->multicast_router)) || >>>>> nla_put_u8(skb, IFLA_BR_MCAST_SNOOPING, !br->multicast_disabled) || >>>>> nla_put_u8(skb, IFLA_BR_MCAST_QUERY_USE_IFADDR, >>>>> br->multicast_query_use_ifaddr) || >>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h >>>>> index ab4df24..e6e3fec 100644 >>>>> --- a/net/bridge/br_private.h >>>>> +++ b/net/bridge/br_private.h >>>>> @@ -649,9 +649,8 @@ void br_multicast_get_stats(const struct net_bridge >>>>> *br, >>>>> >>>>> static inline bool br_multicast_is_router(struct net_bridge *br) >>>>> { >>>>> - return br->multicast_router == 2 || >>>>> - (br->multicast_router == 1 && >>>>> - timer_pending(&br->multicast_router_timer)); >>>>> + return br->multicast_router == MDB_RTR_TYPE_PERM || >>>>> + br->multicast_router == MDB_RTR_TYPE_TEMP; >>>>> } >>>>> >>>>> static inline bool >>>>> @@ -790,6 +789,14 @@ static inline int br_multicast_igmp_type(const >>>>> struct sk_buff *skb) >>>>> } >>>>> #endif >>>>> >>>>> +static inline unsigned char >>>> u8 >>>> >>>>> +br_multicast_router_translate(unsigned char multicast_router) >>>> u8, if need be change the type of the struct member >>> >>> Sorry, didn't quite get that. Currently, both the bridge_port and bridge >>> have >>> the multicast_router field, and in both cases it is of type unsigned char. >>> Do >>> you suggest to change both to be "u8"? >>> >> Right, and this is unnecessarily long and requires to be broken ugly like >> that with >> the type above and name below. So I suggested to use u8 to avoid that. >> >>>>> +{ >>>>> + if (multicast_router == MDB_RTR_TYPE_TEMP) >>>>> + return MDB_RTR_TYPE_TEMP_QUERY; >>>>> + return multicast_router; >>>>> +} >>>>> + >>>>> /* br_vlan.c */ >>>>> #ifdef CONFIG_BRIDGE_VLAN_FILTERING >>>>> bool br_allowed_ingress(const struct net_bridge *br, >>>>> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c >>>>> index 723f25e..9b9c597 100644 >>>>> --- a/net/bridge/br_sysfs_br.c >>>>> +++ b/net/bridge/br_sysfs_br.c >>>>> @@ -340,7 +340,8 @@ static ssize_t multicast_router_show(struct device *d, >>>>> struct device_attribute *attr, char *buf) >>>>> { >>>>> struct net_bridge *br = to_bridge(d); >>>>> - return sprintf(buf, "%d\n", br->multicast_router); >>>>> + return sprintf(buf, "%d\n", >>>>> + br_multicast_router_translate(br->multicast_router)); >>>>> } >>>>> >>>>> static ssize_t multicast_router_store(struct device *d, >>>>>