[PATCH] Fix dumping vlan rules
From: "M. Braun" Given the following bridge rules: 1. ip protocol icmp accept 2. ether type vlan vlan type ip ip protocol icmp accept The are currently both dumped by "nft list ruleset" as 1. ip protocol icmp accept 2. ip protocol icmp accept Though, the netlink code actually is different bridge filter FORWARD 4 [ payload load 2b @ link header + 12 => reg 1 ] [ cmp eq reg 1 0x0008 ] [ payload load 1b @ network header + 9 => reg 1 ] [ cmp eq reg 1 0x0001 ] [ immediate reg 0 accept ] bridge filter FORWARD 5 4 [ payload load 2b @ link header + 12 => reg 1 ] [ cmp eq reg 1 0x0081 ] [ payload load 2b @ link header + 16 => reg 1 ] [ cmp eq reg 1 0x0008 ] [ payload load 1b @ network header + 9 => reg 1 ] [ cmp eq reg 1 0x0001 ] [ immediate reg 0 accept ] Fix this by avoiding the removal of all vlan statements in the given example. Signed-off-by: Michael Braun --- src/payload.c | 12 1 file changed, 12 insertions(+) diff --git a/src/payload.c b/src/payload.c index 3bf1ecc..905422a 100644 --- a/src/payload.c +++ b/src/payload.c @@ -506,6 +506,18 @@ static bool payload_may_dependency_kill(struct payload_dep_ctx *ctx, dep->left->payload.desc == &proto_ip6) && expr->payload.base == PROTO_BASE_TRANSPORT_HDR) return false; + /* Do not kill +* ether type vlan and vlan type ip and ip protocol icmp +* into +* ip protocol icmp +* as this lacks ether type vlan. +* More generally speaking, do not kill protocol type +* for stacked protocols if we only have protcol type matches. +*/ + if (dep->left->etype == EXPR_PAYLOAD && dep->op == OP_EQ && + expr->flags & EXPR_F_PROTOCOL && + expr->payload.base == dep->left->payload.base) + return false; break; } -- 2.20.1
[PATCH] gianfar: fix jumbo packets+napi+rx overrun crash
From: Michael Braun When using jumbo packets and overrunning rx queue with napi enabled, the following sequence is observed in gfar_add_rx_frag: | lstatus | | skb | t | lstatus, size, flags| first | len, data_len, *ptr | ---+--+---+---+ 13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e | 12 | 1640, 1600, INTERRUPT| 0 | 8000, 6400, f554c12e | 11 | 1640, 1600, INTERRUPT| 0 | 6400, 4800, f554c12e | 10 | 1640, 1600, INTERRUPT| 0 | 4800, 3200, f554c12e | 09 | 1640, 1600, INTERRUPT| 0 | 3200, 1600, f554c12e | 08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e | 07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0,0, f554c12e | 06 | 1c80, 128, INTERRUPT LAST FIRST | 1 | 0,0, abf3bd6e | 05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 | 04 | 1640, 1600, INTERRUPT| 0 | 6400, 4800, c5a57780 | 03 | 1640, 1600, INTERRUPT| 0 | 4800, 3200, c5a57780 | 02 | 1640, 1600, INTERRUPT| 0 | 3200, 1600, c5a57780 | 01 | 1640, 1600, INTERRUPT| 0 | 1600, 0, c5a57780 | 00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0,0, c5a57780 | So at t=7 a new packets is started but not finished, probably due to rx overrun - but rx overrun is not indicated in the flags. Instead a new packets starts at t=8. This results in skb->len to exceed size for the LAST fragment at t=13 and thus a negative fragment size added to the skb. This then crashes: kernel BUG at include/linux/skbuff.h:2277! Oops: Exception in kernel mode, sig: 5 [#1] ... NIP [c04689f4] skb_pull+0x2c/0x48 LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844 Call Trace: [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable) [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4 [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0 [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70 [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc [ec4bff08] [c0062718] kthread+0x140/0x144 [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c This patch fixes this by checking for computed LAST fragment size, so a negative sized fragment is never added. In order to prevent the newer rx frame from getting corrupted, the FIRST flag is checked to discard the incomplete older frame. Signed-off-by: Michael Braun --- drivers/net/ethernet/freescale/gianfar.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 541de32ea662..2aecae23bfd0 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus, if (lstatus & BD_LFLAG(RXBD_LAST)) size -= skb->len; + WARN(size < 0, "gianfar: rx fragment size underflow"); + if (size < 0) + return false; + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, rxb->page_offset + RXBUF_ALIGNMENT, size, GFAR_RXB_TRUESIZE); @@ -2552,6 +2556,16 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, if (lstatus & BD_LFLAG(RXBD_EMPTY)) break; + /* lost RXBD_LAST descriptor due to overrun */ + if (skb && + (lstatus & BD_LFLAG(RXBD_FIRST))) { + /* discard faulty buffer */ + dev_kfree_skb(skb); + skb = NULL; + + /* can continue normally */ + } + /* order rx buffer descriptor reads */ rmb(); -- 2.20.1
[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash
From: Michael Braun When using jumbo packets and overrunning rx queue with napi enabled, the following sequence is observed in gfar_add_rx_frag: | lstatus | | skb | t | lstatus, size, flags| first | len, data_len, *ptr | ---+--+---+---+ 13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e | 12 | 1640, 1600, INTERRUPT| 0 | 8000, 6400, f554c12e | 11 | 1640, 1600, INTERRUPT| 0 | 6400, 4800, f554c12e | 10 | 1640, 1600, INTERRUPT| 0 | 4800, 3200, f554c12e | 09 | 1640, 1600, INTERRUPT| 0 | 3200, 1600, f554c12e | 08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e | 07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0,0, f554c12e | 06 | 1c80, 128, INTERRUPT LAST FIRST | 1 | 0,0, abf3bd6e | 05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 | 04 | 1640, 1600, INTERRUPT| 0 | 6400, 4800, c5a57780 | 03 | 1640, 1600, INTERRUPT| 0 | 4800, 3200, c5a57780 | 02 | 1640, 1600, INTERRUPT| 0 | 3200, 1600, c5a57780 | 01 | 1640, 1600, INTERRUPT| 0 | 1600, 0, c5a57780 | 00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0,0, c5a57780 | So at t=7 a new packets is started but not finished, probably due to rx overrun - but rx overrun is not indicated in the flags. Instead a new packets starts at t=8. This results in skb->len to exceed size for the LAST fragment at t=13 and thus a negative fragment size added to the skb. This then crashes: kernel BUG at include/linux/skbuff.h:2277! Oops: Exception in kernel mode, sig: 5 [#1] ... NIP [c04689f4] skb_pull+0x2c/0x48 LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844 Call Trace: [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable) [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4 [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0 [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70 [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc [ec4bff08] [c0062718] kthread+0x140/0x144 [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c This patch fixes this by checking for computed LAST fragment size, so a negative sized fragment is never added. In order to prevent the newer rx frame from getting corrupted, the FIRST flag is checked to discard the incomplete older frame. Signed-off-by: Michael Braun --- drivers/net/ethernet/freescale/gianfar.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 541de32ea662..1cf8ef717453 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus, if (lstatus & BD_LFLAG(RXBD_LAST)) size -= skb->len; + WARN(size < 0, "gianfar: rx fragment size underflow"); + if (size < 0) + return false; + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, rxb->page_offset + RXBUF_ALIGNMENT, size, GFAR_RXB_TRUESIZE); @@ -2552,6 +2556,17 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, if (lstatus & BD_LFLAG(RXBD_EMPTY)) break; + /* lost RXBD_LAST descriptor due to overrun */ + if (skb && + (lstatus & BD_LFLAG(RXBD_FIRST))) { + /* discard faulty buffer */ + dev_kfree_skb(skb); + skb = NULL; + rx_queue->stats.rx_dropped++; + + /* can continue normally */ + } + /* order rx buffer descriptor reads */ rmb(); -- 2.20.1
Re: [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash
Am 04.03.2021 21:17, schrieb Vladimir Oltean: Just for my understanding, do you have a reproducer for the issue? I notice you haven't answered Claudiu's questions posted on v1. On LS1021A I cannot trigger this apparent hardware issue even if I force RX overruns (by reducing the ring size). Judging from the "NIP" register from your stack trace, this is a PowerPC device, which one is it? This is on P1020WLAN (AP) devices and it happens reproducably when I use ipsec + iperf3 --udp -b 1000M on some other server targetting the AP. Yes this is PPC. Regards, Michael
Re: [PATCH] gianfar: fix jumbo packets+napi+rx overrun crash
Hi, sorry I missed the mail. Am 04.03.2021 11:05, schrieb Claudiu Manoil: Could you help provide some context, e.g.: On what board/soc were you able to trigger this issue? I have an OpenWRT running on P1020WLAN boards and have IPsec for some gretap tunnel configured. I run iperf3 -s on the AP and iperf3 -c --udp -b 1000M on some server and then the AP reliably crashes. It also crashed sometimes during normal operations when doing some fast download over the IPsec tunnel, but that was hard to reproduce. How often does the overrun occur? Reliably within seconds with the above test. What's the use case? Is the issue triggered with smaller packets than 9600B? It cannot be triggered with 1500 Byte MTU packets as these have FIRST and LAST set in one. I use jumbo frames to speed up ipsec. Increasing the Rx ring size does significantly reduce ring overruns? I did not test this. Regards, Michael Braun
Re: [PATCH] bridge: increase mtu to 64K
Am 07.05.2020 13:06, schrieb Nikolay Aleksandrov: That isn't correct, have you tested with a recent kernel? After: commit 804b854d374e Author: Nikolay Aleksandrov Date: Fri Mar 30 13:46:19 2018 +0300 Thanks for pointing out, I'm sorry my test kernel was too old. Regards, Michael
Re: [PATCH net-next] bridge: multicast to unicast
Am 09.01.2017 13:15, schrieb Johannes Berg: That is bridge fdb entries (need to) expire so the bridge might "forget" a still-connected station not sending but only consuming broadcast traffic. Ok, that I don't know. Somehow if you address a unicast packet there the bridge has to make a decision - so it really should know? If the bridge has not learned the unicast destination mac address on any port, it will flood the packet on all ports except the port it received the packet on. Regards, M. Braun
Re: [PATCH] iproute2: macvlan: add "source" mode
Please ignore this patch, something went wrong. Regards, M. Braun Am 25.09.2016 20:52, schrieb Michael Braun: Adjusting iproute2 utility to support new macvlan link type mode called "source". Example of commands that can be applied: ip link add link eth0 name macvlan0 type macvlan mode source ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11 ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11 ip link set link dev macvlan0 type macvlan macaddr flush ip -details link show dev macvlan0 Based on previous work of Stefan Gula Signed-off-by: Michael Braun Cc: ste...@gmail.com --- include/linux/if_link.h | 2 ++ man/man8/ip-link.8.in | 57 + 2 files changed, 59 insertions(+) diff --git a/include/linux/if_link.h b/include/linux/if_link.h index 1feb708..ec5e64e 100644 --- a/include/linux/if_link.h +++ b/include/linux/if_link.h @@ -401,6 +401,8 @@ enum macvlan_macaddr_mode { }; #define MACVLAN_FLAG_NOPROMISC 1 +#define MACVLAN_FLAG_UNICAST 2 +#define MACVLAN_FLAG_UNICAST_ALL 4 /* VRF section */ enum { diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in index ffc4160..1ad3cfe 100644 --- a/man/man8/ip-link.8.in +++ b/man/man8/ip-link.8.in @@ -138,6 +138,9 @@ ip-link \- network device configuration .IR NAME " ]" .br .RB "[ " addrgenmode " { " eui64 " | " none " | " stable_secret " | " random " } ]" +.br +.B macaddr " |" +.IR "COMMAND MACADDR |" .ti -8 @@ -228,8 +231,46 @@ Link types: - IP over Infiniband device .sp .B macvlan +.I MODE - Virtual interface base on link layer address (MAC) .sp +Modes: +.in +8 +.B private +- The device never communicates with any other device on the same upper_dev. +This even includes frames coming back from a reflective relay, where supported +by the adjacent bridge. +.sp +.B vepa +- we assume that the adjacent bridge returns all frames where both source and +destination are local to the macvlan port, i.e. the bridge is set up as a +reflective relay. Broadcast frames coming in from the upper_dev get flooded to +all macvlan interfaces in VEPA mode. We never deliver any frames locally. +.sp +.B bridge +- behave as simple bridge between different macvlan interfaces on the same +port. Frames from one interface to another one get delivered directly and are +not sent out externally. Broadcast frames get flooded to all other bridge +ports and to the external interface, but when they come back from a reflective +relay, we don't deliver them again. Since we know all the MAC addresses, the +macvlan bridge mode does not require learning or STP like the bridge module +does. +.sp +.B passthru +- allows takeover of the underlying device and passing it to a guest using +virtio with macvtap backend. Only one macvlan device is allowed in passthru +mode and it inherits the mac address from the underlying device and sets it in +promiscuous mode to receive and forward all the packets. +.sp +.B source +- allows one to set a list of allowed mac address, which is used to match +against source mac address from received frames on underlying interface. This +allows creating mac based VLAN associations, instead of standard port or tag +based. The feature is useful to deploy 802.1x mac based behavior, +where drivers of underlying interfaces doesn't allows that. +.sp +.in -8 +.sp .B macvtap - Virtual interface based on link layer address (MAC) and TAP. .sp @@ -1074,6 +1115,22 @@ specifies the type of the device. .SS ip link set - change device attributes +.TP +.BI macaddr " COMMAND MACADDR" +add or removes MACADDR from allowed list for source mode macvlan type link +Commands: +.in +8 +.B add +- add MACADDR to allowed list +.sp +.B del +- remove MACADDR from allowed list +.sp +.B flush +- flush whole allowed list +.sp +.in -8 + .PP .B Warning: If multiple parameter changes are requested,
Re: [PATCH 3/3] mac80211: multicast to unicast conversion
Am 05.10.2016 12:19, schrieb Johannes Berg: on both ends. Furthermore, I've seen a few mobile phone stations locally that indicate qos support but won't complete DHCP if their broadcasts are encapsulated as A-MSDU. Though they work fine with this series approach. Presumably those phones also don't even try to use DMS, right? When I traced this two years ago, almost no device indicated DMS support, even though almost all seem to accepted multicast in unicast a-msdu frames. This patch therefore does not opt to implement DMS but instead just replicates the packet and changes the destination address. As this works fine with ARP, IPv4 and IPv6, it is limited to these protocols and normal 802.11 multicast frames are send out for all other payload protocols. How did you determine that it "works fine"? First, I tested this manually using my own devices or asked friends. I think this covered at least a recent debian x64 with an intel wireless card, a windows 7 x64 with an intel wireless card, an android phone, an ios phone and some recent macbook. Manually testing included visiting an IPv6 only website (this network uses IPv6 router advertismentens (RA) but no DHCPv6), so RA is accepted and ND working. Additionally, arping'ing these station using broadcast arp request worked fine, so broadcast arp requests are working. Finally, DHCP worked fine and UPNP multicast discovery for some closed source media streaming wireless device was reported working. Next, that change was rolled out. It is now in use for at least three years with about 300 simulatenously online stations and >2000 currently registered devices and there hasn't been a single problem report that could be related to that change. Though, e.g. our samsung galaxy users report consistently that their devices refuse to connect using WPA-PSK as our network advertises FT-PSK next to WPA-PSK and I learned that there was at least one device there that did not like the multicast-as-unicast-amsdu packets due to a user problem report. I see at least one undesirable impact of this, which DMS doesn't have; it breaks a client's MUST NOT requirement from RFC 1122: Okay, so this cannot go into linux, right? The thing I dislike most about DMS is that it is client driven, that is an AP will only apply unicast conversion if a station actively requests so. You should split the patch into cfg80211 and mac80211, IMHO it's big enough to do that. ok + * @set_ap_unicast: set the multicast to unicast flag for a AP interface That API name isn't very descriptive, I'm sure we can do something better there. proposal: "request multicast packets to be trasnmitted as unicast" ? Also, perhaps we should structure this already like we would DMS, with a per-station toggle or even list of multicast addresses? should be possible, yes +static int ieee80211_set_ap_unicast(struct wiphy *wiphy, struct net_device *dev, + const bool unicast) +{ + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); + + if (sdata->vif.type != NL80211_IFTYPE_AP) + return -1; Was this not documented but also intended to apply to its dependent VLANs? it was intended as a per per-BSS toggle, so it applies to all dependent VLANs automatically. +/* Check if multicast to unicast conversion is needed and do it. + * Returns 1 if skb was freed and should not be send out. */ wrong comment style :) you mean the */ at end of line instead of on a new line? + unicast = nla_data(info->attrs[NL80211_ATTR_UNICAST]); What's this supposed to mean? this was supposed to be nla_get_u8. michael
Re: [PATCH 3/3] mac80211: multicast to unicast conversion
Am 05.10.2016 13:58, schrieb Johannes Berg: Anyway, perhaps this needs to change to take DMS/per-station into account? Then again, this kind of setting - global multicast-to-unicast - fundamentally *cannot* be done on a per-station basis, since if you enable it for one station and not for another, the first station that has it enabled would get the packets twice... as I see it, that is exactly how DMS is standarized. IEEE 802.11-2012 section 10.23.15 DMS procedures: "If the requested DMS is accepted by the AP, the AP shall send subsequent group addressed MSDUs that match the frame classifier specified in the DMS Descriptors to the requesting STA as A-MSDU subframes within an individually addressed A-MSDU frame (see 8.3.2.2 and 9.11)." -> so the multicast packets shall go out as unicast A-MSDU frames to stations that requested this "The AP shall continue to transmit the matching frames as group addressed frames (see 9.3.6, and 10.2.1.16) if at least one associated STA has not requested DMS for these frames." -> so it will continue to send it as multicast frames as well. As with DMS the station requested DMS for a specific multicast address, it could then drop multicast frames addressed to the multicast address it registered for DMS. Regards, M. Braun