Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Sat, Nov 17, 2012 at 11:14 PM, Michael Richardson wrote: > > Thank you for this reply. > >> "Eric" == Eric W Biederman writes: > Eric> I don't see any need to add any kernel code to allow checking > Eric> if vlan tags are stripped. Vlan headers are stripped on all > Eric> kernel interfaces today. Vlan headers have been stripped on > Eric> all but a handful of software interfaces for 6+ years. For > Eric> all kernels if the vlan header is stripped it is reported in > Eric> the auxdata, upon packet reception. Careful code should also > Eric> look for TP_STATUS_VLAN_VALID which allows for distinguishing > Eric> a striped vlan header of 0 from no vlan header. > > I can regularly see vlan tags on raw dumps from the untagged ("eth0") > today, running 3.2 (debian stable): > > obiwan-[~] mcr 4848 %sudo tcpdump -i eth0 -n -p -e | grep -i vlan > listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes > 17:05:15.404909 38:60:77:38:e6:47 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q > (0x8100), length 46: vlan 3800, p 0, ethertype ARP, Request who-has > 172.30.42.1 tell 172.30.42.11, length 28 > > So, I'm curious about the statement that vlan tags have been stripped > for some time, because I don't see them stripped today. My desktop has > an Intel 82579V NIC in it... Speaking of netsniff-ng where we don't reconstruct VLAN headers, users have reported that depending on the NIC/driver resp. ethtool setting, they can come in stripped or not (in the pf_packet's rx_ring buffer). However, I assume VLAN AUXDATA is always consistent (and so the BPF/BPF JIT filtering). > Eric> For old kernels that do not support the new extensions it is > Eric> possible to generate code that looks at the ethernet header > Eric> and sees if the ethertype is 0x8100 and then does things with > Eric> it, but that will only work on a small handful of software > Eric> only interfaces. > > at tcpdump.org, our concern is to release code that works on both new, > and what for kernel.org folks would be considered "ancient" systems, > such as Centos5/RHEL5 machines which are regularly still in production > in the field (sadly...), but often need the latest diagnostics. > > What I hear you saying is that our existing code will work on older > kernels, and that once we have new code to use the VLAN tag extensions, > we should simply attempt to load it, and either it loads, or we get an > error, and we go back to the code we had before. That's great news. Yes, this should be handled in such a way. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On 12/12/2012 10:53 PM, Ani Sinha wrote: unsigned int netdev_8021q_inskb = 1; ... { .ctl_name = NET_CORE_8021q_INSKB, .procname = "netdev_8021q_inskb", .data = &netdev_8021q_inskb, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_dointvec }, would seem to do it to me. Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it finds it, and it is >0, then do the cmsg thing. Does this work? This is just an experimental patch and by no means final. I just want to have an idea what everyone thought about it. Once we debate and discusss, I can cook up a final patch that would be worth commiting. Also instead of having this /proc interface, we can perhaps check for a specific kernel version that : (a) has the vlan tag info in the skb metadata (as opposed to in the packet itself) (b) has the following patch that adds the capability to generate a filter based on the tag value : commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1 Author: Eric Dumazet Date: Sat Oct 27 02:26:17 2012 + net: filter: add vlan tag access WE need both of the above two things for the userland to generate a filter code that compares vlan tag values in the skb metadata. For kernels that has the vlan tag in the skb metadata but does not have the above commit (b), there is nothing that can be done. For older kernels that had the vlan tag info in the packet itself, the filter code can be generated differently to look at specific offsets within the packet (something that libpcap does currently). We have already ruled out the idea of generating a filter and trying to load and see if that fails (see previous emails on this thread). Hope this makes sense. I think it doesn't. Because then you are obviously considering adding one procfs file into /proc/sys/net/core/ *for each* feature that is added into the ancillary ops which cannot be the right way ... diff --git a/include/linux/filter.h b/include/linux/filter.h index c45eabc..91e2ba3 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -36,6 +36,7 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp) return fp->len * sizeof(struct sock_filter) + sizeof(*fp); } +extern bool sysctl_8021q_inskb; extern int sk_filter(struct sock *sk, struct sk_buff *skb); extern unsigned int sk_run_filter(const struct sk_buff *skb, const struct sock_filter *filter); diff --git a/net/core/filter.c b/net/core/filter.c index c23543c..4f5a657 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -41,6 +41,8 @@ #include #include +bool sysctl_8021q_inskb = 1; + /* No hurry in this branch * * Exported for the bpf jit load helper. diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index d1b0804..f9a3700 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -189,6 +190,13 @@ static struct ctl_table net_core_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "8021q_inskb", + .data = &sysctl_8021q_inskb, + .maxlen = sizeof(bool), + .mode = 0444, + .proc_handler = proc_dointvec + }, { } }; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, Dec 13, 2012 at 6:34 PM, Ani Sinha wrote: > On Thu, Dec 13, 2012 at 12:35 AM, Daniel Borkmann wrote: >> On 12/12/2012 10:53 PM, Ani Sinha wrote: >>>> >>>> unsigned int netdev_8021q_inskb = 1; >>>> >>>> ... >>>> { >>>> .ctl_name = NET_CORE_8021q_INSKB, >>>> .procname = "netdev_8021q_inskb", >>>> .data = &netdev_8021q_inskb, >>>> .maxlen = sizeof(int), >>>> .mode = 0444, >>>> .proc_handler = proc_dointvec >>>> }, >>>> >>>> would seem to do it to me. >>>> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it >>>> finds it, and it is >0, then do the cmsg thing. >>>> >> >> I think it doesn't. Because then you are obviously considering adding one >> procfs file into /proc/sys/net/core/ *for each* feature that is added into >> the ancillary ops which cannot be the right way ... > > We had already brought up this topic previously in the same thread. A > suggestion was made to add that proc entry and no one from netdev > responded to it saying that it did not make any sense. Therefore > before I went ahead and made the fixes in libpcap, I wanted to run > this by your guys again to make sure we are still on the same page. Well, not everyone on netdev might be following this thread (resp. following fully). The best way to get responses for a patch is to go through the normal patch submission process on netdev, and if you like to request for comments, then mark it as RFC in the subject. This way, people will know and likely comment on it if it makes sense or not. > I do agree that instead of a /proc entry, we should check for a kenrel > version >= X where X is the upstream version that first started > supporting all the features needed by libpcap for vlan filtering. This > is not a compile time check but a run time one. Does anyone see any > issues with this? Is there any long term implications of this, like if > you backport patches to an older long term supported kernel? Are there > other better ways to do this, like may be returning feature bits from > an ioctl call? This is something we need to deal with on a continuous > basis as we keep supporting newer AUX fields and libpcap and other > user land code needs to make use of it. At the same time, they need to > handle backward compatibility issues with older kernels. As Eric mentioned earlier, for now there seems not to be a reliable way to get to know which ops are present and which not. It's not really nice, but if you want to make use of those new (ANC*) features, probably checking kernel version might be the only way if I'm not missing something. Now net-next is closed, but if it reopens, I'll submit a version 2 of my patch where you've been CC'd to. If it gets in, then at least it's for sure that since kernel this kind of feature test is present. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Mon, Dec 17, 2012 at 11:35 AM, Guy Harris wrote: > On Dec 17, 2012, at 1:50 AM, "David Laight" wrote: > >> How are you going to tell whether a feature is present in a non-Linux >> kernel ? > > The Linux memory-mapped capture mechanism is not present in a non-Linux > kernel, so all the libpcap work involved here would, if necessary on other > platforms, have to be done differently on those platforms. Those platforms > would have to have their own mechanisms to indicate whether any changes to > filter code, processing of VLAN tags supplied out of band, etc. would need to > be done. > > The same would apply to other additional features of the Linux memory-mapped > capture mechanism that require changes in libpcap. Exactly. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
On 01/11/2013 03:37 AM, Paul Pearce wrote: My opinion as a kernel developer is that the network tap is here to have a copy of the exact frame given to the _device_. Agreed. Good: as someone who spends lots of time with tcpdump doing both network and protocol diagnostics, it's really important to see exactly there. If that means turning off some hardware offload in order to get the intact 1p header, then that may be fine for many situations. (At 10G, on a live router... well...) I agree as well. But I think Ani's point was that for RX packets, as of commit bcc6d47903612c3861201cc3a866fb604f26b8b2, the filters are not getting exactly what's "on the wire." Independent of hardware acceleration the vlan headers are being stripped off and skb->vlan_tci is being set. That's was the origin of this whole mess. The msg from that commit reads in part: Vlan untagging happens early in __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers) see the skb like it was untagged by hw. His confusion (which I share) is why it's acceptable to have this behavior of removing headers and setting skb->vlan_tci (regardless of hardware acceleration) on the RX path but not also set skb->vlan_tci on the TX path. Indepdent of proposed userspace or PACKET_AUXDATA solutions, clarification on the RX skb->vlan_tci behavior would be appreciated. My knowledge of this code is quite limited so it's entirely possible I'm off base here. If so please tell me. While we're at the topic, though it's slightly unrelated this particular problem, but related to capturing VLANs and ``what's seen on the wire'', since it was mentioned. For different NICs/drivers you might get different default behaviours, and mostly it's the case (I assume, correct me if I'm wrong) that libpcap has to ``un-untag'' the VLAN headers in user space, doing a memmove(3) for each stripped VLAN packet in order to ``fix'' this. Because of this hack, I even got a report of a user recently, that in Wireshark, he saw a QinQ header, although it should just have been one VLAN encapsulation (AR8131 driver with ethtool -K eth0 rxvlan off) as he saw with netsniff-ng (no memmove(3) done there). (I didn't further follow or verify this report though.) ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] PROBLEM: Software injected vlan tagged packets are unable to be identified using recent BPF modifications
On 02/15/2013 08:20 AM, Eric W. Biederman wrote: Tue, Jan 08, 2013 at 01:05:39AM CET, pea...@cs.berkeley.edu wrote: Hello folks, PROBLEM: vlan tagged packets that are injected via software are not picked up by filters using recent (kernel commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1) BPF vlan modifications. I suspect this is a problem with the Linux kernel. linux-netdev and tcpdump-workers are both cc'd. BACKGROUND: Kernel commit bcc6d47903612c3861201cc3a866fb604f26b8b2 (Jiri Pirko/David S. Miller) removed vlan headers on rx packets prior to them reaching the packet filters. This broke BPF/libpcap's ability to do kernel-level packet filtering based on vlan tag information (the 'vlan' keyword). Kernel commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1 (Eric Dumazet/David S. Miller, just merged into Linus's tree http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=f3335031b9452baebfe49b8b5e55d3fe0c4677d1) added the ability to use BPF to once again filter based on vlan tags. Related bpf jit commit: http://www.spinics.net/lists/netdev/msg214759.html libpcap (Ani Sinha) recently RFC'd a patch to use Eric/David's BPF modifications to restore vlan filtering to libpcap. http://www.mail-archive.com/tcpdump-workers@lists.tcpdump.org/msg06810.html I'm using this patch and it works. DETAILS: Under these patches vlan tagged packets received from mediam (actual packets from the wire) can be identified based on vlan tag information using the new BPF functionality.This is good. However, raw vlan tagged packets that are *injected* into the interface using libpcap's pcap_inject() (which is just a fancy wrapper for the send() syscall) are not identified by filters using the recent BPF modifications. The bug manifests itself if you attempt to use the new BPF modifications to filter vlan tagged packets on a live interface. All packets from the medium show up, but all injected packets are dropped. Prior to commit bcc6d47 both medium and injected packets could both be identified using BPFs. These injected packets can however still be identified using the previous, now incorrect "offset into the header" technique. Given this, I suspect what's going on is the kernel code path for these injected packets is not setting skb->vlan_tci correctly (at all?). Since the vlan tag is not in the skb data structure the new BPF modifications don't identify the packets as having a vlan tag, despite it being in the packet header. You are right. skb->vlan_tci is not set. Looking at packet_snd() function in net/packet/af_packet.c I guess that something like following patch would be needed: diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index e639645..2238559 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2292,6 +2292,12 @@ static int packet_snd(struct socket *sock, if (unlikely(extra_len == 4)) skb->no_fcs = 1; + if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) { + skb = vlan_untag(skb); + if (unlikely(!skb)) + goto out_unlock; + } + /* * Now send it */ Thoughts? That sounds about right. I don't know how much NIC drivers care but it looks like af_packet is the only path through the code that can generate a vlan tagged packet that we will transmit where a vlan tagged packet can be generated without setting vlan_tci. So it should make the code safer. Certainly we want this to look to the vlan driver like a proper case of nested vlans and not something weird. But note we need to handle tpacket_snd as well as packet_snd. Isn't this overkill? vlan_untag() in TX path performs internally a memmove(), not sure if its worth the effort, and in the worst case, if your driver doesn't support vlan hw accel, it puts the tag back in via yet another memmove() before transmission via __vlan_put_tag() in dev_hard_start_xmit(). Besides, it also doesn't solve the issue that was stated here, if I'm not mistaken. Isn't the problem, that when you send such a packet while a local packet analyser is running at the same time, that back on the input path vlan_tci is reset to 0 and you won't be able to use the vlan_tci JIT filter if your NIC doesn't have hw accel? This change therefore doesn't help in this situation either. The better solution might be to generate a different BPF code in userland, no? ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
[tcpdump-workers] [PATCH libpcap] linktype: add netlink link/dlt type
For pcap interoperability, introduce a common link type for netlink captures. Netlink debugging workflow looks like the following: Setup: modprobe nlmon ip link add type nlmon ip link set nlmon0 up Capture: tcpdump -i nlmon0 ... Teardown: ip link set nlmon0 down ip link del dev nlmon0 rmmod nlmon Signed-off-by: Daniel Borkmann CC: Thomas Graf CC: Tobias Klauser --- pcap-common.c | 7 ++- pcap/bpf.h| 7 ++- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/pcap-common.c b/pcap-common.c index 6175a5a..f26d22e 100644 --- a/pcap-common.c +++ b/pcap-common.c @@ -932,7 +932,12 @@ */ #define LINKTYPE_WIRESHARK_UPPER_PDU 252 -#define LINKTYPE_MATCHING_MAX 252 /* highest value in the "matching" range */ +/* + * Link-layer header type for the netlink protocol (nlmon devices). + */ +#define LINKTYPE_NETLINK 253 + +#define LINKTYPE_MATCHING_MAX 253 /* highest value in the "matching" range */ static struct linktype_map { int dlt; diff --git a/pcap/bpf.h b/pcap/bpf.h index ad36eb6..8286ed5 100644 --- a/pcap/bpf.h +++ b/pcap/bpf.h @@ -1224,7 +1224,12 @@ struct bpf_program { */ #define DLT_WIRESHARK_UPPER_PDU252 -#define DLT_MATCHING_MAX 252 /* highest value in the "matching" range */ +/* + * DLT type for the netlink protocol (nlmon devices). + */ +#define DLT_NETLINK253 + +#define DLT_MATCHING_MAX 253 /* highest value in the "matching" range */ /* * DLT and savefile link type values are split into a class and -- 1.7.11.7 ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH libpcap] linktype: add netlink link/dlt type
On 07/19/2013 08:23 PM, Guy Harris wrote: On Jul 3, 2013, at 3:49 AM, Daniel Borkmann wrote: For pcap interoperability, introduce a common link type for netlink captures. What do the link-layer headers for this look like? That is struct nlmsghdr, found in include/uapi/linux/netlink.h. Presumably making that work also involves changes to libpcap to support > capturing on nlmon devices (so that DLT_NETLINK is returned for them) and, > if you're not using the -w flag to tcpdump, changes to tcpdump to analyze > DLT_NETLINK packets. Right, for the device type identification, this is being exported as ARPHRD_NETLINK (include/uapi/linux/if_arp.h) in pf_packet's sll's sll_hatype member. I can have a look how libpcap handles this and send a follow-up patch for further inclusion next week if wished. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
[tcpdump-workers] [PATCH libpcap v2] linktype: add netlink link/dlt type
With upcoming Linux 3.11, we have the possibility to debug local netlink traffic [1] i.e. the workflow looks like this: Setup: modprobe nlmon ip link add type nlmon ip link set nlmon0 up Capture: tcpdump -i nlmon0 ... Teardown: ip link set nlmon0 down ip link del dev nlmon0 rmmod nlmon For pcap interoperability, introduce a common link type for netlink captures and map to it for ARPHRD_NETLINK. Signed-off-by: Daniel Borkmann CC: Thomas Graf CC: Tobias Klauser --- v1->v2: - Do ARPHRD to DLT mapping - Rebase on latest Git tree pcap-common.c | 7 ++- pcap-linux.c | 7 +++ pcap/bpf.h| 7 ++- 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/pcap-common.c b/pcap-common.c index 6175a5a..f26d22e 100644 --- a/pcap-common.c +++ b/pcap-common.c @@ -932,7 +932,12 @@ */ #define LINKTYPE_WIRESHARK_UPPER_PDU 252 -#define LINKTYPE_MATCHING_MAX 252 /* highest value in the "matching" range */ +/* + * Link-layer header type for the netlink protocol (nlmon devices). + */ +#define LINKTYPE_NETLINK 253 + +#define LINKTYPE_MATCHING_MAX 253 /* highest value in the "matching" range */ static struct linktype_map { int dlt; diff --git a/pcap-linux.c b/pcap-linux.c index 7c14821..ee063ae 100644 --- a/pcap-linux.c +++ b/pcap-linux.c @@ -2934,6 +2934,13 @@ static void map_arphrd_to_dlt(pcap_t *handle, int arptype, int cooked_ok) handle->linktype = DLT_IEEE802_15_4_NOFCS; break; +#ifndef ARPHRD_NETLINK +#define ARPHRD_NETLINK 824 +#endif + case ARPHRD_NETLINK: + handle->linktype = DLT_NETLINK; + break; + default: handle->linktype = -1; break; diff --git a/pcap/bpf.h b/pcap/bpf.h index ad36eb6..8286ed5 100644 --- a/pcap/bpf.h +++ b/pcap/bpf.h @@ -1224,7 +1224,12 @@ struct bpf_program { */ #define DLT_WIRESHARK_UPPER_PDU252 -#define DLT_MATCHING_MAX 252 /* highest value in the "matching" range */ +/* + * DLT type for the netlink protocol (nlmon devices). + */ +#define DLT_NETLINK253 + +#define DLT_MATCHING_MAX 253 /* highest value in the "matching" range */ /* * DLT and savefile link type values are split into a class and -- 1.7.11.7 ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH libpcap v2] linktype: add netlink link/dlt type
On 08/06/2013 12:29 AM, Guy Harris wrote: On Jul 24, 2013, at 2:26 AM, Daniel Borkmann wrote: With upcoming Linux 3.11, we have the possibility to debug local netlink traffic [1] i.e. the workflow looks like this: Setup: modprobe nlmon ip link add type nlmon ip link set nlmon0 up Capture: tcpdump -i nlmon0 ... Teardown: ip link set nlmon0 down ip link del dev nlmon0 rmmod nlmon For pcap interoperability, introduce a common link type for netlink captures So DLT_NETLINK packets are netlink messages, as described by, for example, section 3.4 "Netlink message format" of: http://1984.lsi.us.es/~pablo/docs/spae.pdf or section 2.2 "Message Format" of http://tools.ietf.org/html/rfc3549 That is correct, i.e. section 2.3.2. "Netlink Message Header" shows the message header format. For new link-layer header types, it should be possible http://www.tcpdump.org/linktypes.html to include them; I'd want to point to one of those sources if possible. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers