[tcpdump-workers] vlan tagged packets and libpcap breakage
hello folks : Before I go into describing the real problem, I'd like to give some background information for the context. Please bear with me. As you guys might be aware, the way the Linux kernel handles vlan tagged packets have changed. Please see the following commit in mainline Torvalds' tree : commit bcc6d47903612c3861201cc3a866fb604f26b8b2 Author: Jiri Pirko Date: Thu Apr 7 19:48:33 2011 + net: vlan: make non-hw-accel rx path similar to hw-accel Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into vlan code in __netif_receive_skb - vlan_hwaccel_do_receive. For non-rx-vlan-hw-accel however, tagged skb goes thru whole __netif_receive_skb, it's untagged in ptype_base hander and reinjected This incosistency is fixed by this patch. Vlan untagging happens early in __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers) see the skb like it was untagged by hw. Signed-off-by: Jiri Pirko v1->v2: remove "inline" from vlan_core.c functions Signed-off-by: David S. Miller As a result of this patch, with or without HW acceleration support in the kernel driver, the vlan tag information is pulled out from the packet and is inserted in the skb metadata. Thereafter, the vlan tag information for a 802.1q packet can be obtained my applications running in the user space by using the auxdata and cmsg structure. More recently, two more patches were committed to net-next that enable the kernel BPF filter to filter packets using their vlan tags : http://www.spinics.net/lists/netdev/msg214758.html http://www.spinics.net/lists/netdev/msg214759.html Now to the real problem. The filter that is currently generated by libpcap (gencode.c) uses offsets into the packet to obtain the vlan tag for a packet and then compare it to the one provided by the user in the filter expression (gen_vlan()). This will no longer work if the tag has been pulled off from the packet itself. One has to use the negative offsets (SKF_AD_VLAN_TAG) as used by the kernel BPF filter to pass down the attribute that we need to compare against. Now here's a bigger problem. How do we avoid breakage in case we run libpcap on top of an older kernel that does not extract the tags from the packet? How do we handle cases of parsing and filtering against captured pcap files that already have the vlan tags reinserted into the packet data? There might be other corner cases that I am missing and not understanding here. Are you guys aware of the problem? Is there any thoughts as to how we may be able to address the problem or not at all? Thanks for the patience, Ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
cc'ing netdev. On Wed, Oct 31, 2012 at 2:01 PM, Michael Richardson wrote: > > Thanks for this email. > >>>>>> "Ani" == Ani Sinha writes: > Ani> remove "inline" from vlan_core.c functions > Ani> Signed-off-by: David S. Miller > > Ani> As a result of this patch, with or without HW acceleration support in > Ani> the kernel driver, the vlan tag information is pulled out from the > Ani> packet and is inserted in the skb metadata. Thereafter, the vlan tag > Ani> information for a 802.1q packet can be obtained my applications > Ani> running in the user space by using the auxdata and cmsg > Ani> structure. > > Do you think that the existance of this behaviour could be exposed in > sysctl, /proc/net or /sys equivalent (we still don't have /sys/net...)? > As a read only file that had a 0/1 in it? yes, we definitely need a run time check. Whether this could be in the form of a socket option or a /proc entry I don't know. > > Should be one stanza to net/dev/sysctl_net_core, I think. > If we are fast, maybe no kernel will ship without it. > > (An aside, is this auxdata/cmsg info available on UDP sockets too?) > > Ani> More recently, two more patches were committed to net-next that > enable > Ani> the kernel BPF filter to filter packets using their vlan tags : > > Ani> http://www.spinics.net/lists/netdev/msg214758.html > Ani> http://www.spinics.net/lists/netdev/msg214759.html > > okay... > > Ani> Now to the real problem. The filter that is currently generated by > Ani> libpcap (gencode.c) uses offsets into the packet to obtain the vlan > Ani> tag for a packet and then compare it to the one provided by the user > Ani> in the filter expression (gen_vlan()). This will no longer work if > the > Ani> tag has been pulled off from the packet itself. One has to use the > Ani> negative offsets (SKF_AD_VLAN_TAG) as used by the kernel BPF filter > to > Ani> pass down the attribute that we need to compare against. Now here's > a > Ani> bigger problem. How do we avoid breakage in case we run libpcap on > top > Ani> of an older kernel that does not extract the tags from the packet? > How > Ani> do we handle cases of parsing and filtering against captured pcap > Ani> files that already have the vlan tags reinserted into the packet > data? > Ani> There might be other corner cases that I am missing and not > Ani> understanding here. > > AFAIK, We will have to modify pcap-linux to produce different filters. > > Ani> Are you guys aware of the problem? Is there any thoughts as to how we > Ani> may be able to address the problem or not at all? > > Nobody gave us a heads up, no. Glad that I brought this up. Actually Bill Fenner who also works for us helped me a lot in understanding the problem and potentially how it can be addressed. > > It sounds like it's impossible to capture all traffic now, and do vlan > filtering later on? pcap files that already have the tags reinsrted should work with current filter code. However for live traffic, one has to get the tags from CMSG() and then reinsert it back to the packet for the current filter to work. This is slow. ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Wed, Oct 31, 2012 at 3:20 PM, Guy Harris wrote: > > On Oct 31, 2012, at 2:50 PM, Ani Sinha wrote: > >> pcap files that already have the tags reinsrted should work with >> current filter code. However for live traffic, one has to get the tags >> from CMSG() and then reinsert it back to the packet for the current >> filter to work. > > *Somebody* has to do that, at least to packets that pass the filter, yes but if the packet is passed to the filter within libpcap (when we are not using the kernel filter) before the reinsertion, then the filter has to be taught to look into the metadata for tag information and not in the packet itself. before they're handed to a libpcap-based application, for programs that expect to see packets as they arrived from/were transmitted to the wire to work. > > I.e., the tags *should* be reinserted by libpcap, and, as I understand it, > that's what the > > #if defined(HAVE_PACKET_AUXDATA) && > defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) > ... > #endif > > blocks of code in pcap-linux.c in libpcap are doing. yes, I agree. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Wed, Oct 31, 2012 at 5:50 PM, Guy Harris wrote: > > On Oct 31, 2012, at 3:35 PM, Ani Sinha wrote: > >> yes but if the packet is passed to the filter within libpcap (when we >> are not using the kernel filter) before the reinsertion, > > ...that would be a bug. > > Currently, that bug doesn't exist in the recvfrom() code path, but *does* > appear to exist in the tpacket code path - and that code path also runs the > filter before the SLL header is constructed. That should be fixed. yes, agreed. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
Folks, I'm wondering if there has been any progress on this. Are there any thoughts on what Bill wrote in his email? Thanks ani On Fri, Nov 2, 2012 at 9:13 AM, Bill Fenner wrote: > On Wed, Oct 31, 2012 at 6:20 PM, Guy Harris wrote: > > > > On Oct 31, 2012, at 2:50 PM, Ani Sinha wrote: > > > >> pcap files that already have the tags reinsrted should work with > >> current filter code. However for live traffic, one has to get the tags > >> from CMSG() and then reinsert it back to the packet for the current > >> filter to work. > > > > *Somebody* has to do that, at least to packets that pass the filter, > before they're handed to a libpcap-based application, for programs that > expect to see packets as they arrived from/were transmitted to the wire to > work. > > > > I.e., the tags *should* be reinserted by libpcap, and, as I understand > it, that's what the > > > > #if defined(HAVE_PACKET_AUXDATA) && > defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) > > ... > > #endif > > > > blocks of code in pcap-linux.c in libpcap are doing. > > > > Now, if filtering is being done in the *kernel*, and the tags aren't > being reinserted by the kernel, then filter code stuffed into the kernel > would need to differ from filter code run in userland. There's already > precedent for that on Linux, with the "cooked mode" headers; those are > synthesized by libpcap from the metadata returned for PF_PACKET sockets, > and the code that attempts to hand the kernel a filter goes through the > filter code, which was generated under the assumption that the packet > begins with a "cooked mode" header, and modifies (a copy of) the code to, > instead, use the special Linux-BPF-interpreter offsets to access the > metadata. > > > > The right thing to do here would be to, if possible, do the same, so > that the kernel doesn't have to reinsert VLAN tags for packets that aren't > going to be handed to userland. > > In this case, it would be incredibly complicated to do this just > postprocessing a set of bpf instructions. The problem is that when > running the filter in the kernel, the IP header, etc. are not offset, > so "off_macpl" and "off_linktype" would be zero, not 4, while > generating the rest of the expression. We would also have to insert > code when comparing the ethertype to 0x8100 to instead load the > vlan-tagged metadata, so all jumps crossing that point would have to > be adjusted, and if the "if-false" instruction was also testing the > ethertype, then the ethertype would have to be reloaded (again > inserting another instruction). > > Basically, take a look at the output of "tcpdump -d tcp port 22 or > (vlan and tcp port 22)". Are the IPv4 tcp ports at x+14/x+16, or at > x+18/x+20? If we're filtering in the kernel, they're at x+14/x+16 > whether the packet is vlan tagged or not. If we're filtering on the > actual packet contents (from a savefile, for example), they're at > x+18/x+20 if the packet is vlan tagged. > > Also, an expression such as 'tcp port 22' would have to have some > instructions added at the beginning, for "vlan-tagged == false", or it > would match both tagged and untagged packets. > > This would be much more straightforward to deal with in the code > generation phase, except until now the code generation phase hasn't > known whether the filter is headed for the kernel or not. > > Bill > ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
Folks, I'm wondering if there has been any progress on this. Are there any thoughts on what Bill wrote in his email? Thanks On Tue, Nov 13, 2012 at 2:41 PM, Ani Sinha wrote: > Folks, > > I'm wondering if there has been any progress on this. Are there any thoughts > on what Bill wrote in his email? > > Thanks > ani > > > > On Fri, Nov 2, 2012 at 9:13 AM, Bill Fenner wrote: >> >> On Wed, Oct 31, 2012 at 6:20 PM, Guy Harris wrote: >> > >> > On Oct 31, 2012, at 2:50 PM, Ani Sinha wrote: >> > >> >> pcap files that already have the tags reinsrted should work with >> >> current filter code. However for live traffic, one has to get the tags >> >> from CMSG() and then reinsert it back to the packet for the current >> >> filter to work. >> > >> > *Somebody* has to do that, at least to packets that pass the filter, >> > before they're handed to a libpcap-based application, for programs that >> > expect to see packets as they arrived from/were transmitted to the wire to >> > work. >> > >> > I.e., the tags *should* be reinserted by libpcap, and, as I understand >> > it, that's what the >> > >> > #if defined(HAVE_PACKET_AUXDATA) && >> > defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) >> > ... >> > #endif >> > >> > blocks of code in pcap-linux.c in libpcap are doing. >> > >> > Now, if filtering is being done in the *kernel*, and the tags aren't >> > being reinserted by the kernel, then filter code stuffed into the kernel >> > would need to differ from filter code run in userland. There's already >> > precedent for that on Linux, with the "cooked mode" headers; those are >> > synthesized by libpcap from the metadata returned for PF_PACKET sockets, >> > and >> > the code that attempts to hand the kernel a filter goes through the filter >> > code, which was generated under the assumption that the packet begins with >> > a >> > "cooked mode" header, and modifies (a copy of) the code to, instead, use >> > the >> > special Linux-BPF-interpreter offsets to access the metadata. >> > >> > The right thing to do here would be to, if possible, do the same, so >> > that the kernel doesn't have to reinsert VLAN tags for packets that aren't >> > going to be handed to userland. >> >> In this case, it would be incredibly complicated to do this just >> postprocessing a set of bpf instructions. The problem is that when >> running the filter in the kernel, the IP header, etc. are not offset, >> so "off_macpl" and "off_linktype" would be zero, not 4, while >> generating the rest of the expression. We would also have to insert >> code when comparing the ethertype to 0x8100 to instead load the >> vlan-tagged metadata, so all jumps crossing that point would have to >> be adjusted, and if the "if-false" instruction was also testing the >> ethertype, then the ethertype would have to be reloaded (again >> inserting another instruction). >> >> Basically, take a look at the output of "tcpdump -d tcp port 22 or >> (vlan and tcp port 22)". Are the IPv4 tcp ports at x+14/x+16, or at >> x+18/x+20? If we're filtering in the kernel, they're at x+14/x+16 >> whether the packet is vlan tagged or not. If we're filtering on the >> actual packet contents (from a savefile, for example), they're at >> x+18/x+20 if the packet is vlan tagged. >> >> Also, an expression such as 'tcp port 22' would have to have some >> instructions added at the beginning, for "vlan-tagged == false", or it >> would match both tagged and untagged packets. >> >> This would be much more straightforward to deal with in the code >> generation phase, except until now the code generation phase hasn't >> known whether the filter is headed for the kernel or not. >> >> Bill > > ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Wed, Oct 31, 2012 at 5:50 PM, Guy Harris wrote: > > On Oct 31, 2012, at 3:35 PM, Ani Sinha wrote: > >> yes but if the packet is passed to the filter within libpcap (when we >> are not using the kernel filter) before the reinsertion, > > ...that would be a bug. > > Currently, that bug doesn't exist in the recvfrom() code path, but *does* > appear to exist in the tpacket code path - and that code path also runs the > filter before the SLL header is constructed. That should be fixed. Something like this? Index: libpcap-1.1.1/pcap-linux.c === --- libpcap-1.1.1.orig/pcap-linux.c +++ libpcap-1.1.1/pcap-linux.c @@ -132,6 +132,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -3469,23 +3476,6 @@ pcap_read_linux_mmap(pcap_t *handle, int return -1; } - /* run filter on received packet - * If the kernel filtering is enabled we need to run the - * filter until all the frames present into the ring - * at filter creation time are processed. - * In such case md.use_bpf is used as a counter for the - * packet we need to filter. - * Note: alternatively it could be possible to stop applying - * the filter when the ring became empty, but it can possibly - * happen a lot later... */ - bp = (unsigned char*)h.raw + tp_mac; - run_bpf = (!handle->md.use_bpf) || - ((handle->md.use_bpf>1) && handle->md.use_bpf--); - if (run_bpf && handle->fcode.bf_insns && - (bpf_filter(handle->fcode.bf_insns, bp, - tp_len, tp_snaplen) == 0)) - goto skip; - /* * Do checks based on packet direction. */ @@ -3582,6 +3576,23 @@ pcap_read_linux_mmap(pcap_t *handle, int } #endif + /* run filter on received packet + * If the kernel filtering is enabled we need to run the + * filter until all the frames present into the ring + * at filter creation time are processed. + * In such case md.use_bpf is used as a counter for the + * packet we need to filter. + * Note: alternatively it could be possible to stop applying + * the filter when the ring became empty, but it can possibly + * happen a lot later... */ + bp = (unsigned char*)h.raw + tp_mac; + run_bpf = (!handle->md.use_bpf) || + ((handle->md.use_bpf>1) && handle->md.use_bpf--); + if (run_bpf && handle->fcode.bf_insns && + (bpf_filter(handle->fcode.bf_insns, bp, + tp_len, tp_snaplen) == 0)) + goto skip; + /* * The only way to tell the kernel to cut off the * packet at a snapshot length is with a filter program; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Sat, Nov 17, 2012 at 3:33 PM, Eric W. Biederman wrote: > the vlan header in packets as we receive them. > > The code is correct except for the case of packets in vlan 0. Currently > the packet reconstruction is ambiguous. The most recent kernels have > a TP_STATUS_VLAN_VALID flag that can be checked to see if the packet was > in vlan 0 or if there was no vlan at all. libpcap probably should be > taught how to handle TP_STATUS_VLAN_VALID so that it can get the vlan 0 > handling correct. > May be this? Index: libpcap-1.1.1/pcap-linux.c === --- libpcap-1.1.1.orig/pcap-linux.c +++ libpcap-1.1.1/pcap-linux.c @@ -132,6 +132,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -1486,7 +1487,13 @@ pcap_read_packet(pcap_t *handle, pcap_ha continue; aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg); - if (aux->tp_vlan_tci == 0) +#if defined(TP_STATUS_VLAN_VALID) +if (!(aux->tp_vlan_tci & TP_STATUS_VLAN_VALID)) +#else + if (aux->tp_vlan_tci == 0) /* this is ambigious but without the + TP_STATUS_VLAN_VALID flag, there is + nothing that we can do */ +#endif continue; len = packet_len > iov.iov_len ? iov.iov_len : packet_len; @@ -3565,7 +3555,11 @@ pcap_read_linux_mmap(pcap_t *handle, int } #ifdef HAVE_TPACKET2 - if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && +#if defined(TP_STATUS_VLAN_VALID) +if (handle->md.tp_version == TPACKET_V2 && (h.h2->tp_vlan_tci & TP_STATUS_VLAN_VALID) && +#else +if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && +#endif handle->md.vlan_offset != -1 && tp_snaplen >= (unsigned int) handle->md.vlan_offset) { struct vlan_tag *tag; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, Dec 6, 2012 at 2:19 PM, Eric W. Biederman wrote: > Ani Sinha writes: > >> On Sat, Nov 17, 2012 at 3:33 PM, Eric W. Biederman >> wrote: >>> the vlan header in packets as we receive them. >>> >>> The code is correct except for the case of packets in vlan 0. Currently >>> the packet reconstruction is ambiguous. The most recent kernels have >>> a TP_STATUS_VLAN_VALID flag that can be checked to see if the packet was >>> in vlan 0 or if there was no vlan at all. libpcap probably should be >>> taught how to handle TP_STATUS_VLAN_VALID so that it can get the vlan 0 >>> handling correct. >>> >> >> May be this? > > Two things. > > - TP_STATUS_VLAN_VALID lives in the tp_status field not the tp_vlan_tci field. > - To work on older kernels with binaries compiled with newer headers you > first want to test for tp_vlan_tci == 0 and then look at the status field > for > TP_STATUS_VALID. yes you are right Eric on both counts. I will resend the patch again in a bit. ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, Dec 6, 2012 at 2:19 PM, Eric W. Biederman wrote: > Ani Sinha writes: >>> >> >> May be this? > > Two things. > > - TP_STATUS_VLAN_VALID lives in the tp_status field not the tp_vlan_tci field. > - To work on older kernels with binaries compiled with newer headers you > first want to test for tp_vlan_tci == 0 and then look at the status field > for > TP_STATUS_VALID. trying again : pcap-linux.c | 50 +++--- 1 files changed, 31 insertions(+), 19 deletions(-) diff --git a/pcap-linux.c b/pcap-linux.c index a42c3ac..8e355d3 100644 --- a/pcap-linux.c +++ b/pcap-linux.c @@ -133,6 +133,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -1543,7 +1544,13 @@ pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata) continue; aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg); - if (aux->tp_vlan_tci == 0) +#if defined(TP_STATUS_VLAN_VALID) +if ((aux->tp_vlan_tci == 0) || !(aux->tp_status & TP_STATUS_VLAN_VALID)) +#else + if (aux->tp_vlan_tci == 0) /* this is ambigious but without the + TP_STATUS_VLAN_VALID flag, there is + nothing that we can do */ +#endif continue; len = packet_len > iov.iov_len ? iov.iov_len : packet_len; @@ -3936,7 +3926,12 @@ pcap_read_linux_mmap(pcap_t *handle, int max_packets, pcap_handler callback, } #ifdef HAVE_TPACKET2 - if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && +if ((handle->md.tp_version == TPACKET_V2) && +#if defined(TP_STATUS_VLAN_VALID) +(h.h2->tp_vlan_tci || (h.h2->tp_status & TP_STATUS_VLAN_VALID)) && +#else +h.h2->tp_vlan_tci && +#endif handle->md.vlan_offset != -1 && tp_snaplen >= (unsigned int) handle->md.vlan_offset) { struct vlan_tag *tag; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
> > The patch is whitespace damaged. And one of your test is using || > instead of && > OK, using alpine now :-) > > The test should be && not ||. > aargh! I am retarded! Fixed. Hopefully 3rd time is a charm :-) ani pcap-linux.c | 50 +++--- 1 files changed, 31 insertions(+), 19 deletions(-) diff --git a/pcap-linux.c b/pcap-linux.c index a42c3ac..b2c1a08 100644 --- a/pcap-linux.c +++ b/pcap-linux.c @@ -133,6 +133,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -1543,7 +1544,13 @@ pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata) continue; aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg); - if (aux->tp_vlan_tci == 0) +#if defined(TP_STATUS_VLAN_VALID) +if ((aux->tp_vlan_tci == 0) && !(aux->tp_status & TP_STATUS_VLAN_VALID)) +#else + if (aux->tp_vlan_tci == 0) /* this is ambigious but without the + TP_STATUS_VLAN_VALID flag, there is + nothing that we can do */ +#endif continue; len = packet_len > iov.iov_len ? iov.iov_len : packet_len; @@ -3936,7 +3926,12 @@ pcap_read_linux_mmap(pcap_t *handle, int max_packets, pcap_handler callback, } #ifdef HAVE_TPACKET2 - if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && +if ((handle->md.tp_version == TPACKET_V2) && +#if defined(TP_STATUS_VLAN_VALID) +(h.h2->tp_vlan_tci || (h.h2->tp_status & TP_STATUS_VLAN_VALID)) && +#else +h.h2->tp_vlan_tci && +#endif handle->md.vlan_offset != -1 && tp_snaplen >= (unsigned int) handle->md.vlan_offset) { struct vlan_tag *tag; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
> > The patch is whitespace damaged. And one of your test is using || > instead of && OK, using alpine now :-) > > The test should be && not ||. aargh! I am retarded! Fixed. Hopefully 3rd time is a charm :-) ani pcap-linux.c | 50 +++--- 1 files changed, 31 insertions(+), 19 deletions(-) diff --git a/pcap-linux.c b/pcap-linux.c index a42c3ac..b2c1a08 100644 --- a/pcap-linux.c +++ b/pcap-linux.c @@ -133,6 +133,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -1543,7 +1544,13 @@ pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata) continue; aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg); - if (aux->tp_vlan_tci == 0) +#if defined(TP_STATUS_VLAN_VALID) +if ((aux->tp_vlan_tci == 0) && !(aux->tp_status & TP_STATUS_VLAN_VALID)) +#else + if (aux->tp_vlan_tci == 0) /* this is ambigious but without the + TP_STATUS_VLAN_VALID flag, there is + nothing that we can do */ +#endif continue; len = packet_len > iov.iov_len ? iov.iov_len : packet_len; @@ -3936,7 +3926,12 @@ pcap_read_linux_mmap(pcap_t *handle, int max_packets, pcap_handler callback, } #ifdef HAVE_TPACKET2 - if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && +if ((handle->md.tp_version == TPACKET_V2) && +#if defined(TP_STATUS_VLAN_VALID) +(h.h2->tp_vlan_tci || (h.h2->tp_status & TP_STATUS_VLAN_VALID)) && +#else +h.h2->tp_vlan_tci && +#endif handle->md.vlan_offset != -1 && tp_snaplen >= (unsigned int) handle->md.vlan_offset) { struct vlan_tag *tag; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, 6 Dec 2012, Michael Richardson wrote: > Date: Thu, 6 Dec 2012 17:59:13 > From: Michael Richardson > To: Eric W. Biederman > Cc: a...@aristanetworks.com, tcpdump-workers@lists.tcpdump.org, > net...@vger.kernel.org, Francesco Ruggeri > Subject: Re: [tcpdump-workers] vlan tagged packets and libpcap breakage > > > >>>>> "Eric" == Eric W Biederman writes: > Eric> It is a bit odd that you are indenting with spaces instead of tabs > Eric> in a file that is indented with tab. Again libpcap isn't my code > so I > Eric> don't care but someone else might. Here's an updated patch. I have converted spaces to tabs consistent with rest of the file. I have also set up github and sent a pull request : commit b977ebd9d9608bb8a8e4927e7a6286bdfd6b94a8 Author: Ani Sinha Date: Mon Dec 10 14:58:15 2012 -0800 Linux kernel 3.0 uses TP_STATUS_VLAN_VALID flag in packet auxilliary data aux->tp_status to indicate a packet tagged with a valid vlan ID 0 with another packet that is not vlan tagged. Use this flag to check for the presence of a vlan tagged packet. Also be careful not to cause any breakage when libpcap is compiled against a newer kernel (>=3.0) and used on top of an older kernel that does not use this flag. Signed-off-by: Ani Sinha diff --git a/pcap-linux.c b/pcap-linux.c index a42c3ac..ffcabdf 100644 --- a/pcap-linux.c +++ b/pcap-linux.c @@ -133,6 +133,7 @@ static const char rcsid[] _U_ = #include #include #include +#include #include #include #include @@ -1543,7 +1544,13 @@ pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata) continue; aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg); - if (aux->tp_vlan_tci == 0) +#if defined(TP_STATUS_VLAN_VALID) + if ((aux->tp_vlan_tci == 0) && !(aux->tp_status & TP_STATUS_VLAN_VALID)) +#else + if (aux->tp_vlan_tci == 0) /* this is ambigious but without the + TP_STATUS_VLAN_VALID flag, there is + nothing that we can do */ +#endif continue; len = packet_len > iov.iov_len ? iov.iov_len : packet_len; @@ -3936,7 +3943,12 @@ pcap_read_linux_mmap(pcap_t *handle, int max_packets, pcap_handler callback, } #ifdef HAVE_TPACKET2 - if (handle->md.tp_version == TPACKET_V2 && h.h2->tp_vlan_tci && + if ((handle->md.tp_version == TPACKET_V2) && +#if defined(TP_STATUS_VLAN_VALID) + (h.h2->tp_vlan_tci || (h.h2->tp_status & TP_STATUS_VLAN_VALID)) && +#else + h.h2->tp_vlan_tci && +#endif handle->md.vlan_offset != -1 && tp_snaplen >= (unsigned int) handle->md.vlan_offset) { struct vlan_tag *tag; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
> > It is possible to test for the presence of support of the new vlan bpf > extensions by attempting to load a filter that uses them. As only valid > filters can be loaded, old kernels that do not support filtering of vlan > tags will fail to load the a test filter with uses them. Unfortunately I do not see this. The sk_chk_filter() does not have a default in the case statement and the check will not detect an unknown instruction. It will fail when the filter is run and as far as I can see, the packet will be dropped. Something like this might help? diff --git a/net/core/filter.c b/net/core/filter.c index c23543c..96338aa 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -548,6 +548,8 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen) return -EINVAL; /* Some instructions need special checks */ switch (code) { + /* for unknown instruction, return EINVAL */ + default : return -EINVAL; case BPF_S_ALU_DIV_K: /* check for division by zero */ if (ftest->k == 0) ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Tue, Dec 11, 2012 at 3:04 PM, Eric Dumazet wrote: > On Tue, 2012-12-11 at 14:36 -0800, Ani Sinha wrote: >> > >> > It is possible to test for the presence of support of the new vlan bpf >> > extensions by attempting to load a filter that uses them. As only valid >> > filters can be loaded, old kernels that do not support filtering of vlan >> > tags will fail to load the a test filter with uses them. >> >> Unfortunately I do not see this. The sk_chk_filter() does not have a >> default in the case statement and the check will not detect an unknown >> instruction. It will fail when the filter is run and as far as I can see, >> the packet will be dropped. Something like this might help? >> >> diff --git a/net/core/filter.c b/net/core/filter.c >> index c23543c..96338aa 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -548,6 +548,8 @@ int sk_chk_filter(struct sock_filter *filter, unsigned >> int flen) >> return -EINVAL; >> /* Some instructions need special checks */ >> switch (code) { >> + /* for unknown instruction, return EINVAL */ >> + default : return -EINVAL; >> case BPF_S_ALU_DIV_K: >> /* check for division by zero */ >> if (ftest->k == 0) > > This patch is wrong. yes I generated this patch wrong. > > Check lines 546, 547, 548 where we do the check for unknown instructions > > code = codes[code]; > if (!code) > return -EINVAL; yepph it's OK here. > > If you want to test ANCILLARY possible values, its already too late, as > old kernels wont use any patch anyway. yepph, I was looking at possible ancilliary values. Basically this case statement : #define ANCILLARY(CODE) case SKF_AD_OFF + SKF_AD_##CODE:\ code = BPF_S_ANC_##CODE;\ break switch (ftest->k) { ANCILLARY(PROTOCOL); ANCILLARY(PKTTYPE); ANCILLARY(IFINDEX); ANCILLARY(NLATTR); ANCILLARY(NLATTR_NEST); ANCILLARY(MARK); ANCILLARY(QUEUE); ANCILLARY(HATYPE); ANCILLARY(RXHASH); ANCILLARY(CPU); ANCILLARY(ALU_XOR_X); ANCILLARY(VLAN_TAG); ANCILLARY(VLAN_TAG_PRESENT); } ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Tue, Dec 11, 2012 at 3:04 PM, Eric Dumazet wrote: > On Tue, 2012-12-11 at 14:36 -0800, Ani Sinha wrote: >> > >> > It is possible to test for the presence of support of the new vlan bpf > If you want to test ANCILLARY possible values, its already too late, as > old kernels wont use any patch anyway. > So basically this means that if we generate a filter with these special negative offset values and expect that the kernel will complain if it does not recognize the newer values then we would be wrong. And you are right. Old kernels never knew about them and the code wasn't written in a way to return EINVAL if it didn't recognize a special negative anciliary offset value. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
> > unsigned int netdev_8021q_inskb = 1; > > ... > { > .ctl_name = NET_CORE_8021q_INSKB, > .procname = "netdev_8021q_inskb", > .data = &netdev_8021q_inskb, > .maxlen = sizeof(int), > .mode = 0444, > .proc_handler = proc_dointvec > }, > > would seem to do it to me. > Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it > finds it, and it is >0, then do the cmsg thing. > Does this work? This is just an experimental patch and by no means final. I just want to have an idea what everyone thought about it. Once we debate and discusss, I can cook up a final patch that would be worth commiting. Also instead of having this /proc interface, we can perhaps check for a specific kernel version that : (a) has the vlan tag info in the skb metadata (as opposed to in the packet itself) (b) has the following patch that adds the capability to generate a filter based on the tag value : commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1 Author: Eric Dumazet Date: Sat Oct 27 02:26:17 2012 + net: filter: add vlan tag access WE need both of the above two things for the userland to generate a filter code that compares vlan tag values in the skb metadata. For kernels that has the vlan tag in the skb metadata but does not have the above commit (b), there is nothing that can be done. For older kernels that had the vlan tag info in the packet itself, the filter code can be generated differently to look at specific offsets within the packet (something that libpcap does currently). We have already ruled out the idea of generating a filter and trying to load and see if that fails (see previous emails on this thread). Hope this makes sense. diff --git a/include/linux/filter.h b/include/linux/filter.h index c45eabc..91e2ba3 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -36,6 +36,7 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp) return fp->len * sizeof(struct sock_filter) + sizeof(*fp); } +extern bool sysctl_8021q_inskb; extern int sk_filter(struct sock *sk, struct sk_buff *skb); extern unsigned int sk_run_filter(const struct sk_buff *skb, const struct sock_filter *filter); diff --git a/net/core/filter.c b/net/core/filter.c index c23543c..4f5a657 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -41,6 +41,8 @@ #include #include +bool sysctl_8021q_inskb = 1; + /* No hurry in this branch * * Exported for the bpf jit load helper. diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index d1b0804..f9a3700 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -189,6 +190,13 @@ static struct ctl_table net_core_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "8021q_inskb", + .data = &sysctl_8021q_inskb, + .maxlen = sizeof(bool), + .mode = 0444, + .proc_handler = proc_dointvec + }, { } }; ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
+ Eric B. On Wed, Dec 12, 2012 at 1:53 PM, Ani Sinha wrote: > >> >> unsigned int netdev_8021q_inskb = 1; >> >> ... >> { >> .ctl_name = NET_CORE_8021q_INSKB, >> .procname = "netdev_8021q_inskb", >> .data = &netdev_8021q_inskb, >> .maxlen = sizeof(int), >> .mode = 0444, >> .proc_handler = proc_dointvec >> }, >> >> would seem to do it to me. >> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it >> finds it, and it is >0, then do the cmsg thing. >> > > Does this work? This is just an experimental patch and by no means final. > I just want to have an idea what everyone thought about it. Once we debate > and discusss, I can cook up a final patch that would be worth commiting. > > Also instead of having this /proc interface, we can perhaps check for a > specific > kernel version that : > > (a) has the vlan tag info in the skb metadata (as opposed to in the packet > itself) > (b) has the following patch that adds the capability to generate a filter > based on the tag value : > > commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1 > Author: Eric Dumazet > Date: Sat Oct 27 02:26:17 2012 + > > net: filter: add vlan tag access > > WE need both of the above two things for the userland to generate a filter > code that compares vlan tag values in the skb metadata. For kernels that > has the vlan tag in > the skb metadata but does not have the above commit (b), there is nothing > that can be done. For older kernels that had the vlan tag info in the > packet itself, the filter code can be generated differently to look at > specific offsets within the packet (something that libpcap does > currently). > > We have already ruled out the idea of generating a filter and trying to > load and see if that fails (see previous emails on this thread). > > Hope this makes sense. > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index c45eabc..91e2ba3 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -36,6 +36,7 @@ static inline unsigned int sk_filter_len(const struct > sk_filter *fp) > return fp->len * sizeof(struct sock_filter) + sizeof(*fp); > } > > +extern bool sysctl_8021q_inskb; > extern int sk_filter(struct sock *sk, struct sk_buff *skb); > extern unsigned int sk_run_filter(const struct sk_buff *skb, > const struct sock_filter *filter); > diff --git a/net/core/filter.c b/net/core/filter.c > index c23543c..4f5a657 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -41,6 +41,8 @@ > #include > #include > > +bool sysctl_8021q_inskb = 1; > + > /* No hurry in this branch > * > * Exported for the bpf jit load helper. > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c > index d1b0804..f9a3700 100644 > --- a/net/core/sysctl_net_core.c > +++ b/net/core/sysctl_net_core.c > @@ -15,6 +15,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -189,6 +190,13 @@ static struct ctl_table net_core_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec > }, > + { > + .procname = "8021q_inskb", > + .data = &sysctl_8021q_inskb, > + .maxlen = sizeof(bool), > + .mode = 0444, > + .proc_handler = proc_dointvec > + }, > { } > }; > ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, Dec 13, 2012 at 12:35 AM, Daniel Borkmann wrote: > On 12/12/2012 10:53 PM, Ani Sinha wrote: >>> >>> unsigned int netdev_8021q_inskb = 1; >>> >>> ... >>> { >>> .ctl_name = NET_CORE_8021q_INSKB, >>> .procname = "netdev_8021q_inskb", >>> .data = &netdev_8021q_inskb, >>> .maxlen = sizeof(int), >>> .mode = 0444, >>> .proc_handler = proc_dointvec >>> }, >>> >>> would seem to do it to me. >>> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it >>> finds it, and it is >0, then do the cmsg thing. >>> >> > > I think it doesn't. Because then you are obviously considering adding one > procfs file into /proc/sys/net/core/ *for each* feature that is added into > the ancillary ops which cannot be the right way ... We had already brought up this topic previously in the same thread. A suggestion was made to add that proc entry and no one from netdev responded to it saying that it did not make any sense. Therefore before I went ahead and made the fixes in libpcap, I wanted to run this by your guys again to make sure we are still on the same page. I do agree that instead of a /proc entry, we should check for a kenrel version >= X where X is the upstream version that first started supporting all the features needed by libpcap for vlan filtering. This is not a compile time check but a run time one. Does anyone see any issues with this? Is there any long term implications of this, like if you backport patches to an older long term supported kernel? Are there other better ways to do this, like may be returning feature bits from an ioctl call? This is something we need to deal with on a continuous basis as we keep supporting newer AUX fields and libpcap and other user land code needs to make use of it. At the same time, they need to handle backward compatibility issues with older kernels. Thanks ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Thu, Dec 13, 2012 at 1:49 PM, Daniel Borkmann wrote: > Well, not everyone on netdev might be following this thread (resp. > following fully). The best way to get responses for a patch is to go > through the normal patch submission process on netdev, and if you like > to request for comments, then mark it as RFC in the subject. This way, > people will know and likely comment on it if it makes sense or not. OK good to know. > As Eric mentioned earlier, for now there seems not to be a reliable > way to get to know which ops are present and which not. It's not > really nice, but if you want to make use of those new (ANC*) features, > probably checking kernel version might be the only way if I'm not > missing something. Now net-next is closed, but if it reopens, I'll > submit a version 2 of my patch where you've been CC'd to. If it gets > in, then at least it's for sure that since kernel this kind of > feature test is present. thanks, yes, I believe we do need some sort of validation on the ancilliary features. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
On Mon, Dec 17, 2012 at 2:35 AM, Guy Harris wrote: > > On Dec 17, 2012, at 1:50 AM, "David Laight" wrote: > >> How are you going to tell whether a feature is present in a non-Linux >> kernel ? > > The Linux memory-mapped capture mechanism is not present in a non-Linux > kernel, so all the libpcap work involved here would, if necessary on other > platforms, have to be done >differently on those platforms. Those platforms > would have to have their own mechanisms to indicate whether any changes to > filter code, processing of VLAN tags supplied out of >band, etc. would need > to be done. Actually lib-pcap has these pcap-.c files that are kind of like platform specific drivers that plug into platform independent code like gencode.c or bpf_filter.c. These platform specific drivers are responsible for getting packets from the kernel and running filters (kernel or userland) on it. So all linux specific code to get a packet and packet metadata from the kernel can neatly reside in pcap-linux.c. Unfortunately though, in this specific problem involving filtering with vlan tags, both code generation (gentags.c) and code running the filter (bpf_filter.c) will have to be aware of linux specific semantics. Due to the issues that Bill had explained earlier in the thread, we can not rely on post processing before installing the kernel filter. Therefore, we need to generate a filter that can be directly installed in the kernel. For the same reason, bpf_filter() code also needs to change - be aware of linux specific semantics. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
[tcpdump-workers] [PATCH][RFC] fix BPF filter for VLAN tagged packets
Hi folks : This is the first cut of my patch that tries to fix the BPF filter code generation issue. Please note that this is the first time I have hacked libpcap, so I might be off in terms of code philosophy/organization of code etc. This change has been tested by me and independently by another grad student at UC Berkeley (CC'd here) although none of us have yet done a very thorough testing of all the aspects. I am sending it out to the community so that others too can give it a try and report fixes. I will also give it more testing in the coming weeks. The description below summarizes the patch. I'd appreciate all the comments that I can get on this issue. Thanks, ani Fix BPF filter for VLAN tagged packets - Linux kernel no longer puts (outermost) vlan tags within the packet but in packet metadata. Use special ANC negative offsets when generating the filter code to check if it's a vlan tagged packet and whether the vlan ID matches that of the one specified in the filter expression. - When executing the filter code in userland, take care of these special ANC values - use the vlan tag extracted from the packet metadata for comparison. - For saving the captured packets, re-insert the vlan tags back into the packet. That way, both old and new pcap files will have vlan tags inserted into the packet itself in the pcap files. The behaviour would not change between old and new kernels. - For older Linux kernels that had the vlan tags as a part of the packet and those kernels that did not have the special ANC opcodes for vlan tag access, use the old mechanism - generate filter code that looks into the packet offset for vlan tags. - For all platforms other than Linux, do not change the current behaviour. What this patch does not handle : When 'vlan' keyword is not specified in the filter expression, the filter code generator does not by default add a 'not filter' expresion so that only untagged packets pass the filter. So for example, an expression like 'tcp port 80' will also include vlan tagged packets. We need to handle this in a future patch. Signed-off-by: Ani Sinha diff --git a/Makefile.in b/Makefile.in index 772cc7d..c030a7e 100644 --- a/Makefile.in +++ b/Makefile.in @@ -82,7 +82,7 @@ YACC = @V_YACC@ @rm -f $@ $(CC) $(FULL_CFLAGS) -c $(srcdir)/$*.c -PSRC = pcap-@V_PCAP@.c @USB_SRC@ @BT_SRC@ @CAN_SRC@ @NETFILTER_SRC@ @CANUSB_SRC@ +PSRC = pcap-@V_PCAP@.c @USB_SRC@ @BT_SRC@ @CAN_SRC@ @NETFILTER_SRC@ @CANUSB_SRC@ @BPF_FILTER_P@ FSRC = fad-@V_FINDALLDEVS@.c SSRC = @SSRC@ CSRC = pcap.c inet.c gencode.c optimize.c nametoaddr.c etherent.c \ @@ -301,6 +301,7 @@ EXTRA_DIST = \ pcap-int.h \ pcap-libdlpi.c \ pcap-linux.c \ + bpf_filter_linux.c \ pcap-namedb.h \ pcap-netfilter-linux.c \ pcap-netfilter-linux.h \ diff --git a/bpf_filter_linux.c b/bpf_filter_linux.c new file mode 100644 index 000..1d5d70f --- /dev/null +++ b/bpf_filter_linux.c @@ -0,0 +1,571 @@ +/*- + * Copyright (c) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997 + * The Regents of the University of California. All rights reserved. + * + * This code is derived from the Stanford/CMU enet packet filter, + * (net/enet.c) distributed as part of 4.3BSD, and code contributed + * to Berkeley by Steven McCanne and Van Jacobson both of Lawrence + * Berkeley Laboratory. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + *must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND
Re: [tcpdump-workers] PROBLEM: Software injected vlan tagged packets are unable to be identified using recent BPF modifications
On Mon, Jan 7, 2013 at 4:05 PM, Paul Pearce wrote: > Hello folks, > > PROBLEM: > > vlan tagged packets that are injected via software are not picked up > by filters using recent (kernel commit > f3335031b9452baebfe49b8b5e55d3fe0c4677d1) > BPF vlan modifications. I suspect this is a problem with the Linux > kernel. > > linux-netdev and tcpdump-workers are both cc'd. > Just to be clear, up until now we did not see this issue because the BPF filter code generated by libpcap would always look into packet offsets for vlan tag information. With the patch that I submitted to tcpdump-workers a day ago, it no longer looks into the packet but in the skb meta data (which is the right thing to do going forward). This breaks raw packets. We will have to handle this in the kernel to fix it. ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
+tcpdump-workers On Tue, Jan 8, 2013 at 10:27 PM, Eric Dumazet wrote: > On Tue, 2013-01-08 at 22:06 -0800, Ani Sinha wrote: > >> The proposed patch tries to fix the issue that arose after the >> following commit : >> >> commit b40863c667c16b7a73d4f034a8eab67029b5b15a >> Author: Eric Dumazet >> Date: Tue Sep 18 20:44:49 2012 + >> >> net: more accurate network taps in transmit path >> >> >> I do not believe 3.6.11 kernel has this change. 3.6.11 should not need >> the patch. > > Thats irrelevant. This only shows that user land was depending on a > prior undocumented behavior. > > It seems a libpcap issue to me. Kernel side provides all needed bits. > > When I want "tcpdump src port 2030", filter is : > > (000) ldh [12] > (001) jeq #0x86dd jt 2jf 8 > (002) ldb [20] > (003) jeq #0x84jt 6jf 4 > (004) jeq #0x6 jt 6jf 5 > (005) jeq #0x11jt 6jf 19 > (006) ldh [54] > (007) jeq #0x7ee jt 18 jf 19 > (008) jeq #0x800 jt 9jf 19 > (009) ldb [23] > (010) jeq #0x84jt 13 jf 11 > (011) jeq #0x6 jt 13 jf 12 > (012) jeq #0x11jt 13 jf 19 > (013) ldh [20] > (014) jset #0x1fff jt 19 jf 15 > (015) ldxb 4*([14]&0xf) > (016) ldh [x + 14] > (017) jeq #0x7ee jt 18 jf 19 > (018) ret #96 > (019) ret #0 > > See how it handles both IPv4 and IPv6, and various protocols > automatically ? > > If I only wanted "udp and src port 2030" it would give : > > (000) ldh [12] > (001) jeq #0x86dd jt 2jf 6 > (002) ldb [20] > (003) jeq #0x11jt 4jf 15 > (004) ldh [54] > (005) jeq #0x7ee jt 14 jf 15 > (006) jeq #0x800 jt 7jf 15 > (007) ldb [23] > (008) jeq #0x11jt 9jf 15 > (009) ldh [20] > (010) jset #0x1fff jt 15 jf 11 > (011) ldxb 4*([14]&0xf) > (012) ldh [x + 14] > (013) jeq #0x7ee jt 14 jf 15 > (014) ret #96 > (015) ret #0 > > > > So when I want "tcpdump vlan 100" it generates : > > (000) ldh [12] > (001) jeq #0x8100 jt 2jf 6 > (002) ldh [14] > (003) and #0xfff > (004) jeq #0x64jt 5jf 6 > (005) ret #96 > (006) ret #0 > > What's wrong instructing libpcap to extend the filter to be able to > get the correct result, vlan id being in skb->vlan_id (vlan accel on), > or in the packet itself (vlan accel off) > > This way, you could chose if you want to get only accelerated vlan, > or non accelerated vlan, or both. And you need no kernel hacking. > > > ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
>> Thats irrelevant. This only shows that user land was depending on a >> prior undocumented behavior. Why do you say that? The following patch from Pirko ensured that on both RX and TX regardless whether the driver/hw supported vlan acceleration, the outermost vlan tags will always be extracted out of the packet and put in skb aux data : commit bcc6d47903612c3861201cc3a866fb604f26b8b2 Author: Jiri Pirko Date: Thu Apr 7 19:48:33 2011 + net: vlan: make non-hw-accel rx path similar to hw-accel Now this meant that the filter code should always look into the aux data for vlan tagging, not in the packet, regardless of hw acceleration availability. Your patch b40863c667c16b7a73d4f034a8eab67029b5b15a broke this symmetric semantics - now on TX on the network tap, we do not have the vlan tags in the skb aux data. In my opinion, for a given kernel, the filter code should either look into the packet offset or in the packet aux data for vlan tags but not both. Otherwise the filter code becomes incredibly complex since an inline vlan tag in the packet changes offsets of all headers coming afterwords and I don't even know if filter code can be correctly generated in this case. tcpdump-workers folks are CC's here and they clearly have more experience with libpcap filter code that I do. Hence I leave it up to them to provide inputs here. >> What's wrong instructing libpcap to extend the filter to be able to >> get the correct result, vlan id being in skb->vlan_id (vlan accel on), >> or in the packet itself (vlan accel off) >> >> This way, you could chose if you want to get only accelerated vlan, >> or non accelerated vlan, or both. And you need no kernel hacking. This is wrong. Accelerated or not, the kernel code was organized to have the tags in the packet aux data. So I think this is how user land should be coded as well. ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
On Wed, Jan 9, 2013 at 11:51 AM, Eric Dumazet wrote: > On Wed, 2013-01-09 at 11:27 -0800, Ani Sinha wrote: > >> This is wrong. Accelerated or not, the kernel code was organized to >> have the tags in the packet aux data. So I think this is how user land >> should be coded as well. > > You have your opinion, thats good. > > My opinion as a kernel developer is that the network tap is here to have > a copy of the exact frame given to the _device_. > It is fine by me if that is how you see it. In that case. the behaviour can me made symmetric on both TX and RX. Tap processing in __netif_receive_skb() can be done before vlan_untag() so that taps see the exact frame received from the _device_ as you put it. ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
On Wed, Jan 9, 2013 at 12:01 PM, Ani Sinha wrote: > On Wed, Jan 9, 2013 at 11:51 AM, Eric Dumazet wrote: >> On Wed, 2013-01-09 at 11:27 -0800, Ani Sinha wrote: >> >>> This is wrong. Accelerated or not, the kernel code was organized to >>> have the tags in the packet aux data. So I think this is how user land >>> should be coded as well. >> >> You have your opinion, thats good. >> >> My opinion as a kernel developer is that the network tap is here to have >> a copy of the exact frame given to the _device_. >> > > It is fine by me if that is how you see it. In that case. the > behaviour can me made symmetric on both TX and RX. Tap processing in > __netif_receive_skb() can be done before vlan_untag() so that taps see > the exact frame received from the _device_ as you put it. Although for accelerated vlan tags, it will be in the meta data anyways. All I am asking is, let's have the same behaviour on both TX and RX. If the tag in the packet let's have it that way in both ways in what the tap sees. ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
[tcpdump-workers] "not vlan" filter expression broken catastrophically!
hello folks : As you guys have been aware, I am hacking libpcap for a while. Me and Bill noticed something seriously broken for any filter expression that has a "not vlan" in it. For example, take a look at the filter code generated by libpcap with an expression like "not vlan and tcp port 80" : BpfExpression '(not vlan and tcp port 80)' { 0x28, 0, 0, 0x000c }, //(000) ldh [12] { 0x15, 19, 0, 0x8100 }, //(001) jeq #0x8100 jt 21 jf 2 { 0x28, 0, 0, 0x0010 }, //(002) ldh [16] { 0x15, 0, 6, 0x86dd }, //(003) jeq #0x86dd jt 4 jf 10 { 0x30, 0, 0, 0x0018 }, //(004) ldb [24] { 0x15, 0, 15, 0x0006 }, //(005) jeq #0x6jt 6 jf 21 { 0x28, 0, 0, 0x003a }, //(006) ldh [58] { 0x15, 12, 0, 0x0050 }, //(007) jeq #0x50 jt 20 jf 8 { 0x28, 0, 0, 0x003c }, //(008) ldh [60] { 0x15, 10, 11, 0x0050 }, //(009) jeq #0x50 jt 20 jf 21 { 0x15, 0, 10, 0x0800 }, //(010) jeq #0x800 jt 11 jf 21 { 0x30, 0, 0, 0x001b }, //(011) ldb [27] { 0x15, 0, 8, 0x0006 }, //(012) jeq #0x6jt 13 jf 21 { 0x28, 0, 0, 0x0018 }, //(013) ldh [24] { 0x45, 6, 0, 0x1fff }, //(014) jset #0x1fff jt 21 jf 15 { 0xb1, 0, 0, 0x0012 }, //(015) ldxb 4*([18]&0xf) { 0x48, 0, 0, 0x0012 }, //(016) ldh [x + 18] { 0x15, 2, 0, 0x0050 }, //(017) jeq #0x50 jt 20 jf 18 { 0x48, 0, 0, 0x0014 }, //(018) ldh [x + 20] { 0x15, 0, 1, 0x0050 }, //(019) jeq #0x50 jt 20 jf 21 { 0x6, 0, 0, 0x }, //(020) ret #65535 { 0x6, 0, 0, 0x }, //(021) ret #0 As you can see, it loads offset 12 (ethertype). For vlan packets, it jumps to #21 and returns false right away. However, for packets that are not vlan tagged, it goes to #2 which loads offset 16 in the packet. Notice that this is wrong! The offsets should be incremented by 4 only for vlan tagged packets and not for non-vlan packets. The problem is that in gencode.c, the off_linktype increments by 4 unconditionally whether or not the packet actually contains a vlan tag. We do not want to increment this offset if "not vlan" is true. So the above filter code is generated wrong. I just wanted to point this out to folks who wishes to dig in and fix it. I do not have time right now to think of a proper solution. It would seem using unconditional increments of offsets like off_linktype below the parser is not going to work. How do you know if the parser is going to take your code generated from the "vlan" expression and just negate it? Or may be we can hack another rule in grammar.y. I don't know. cheers, ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] "not vlan" filter expression broken catastrophically!
In this case, I have used a tool written by Bill Fenner called BpfExpression for Arista. Not sure if the code is GNU licensed. He's in the mailing list and I will let him comment. ani On Fri, Feb 1, 2013 at 5:07 PM, Michael Richardson wrote: > >>>>>> "Ani" == Ani Sinha writes: > Ani> hello folks : > > Ani> As you guys have been aware, I am hacking libpcap for a > Ani> while. Me and Bill noticed something seriously broken for any > Ani> filter expression that has a "not vlan" in it. For example, > Ani> take a look at the filter code generated by libpcap with an > Ani> expression like "not vlan and tcp port 80" : > > Ani> BpfExpression '(not vlan and tcp port 80)' { 0x28, 0, 0, > > Do we have any way to test libpcap expression outputs other than -d > options to tcpdump? I'm thinking regression tests here. > > -- > ] Never tell me the odds! | ipv6 mesh networks [ > ] Michael Richardson, Sandelman Software Works| network architect [ > ] m...@sandelman.ca http://www.sandelman.ca/| ruby on rails > [ > > ___ > tcpdump-workers mailing list > tcpdump-workers@lists.tcpdump.org > https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] "not vlan" filter expression broken catastrophically!
On Fri, Feb 1, 2013 at 1:13 PM, Bill Fenner wrote: > On Fri, Feb 1, 2013 at 3:50 PM, Paul Pearce wrote: > > That was my proposal to Ani, since the kernel guys seemed to insist > that asymmetry was a virtue. > yes, I think ultimately we will have to do something like what Bill suggested to me. I am still not an expert with that filter generation code, so I am trying to handle things in steps, keeping in mind my company's immediate needs. Too many stuff is broken :-) The funny thing is, up until that commit in last October, the kernel was symmetric in how it handled vlan tags :-) ani ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
Re: [tcpdump-workers] "not vlan" filter expression brokencatastrophically!
On Mon, Feb 4, 2013 at 1:29 AM, David Laight wrote: >> I agree. Honestly I think a perfectly reasonable stance to take is >> requesting that the filters get packets *as seen on the wire/nic*. I >> think that's the mental model everyone uses, and any deviation from >> that model is prone to bugs in the kernel, libpcap, and for the enduser. > > TX and RX segmentation offload also confuse matters here. > I think Linux can give libpcap large TCP fragments even when the > hardware isn't doing segmentation offload. > This also breaks the mental model. > It would be nice if someone can initiate a discussion/debate on netdev on the points that we are raising here. I tried one time with vlan issues and it met with lot of resistance. May be another person will have better success than me :-) ___ tcpdump-workers mailing list tcpdump-workers@lists.tcpdump.org https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers