[tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)
Hi everyone,

I'm happy to join the mailing list.

There is a long-standing issue with libpcap and VLAN filtering, described in this ticket:

https://github.com/the-tcpdump-group/libpcap/issues/158

In short, filters that contain ORs and one or more "vlan" keywords behave unexpectedly.

This is explained very well in the comment at gencode.c:7857:

/*
 * Check for a VLAN packet, and then change the offsets to point
 * to the type and data fields within the VLAN packet. Just
 * increment the offsets, so that we can support a hierarchy, e.g.
 * "vlan 300 && vlan 200" to capture VLAN 200 encapsulated within
 * VLAN 100.
 *
 * XXX - this is a bit of a kludge. If we were to split the
 * compiler into a parser that parses an expression and
 * generates an expression tree, and a code generator that
 * takes an expression tree (which could come from our
 * parser or from some other parser) and generates BPF code,
 * we could perhaps make the offsets parameters of routines
 * and, in the handler for an "AND" node, pass to subnodes
 * other than the VLAN node the adjusted offsets.
 *
 * This would mean that "vlan" would, instead of changing the
 * behavior of *all* tests after it, change only the behavior
 * of tests ANDed with it. That would change the documented
 * semantics of "vlan", which might break some expressions.
 * However, it would mean that "(vlan and ip) or ip" would check
 * both for VLAN-encapsulated IP and IP-over-Ethernet, rather than
 * checking only for VLAN-encapsulated IP, so that could still
 * be considered worth doing; it wouldn't break expressions
 * that are of the form "vlan and ..." or "vlan N and ...",
 * which I suspect are the most common expressions involving
 * "vlan". "vlan or ..." doesn't necessarily do what the user
 * would really want, now, as all the "or ..." tests would
 * be done assuming a VLAN, even though the "or" could be viewed
 * as meaning "or, if this isn't a VLAN packet...".
 */

This comment, committed by @yuguy <https://github.com/yuguy> in 2005, explains the issue very well. yacc parses the BPF filter from left to right without saving state, and it doesn't produce a tree of any kind, which would have allowed an easy solution. @yuguy says that OR'ing VLANs is impossible with the current parsing methodology.

But there might be a solution: if earlier versions of GCC used yacc to parse C code, then state *can* be saved. We simply want yacc to keep track of parentheses, saving the offset when one opens, and to reset the offset to its last saved state at each 'or' it encounters. Let me explain:

tcpdump -d 'vlan and (vlan or arp) or ip'

would mean:

1. filter vlan with the current offset (0) and increment the offset ( = 4)
2. open parenthesis. push the offset onto a stack
3. filter vlan with the current offset (4) and increment the offset ( = 8)
4. or. reset the offset to its state at the last open parenthesis, taken from the offset stack ( = 4)
5. filter arp with the current offset (4)
6. close parenthesis. pop the offset's state
7. or. reset the offset to its state at the enclosing level, taken from the offset stack ( = 0)
8. filter ip with the current offset (0)

As far as I can see, this would solve the issue and would allow OR'ing VLANs.

What do you say?

Thanks in advance,
Shoham Peller

___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
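The eight steps above can be traced with a small simulation. This is only a sketch of the proposal, not libpcap code: the function name trace_offsets is made up for illustration, it understands only bare keywords such as vlan, arp and ip, and it assumes that a closing parenthesis merely discards the saved offset without changing the current one (the steps do not say which).

```python
import re

VLAN_TAG_LEN = 4  # each "vlan" keyword moves later tests 4 bytes further in


def trace_offsets(filter_expr):
    """Return (primitive, offset) pairs in the order the primitives are tested."""
    tokens = re.findall(r"[()]|[^\s()]+", filter_expr)
    offset = 0
    saved = [0]  # bottom entry is the offset in effect at the top level
    trace = []
    for tok in tokens:
        if tok == "(":
            saved.append(offset)      # step 2: push the current offset
        elif tok == ")":
            saved.pop()               # step 6 (assumption: offset left as is)
        elif tok == "or":
            offset = saved[-1]        # steps 4 and 7: reset to the saved state
        elif tok == "and":
            continue                  # "and" keeps the current offset
        else:
            trace.append((tok, offset))
            if tok == "vlan":
                offset += VLAN_TAG_LEN
    return trace


print(trace_offsets("vlan and (vlan or arp) or ip"))
# [('vlan', 0), ('vlan', 4), ('arp', 4), ('ip', 0)]
```

The trace matches the walkthrough: the inner vlan and arp are both tested at offset 4, and ip is tested at offset 0.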
Re: [tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)
I thought about it some more, and this is not a complete solution. It doesn't handle filters such as:

* (vlan 1 or vlan 2) and ip
* (vlan 1 or ether) and ip

So the solution isn't complete, but it certainly improves the current situation.

What do you say? Should we go ahead and develop this logic?
Re: [tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)
yahoo.com> writes:
>
> That does not solve the cases I wrote below. The filters I wrote are also difficult to translate to the syntax
> you suggested:
> * (vlan 1 or vlan 2) and ip
> * (vlan 1 or ether) and ip
>
> I'm hoping to be free to implement the algorithm I suggested in the near future. Once I'll get around to it,
> you're gonna have a pull request that solves half the problem, as I suggested.
>

I haven't had the time to get to it yet. I intend to, soon.

Just one question, to check that my work won't be for nothing: how do you think we should document the new vlan-filter handling? The documentation today states:

    Note that the first vlan keyword encountered in expression changes the decoding offsets for the remainder of expression on the assumption that the packet is a VLAN packet. The vlan [vlan_id] expression may be used more than once, to filter on VLAN hierarchies. Each use of that expression increments the filter offsets by 4.

After the pull request, it'll be harder to explain why "vlan or ip and udp" works but "(vlan or ip) and udp" doesn't.

How do you think it should be documented? Do you think we should explain the whole algorithm, so the user can understand the exact behavior, or is it too complicated for the average user? If not, how do you think it should be documented?

Thanks,
Shoham
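To make the documented behaviour quoted above concrete, here is a rough model of what "vlan or ip" tests under it. It is a sketch only: it compares EtherType fields in a raw Ethernet frame rather than generating BPF, and the function names and the hand-built frames are invented for the example.

```python
import struct

ETHERTYPE_OFFSET = 12    # EtherType position in an untagged Ethernet frame
ETHERTYPE_VLAN = 0x8100
ETHERTYPE_IP = 0x0800


def ethertype_at(frame, offset):
    """Read the 16-bit big-endian value found at the given byte offset."""
    return struct.unpack_from("!H", frame, offset)[0]


def vlan_or_ip_documented(frame):
    """Model 'vlan or ip' when 'vlan' shifts the offsets of every later test."""
    shift = 0
    is_vlan = ethertype_at(frame, ETHERTYPE_OFFSET + shift) == ETHERTYPE_VLAN
    shift += 4  # the first 'vlan' moves the offsets for the rest of the filter
    is_ip = ethertype_at(frame, ETHERTYPE_OFFSET + shift) == ETHERTYPE_IP
    return is_vlan or is_ip


# Hypothetical test frames: zeroed MAC addresses and a minimal IPv4 header.
macs = bytes(12)
ip_header = bytes([0x45, 0x00, 0x00, 0x14]) + bytes(16)
untagged_ip = macs + struct.pack("!H", ETHERTYPE_IP) + ip_header
tagged_ip = macs + struct.pack("!HHH", ETHERTYPE_VLAN, 100, ETHERTYPE_IP) + ip_header

print(vlan_or_ip_documented(untagged_ip))  # False
print(vlan_or_ip_documented(tagged_ip))    # True
```

The plain IP frame is rejected because the "ip" test looks 4 bytes too far into the packet, which is the surprise described in issue #158.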
Re: [tcpdump-workers] vlan handling
There are a few problems with your solution:

* It isn't backward-compatible.
* It doesn't solve the issue: "(vlanid-2 and arp or vlanid-3 and ip)" does not necessarily solve the offset problems.

If you think the vlan syntax should change, that's doable, but it has to be backward-compatible, and it is a separate matter from solving the offset issue.

Let me repeat my two questions:

1) How will this be documented?
2) Do you even want me to implement it, given that "(vlan or ip) and proto 2" would still not work with the suggested solution?

Thank you,
Shoham

/* Best Regards, Shohamp */

From: Denis Ovsienko
To: tcpdump-workers
Sent: Monday, March 31, 2014 9:00 AM
Subject: Re: [tcpdump-workers] vlan handling

31.03.2014, 02:18, "Michael Richardson" :
> {For reasons I do not understand, yahoo.com doesn't even attempt to deliver
> email from Shoham to tcpdump.org. There is simply no connections in the
> logs of the spam filter system...}
>
> From Shoham:
>
> Haven't got the time to get to it. I intend to, soon.
>
> 2 questions (that are very related to each-other) to check that my work
> won't be for nothing:
>
> 1) How do you think we should document the new vlan-filter handling?
> The documentation today states:
> Note that the first vlan keyword encountered in expression changes
> the decoding offsets for the remainder of expression on
> the assumption that the packet is a VLAN packet. The vlan [vlan_id]
> expression may be used more than once, to filter on
> VLAN hierarchies. Each use of that expression increments the filter
> offsets by 4.
>
> After the pull, It'll be harder to explain why "vlan or ip and udp"
> works but "(vlan or ip) and udp" doesn't.
>
> How do you think it should be documented?

If the present behaviour were to change anyway, that could be used for syntax clarification. For example, instead of a single "vlan" keyword that means different things in different contexts, the new syntax could be based on keywords like those below:

8021q: EtherType = 0x8100
vlanid N: EtherType = 0x8100 and VID = N in the outermost 32-bit tag
vlanid-2 M: EtherType = 0x8100 and there are at least two (Q-in-Q) 32-bit tags and the 2nd (inner) VID = M
vlanid-3 K: EtherType = 0x8100 and there are at least three (Q-in-Q-in-Q) 32-bit tags and the 3rd (most inner) VID = K

If I got the problem wrong and/or there's a cleaner solution, please illustrate with examples.

Thank you.

--
Denis Ovsienko
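One way to read the keyword definitions proposed above is as tests on the stack of 802.1Q tags at the front of a frame. The sketch below follows that reading; it is not an implementation of any libpcap syntax, the function names are hypothetical, and it treats only EtherType 0x8100 as a tag, exactly as the definitions are worded.

```python
import struct

ETHERTYPE_8021Q = 0x8100


def vlan_ids(frame):
    """Return the VIDs of consecutive 0x8100 tags, outermost first."""
    vids = []
    offset = 12  # EtherType position after the two 6-byte MAC addresses
    while len(frame) >= offset + 4:
        ethertype, tci = struct.unpack_from("!HH", frame, offset)
        if ethertype != ETHERTYPE_8021Q:
            break
        vids.append(tci & 0x0FFF)  # the VID is the low 12 bits of the TCI
        offset += 4                # each tag is 32 bits long
    return vids


def match_8021q(frame):
    """Model '8021q': the frame carries at least one tag."""
    return len(vlan_ids(frame)) >= 1


def match_vlanid(frame, depth, vid):
    """depth=1 models 'vlanid N', depth=2 'vlanid-2 M', depth=3 'vlanid-3 K'."""
    vids = vlan_ids(frame)
    return len(vids) >= depth and vids[depth - 1] == vid


# Hypothetical Q-in-Q frame: outer VID 100, inner VID 200, then IPv4.
qinq = (bytes(12)
        + struct.pack("!HHHH", 0x8100, 100, 0x8100, 200)
        + struct.pack("!H", 0x0800)
        + bytes(20))

print(vlan_ids(qinq))               # [100, 200]
print(match_vlanid(qinq, 2, 200))   # True
print(match_vlanid(qinq, 3, 200))   # False
```

Because each keyword names its own tag depth, none of these tests depends on what appeared earlier in the filter, which is the property the single "vlan" keyword lacks.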
Re: [tcpdump-workers] vlan handling
Hi,

I'm having problems implementing it. The problem is the order in which bison runs its actions.

For example, take "vlan or vlan". I've written code that makes the "or" keyword restore off_linktype, so that the second "vlan" keyword uses an off_linktype that has been reset. But bison runs the actions in this order:

1. Compute the left side of the "or"
2. Compute the right side of the "or"
3. Apply the "or" between them

In 1 and 2, off_linktype is incremented, so the right-hand "vlan" uses an off_linktype that was already incremented. Only in 3 is off_linktype restored to the value it had before the two increments in 1 and 2.

Can anyone suggest a solution to this problem?

Thanks,
Shoham
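The ordering problem described above can be reproduced without bison. The sketch below is a plain simulation, with a made-up function name and flag, of the three steps for "vlan or vlan": it records which off_linktype value each "vlan" test sees when the restore happens in the final "or" action, and when it happens between the two operands instead. Bison does support actions placed in the middle of a rule, which run before the symbols to their right are parsed; whether such an action fits the libpcap grammar without introducing conflicts is not something this sketch shows.

```python
def offsets_seen_by_vlan_or_vlan(restore_between_operands):
    """Return the off_linktype value used by each 'vlan' in 'vlan or vlan'."""
    off_linktype = 0
    seen = []

    def vlan_action():
        # Models the 'vlan' action: test at the current offset, then shift it.
        nonlocal off_linktype
        seen.append(off_linktype)
        off_linktype += 4

    saved = off_linktype
    vlan_action()                      # 1. left side of the "or"
    if restore_between_operands:
        off_linktype = saved           # restore before the right side is built
    vlan_action()                      # 2. right side of the "or"
    if not restore_between_operands:
        off_linktype = saved           # 3. "or" action: too late for step 2
    return seen


print(offsets_seen_by_vlan_or_vlan(False))  # [0, 4]
print(offsets_seen_by_vlan_or_vlan(True))   # [0, 0]
```

With the restore in the final action the second "vlan" is tested at offset 4; moving the restore between the operands lets both tests start from offset 0.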