[tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)

2013-10-12 Thread Shoham Peller
Hi everyone,

I'm happy to join the mailing list.

There is a long-standing issue with libpcap and VLAN filtering, explained in
this ticket:

https://github.com/the-tcpdump-group/libpcap/issues/158

In short, filters that contain an "or" and one or more "vlan" keywords behave
unexpectedly.

This is explained very well in the comment at gencode.c:7857:

/*
 * Check for a VLAN packet, and then change the offsets to point
 * to the type and data fields within the VLAN packet.  Just
 * increment the offsets, so that we can support a hierarchy, e.g.
 * "vlan 300 && vlan 200" to capture VLAN 200 encapsulated within
 * VLAN 100.
 *
 * XXX - this is a bit of a kludge.  If we were to split the
 * compiler into a parser that parses an expression and
 * generates an expression tree, and a code generator that
 * takes an expression tree (which could come from our
 * parser or from some other parser) and generates BPF code,
 * we could perhaps make the offsets parameters of routines
 * and, in the handler for an "AND" node, pass to subnodes
 * other than the VLAN node the adjusted offsets.
 *
 * This would mean that "vlan" would, instead of changing the
 * behavior of *all* tests after it, change only the behavior
 * of tests ANDed with it.  That would change the documented
 * semantics of "vlan", which might break some expressions.
 * However, it would mean that "(vlan and ip) or ip" would check
 * both for VLAN-encapsulated IP and IP-over-Ethernet, rather than
 * checking only for VLAN-encapsulated IP, so that could still
 * be considered worth doing; it wouldn't break expressions
 * that are of the form "vlan and ..." or "vlan N and ...",
 * which I suspect are the most common expressions involving
 * "vlan".  "vlan or ..." doesn't necessarily do what the user
 * would really want, now, as all the "or ..." tests would
 * be done assuming a VLAN, even though the "or" could be viewed
 * as meaning "or, if this isn't a VLAN packet...".
 */

This comment, committed by @yuguy (https://github.com/yuguy) in 2005,
explains the issue very well. yacc parses the filter expression from left to
right without saving state, and doesn't produce a tree of any kind, which
would have allowed an easy solution. @yuguy says that OR'ing vlans is
impossible with the current parsing methodology.

But there might be a solution: if GCC used yacc in previous versions to parse
C code, state *can* be saved. We simply want yacc to track parentheses, using
them to save the offset on a stack, and, at each 'or' it encounters, to reset
the offset to its last saved state. Let me explain:

tcpdump -d 'vlan and (vlan or arp) or ip'
would mean:

1. filter vlan with the current offset (0) and increment the offset ( = 4)
2. open parenthesis. push the offset (4) onto a stack
3. filter vlan with the current offset (4) and increment the offset ( = 8)
4. or. reset the offset to its state at the last open parenthesis, from the
offset stack ( = 4)
5. filter arp with the current offset (4)
6. close parenthesis. pop the offset's state
7. or. reset the offset to its state at the enclosing level, from the
offset stack ( = 0)
8. filter ip with the current offset (0)
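
The steps above can be modelled in a few lines. This is only a sketch of the
bookkeeping, written in Python for brevity rather than as yacc actions; the
names tokenize and trace_offsets are made up for illustration, and the model
assumes that a closing parenthesis only pops the stack, as in step 6.

```python
VLAN_TAG_LEN = 4  # bytes added to the offsets by one "vlan" keyword


def tokenize(expr):
    """Split a filter string into keywords and parentheses."""
    return expr.replace("(", " ( ").replace(")", " ) ").split()


def trace_offsets(tokens):
    """Return (keyword, offset) pairs: the offset each test would use."""
    offset = 0
    saved = [0]  # saved[-1] is the offset at the start of the current group
    trace = []
    for tok in tokens:
        if tok == "(":
            saved.append(offset)      # step 2: remember the offset
        elif tok == ")":
            saved.pop()               # step 6: leave the group
        elif tok == "or":
            offset = saved[-1]        # steps 4 and 7: reset the offset
        elif tok == "and":
            continue                  # "and" keeps the current offset
        else:
            trace.append((tok, offset))
            if tok == "vlan":
                offset += VLAN_TAG_LEN
    return trace


for keyword, offset in trace_offsets(tokenize("vlan and (vlan or arp) or ip")):
    print(keyword, offset)
```

In this model the outer vlan is tested at offset 0, the inner vlan and arp at
offset 4, and ip at offset 0.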

As it seems to me, this will solve the issue, and would allow OR'ing vlans.

What do you say?

Thanks in advance,

Shoham Peller

___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers


Re: [tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)

2013-10-26 Thread Shoham Peller
I've thought about it some more, and this is not a complete solution.
It doesn't handle filters such as:
* (vlan 1 or vlan 2) and ip
* (vlan 1 or ether) and ip

So the solution isn't complete, but it sure does improve the current situation.
So what do you say? Should we proceed to develop this logic?



Re: [tcpdump-workers] [libpcap] OR'ing vlans impossible in tcpdump filter (issue #158)

2014-03-20 Thread Shoham Peller
  yahoo.com> writes:

> That does not solve the cases I wrote below. The filters I wrote are also
> difficult to translate to the syntax you suggested:
> * (vlan 1 or vlan 2) and ip
> * (vlan 1 or ether) and ip
>
> I'm hoping to be free to implement the algorithm I suggested in the near
> future. Once I get around to it, you're gonna have a pull request that
> solves half the problem, as I suggested.

I haven't had time to get to it yet, but I intend to soon.

Just a question to check that my work won't be for nothing:
How do you think we should document the new vlan-filter handling?

The documentation today states:

    Note that the first vlan keyword encountered in expression changes the
    decoding offsets for the remainder of expression on the assumption that
    the packet is a VLAN packet. The vlan [vlan_id] expression may be used
    more than once, to filter on VLAN hierarchies. Each use of that
    expression increments the filter offsets by 4.
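
To make the "increments the filter offsets by 4" part concrete, here is a
small sketch that walks the 802.1Q tags of a raw Ethernet frame. It is not
libpcap code; vlan_stack is a made-up name, and the sketch assumes that every
tag uses EtherType 0x8100 and that the frame is long enough to read.

```python
import struct

ETHERTYPE_OFFSET = 12  # after the 6-byte destination and 6-byte source MAC
VLAN_TAG_LEN = 4       # 2-byte EtherType 0x8100 + 2-byte tag control field
ETHERTYPE_8021Q = 0x8100


def vlan_stack(frame):
    """Return (VLAN IDs outermost first, offset of the payload EtherType)."""
    offset = ETHERTYPE_OFFSET
    vids = []
    while True:
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype != ETHERTYPE_8021Q:
            return vids, offset
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        vids.append(tci & 0x0FFF)  # the low 12 bits are the VLAN ID
        offset += VLAN_TAG_LEN     # each tag shifts later fields by 4


# Zeroed MAC addresses, VLAN 100 outside, VLAN 200 inside, then IPv4.
double_tagged = bytes(12) + struct.pack("!HHHHH", 0x8100, 100, 0x8100, 200, 0x0800)
print(vlan_stack(double_tagged))
```

For an untagged frame the EtherType is read at offset 12; with one tag it
moves to 16, and with two tags to 20.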

After the pull request, it'll be harder to explain why "vlan or ip and udp"
works but "(vlan or ip) and udp" doesn't.

How do you think it should be documented?
Do you think we should explain the whole algorithm, so that the user can
understand the exact behavior, or is that too complicated for the average user?
If not, how do you think it should be documented instead?

Thanks,
Shoham



Re: [tcpdump-workers] vlan handling

2014-03-31 Thread Shoham peller
There are a few problems with your solution:
* It isn't backward-compatible
* It doesn't solve the issue. (vlanid-2 and arp or vlanid-3 and ip) does not
necessarily solve the offset problems.

If you think the vlan syntax should change, that's doable, but it has to stay
backward-compatible, and it's a separate matter from solving the offset issue.

Let me repeat my 2 questions:
1) How will this be documented?
2) Do you even want me to implement it, given that "(vlan or ip) and proto 2"
would still not work with the suggested solution?

Thank you,
    Shoham




 From: Denis Ovsienko 
To: tcpdump-workers  
Sent: Monday, March 31, 2014 9:00 AM
Subject: Re: [tcpdump-workers] vlan handling
 

31.03.2014, 02:18, "Michael Richardson" :

> {For reasons I do not understand, yahoo.com doesn't even attempt to deliver
> email from Shoham to tcpdump.org. There is simply no connections in the
> logs of the spam filter system...}
>
> From Shoham:
>
>    Haven't got the time to get to it. I intend to, soon.
>
>    2 questions (that are very related to each-other) to check that my work
>    won't be for nothing:
>
>    1) How do you think we should document the new vlan-filter handling?
>    The documentation today states:
>    Note that the first vlan keyword encountered in expression changes
>    the decoding offsets for the remainder of expression on
>    the assumption that the packet is a VLAN packet. The vlan [vlan_id]
>    expression may be used more than once, to filter on
>    VLAN hierarchies. Each use of that expression increments the filter
>    offsets by 4.
>
>    After the pull, It'll be harder to explain why "vlan or ip and udp"
>    works but "(vlan or ip) and udp" doesn't.
>
>    How do you think it should be documented?

If the present behaviour were to change anyway, that could be used as an
opportunity to clarify the syntax. For example, instead of a single "vlan"
keyword that means different things in different contexts, the new syntax
could be based on keywords like the ones below:

8021q: EtherType = 0x8100
vlanid N: EtherType = 0x8100 and VID = N in the outermost 32-bit tag
vlanid-2 M: EtherType = 0x8100 and there are at least two (Q-in-Q) 32-bit
    tags and the 2nd (inner) VID = M
vlanid-3 K: EtherType = 0x8100 and there are at least three (Q-in-Q-in-Q)
    32-bit tags and the 3rd (innermost) VID = K
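
Read as predicates on a raw frame, the keywords above could behave roughly
like the sketch below. The function names are invented for illustration and
every tag is assumed to use EtherType 0x8100, as in the list. It only models
the keywords themselves, not where a following "arp" or "ip" test would look.

```python
import struct


def vlan_ids(frame):
    """VLAN IDs of the consecutive 0x8100 tags in a frame, outermost first."""
    ids, offset = [], 12  # the first EtherType follows the two MAC addresses
    while (len(frame) >= offset + 4
           and struct.unpack_from("!H", frame, offset)[0] == 0x8100):
        ids.append(struct.unpack_from("!H", frame, offset + 2)[0] & 0x0FFF)
        offset += 4
    return ids


def match_8021q(frame):
    """Model of '8021q': the frame carries at least one tag."""
    return len(vlan_ids(frame)) >= 1


def match_vlanid(frame, vid, depth=1):
    """depth=1 models 'vlanid N', depth=2 'vlanid-2 M', depth=3 'vlanid-3 K'."""
    ids = vlan_ids(frame)
    return len(ids) >= depth and ids[depth - 1] == vid


# Zeroed MAC addresses, VLAN 100 outside, VLAN 200 inside, then ARP.
q_in_q = bytes(12) + struct.pack("!HHHHH", 0x8100, 100, 0x8100, 200, 0x0806)
print(match_vlanid(q_in_q, 100), match_vlanid(q_in_q, 200, depth=2))
```

Each predicate names the tag it inspects, so its result does not depend on
which keywords appeared earlier in the expression.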

If I got the problem wrong and/or there's a cleaner solution, please illustrate
with examples.

Thank you.

-- 
    Denis Ovsienko


Re: [tcpdump-workers] vlan handling

2014-04-04 Thread Shoham peller
Hi,

I'm having problems implementing it. The problem is the order in which
bison runs its actions.

For example:
"vlan or vlan"

I've written code that makes the "or" keyword restore off_linktype, so
that the second "vlan" keyword uses an off_linktype that has been reset.

But the order in which bison runs the actions is:
1. Compute the left side of the "or"
2. Compute the right side of the "or"
3. "or" between them

In 1 and 2, off_linktype is incremented, which makes the right-side
"vlan" use an off_linktype that was already incremented.
It's only in 3 that off_linktype is restored to the value it had
before the two increments in 1 and 2.
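
The ordering can be reproduced with a toy model of "vlan or vlan". This is
not bison or libpcap code; the class and function names are hypothetical, and
only off_linktype is taken from the description above. The first function
restores the offset in step 3, as described; the second shows the ordering
that was intended, with the restore running between the two operands.

```python
VLAN_TAG_LEN = 4


class State:
    """Stand-in for the code generator's global offset state."""

    def __init__(self):
        self.off_linktype = 0
        self.log = []  # (keyword, offset used) for each test generated


def reduce_vlan(state):
    """Action for 'vlan': test at the current offset, then increment it."""
    state.log.append(("vlan", state.off_linktype))
    state.off_linktype += VLAN_TAG_LEN


def or_restored_at_reduction(state):
    saved = state.off_linktype
    reduce_vlan(state)          # 1. left side of the "or"
    reduce_vlan(state)          # 2. right side, already sees the increment
    state.off_linktype = saved  # 3. the "or" action restores, but too late
    return state.log


def or_restored_between_operands(state):
    saved = state.off_linktype
    reduce_vlan(state)          # left side of the "or"
    state.off_linktype = saved  # restore before the right side is computed
    reduce_vlan(state)          # right side starts from the saved offset
    return state.log


print(or_restored_at_reduction(State()))
print(or_restored_between_operands(State()))
```

Getting the second ordering out of bison would need the restore to run after
the left operand and before the right one, for instance from an action placed
in the middle of the rule. Whether that is workable in libpcap's grammar is
not something this sketch establishes.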


Can anyone suggest a solution to this problem?

Thanks,
    Shoham