[tcpdump-workers] Wenfei: how does tcpdump filter packets?

2013-01-29 Thread Wenfei Wu
Hi, all,
  When using tcpdump to capture a trace, we can add filter expressions (in
the form of primitive [and/or primitive]).
  I want to know how the packets are parsed and matched to this filter
expression. Is there some intermediate data structure for the filter
expression? Is the filter used as it is parsed on each layer of the headers
or used once after the packet is parsed completely?
  Is there some material about this?
  Regards,
  Wenfei Wu
___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers


Re: [tcpdump-workers] Wenfei: how does tcpdump filter packets?

2013-01-29 Thread Guy Harris

On Jan 29, 2013, at 12:54 PM, Wenfei Wu  wrote:

>  When using tcpdump to capture a trace, we can add filter expressions (in
> the form of primitive [and/or primitive]).
>  I want to know how the packets are parsed and matched to this filter
> expression. Is there some intermediate data structure for the filter
> expression?

Yes.

libpcap/WinPcap compiles filter expressions into machine code for an 
accumulator-based pseudo-machine; interpreters (simulators) for that machine 
exist in libpcap/WinPcap, in several UN*Xes in kernel-mode code (*BSD, OS X, 
AIX, Tru64 UNIX, sufficiently recent Linux kernels), and in the WinPcap kernel 
driver.  The kernel-mode version means that the capture mechanism 
libpcap/WinPcap uses can ignore "uninteresting" packets before copying them 
into a kernel-mode buffer or into the user address space.
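
To make the "intermediate data structure" concrete: as far as I understand
the BSD headers, each compiled instruction is a fixed-size record with a
16-bit opcode, two 8-bit jump offsets (taken when the test is true or false,
counted from the next instruction) and a 32-bit operand. The following is
only a rough sketch of that layout in Python, hand-assembling a
four-instruction program for the filter "ip" on Ethernet. The names insn and
IP_ONLY are made up for illustration, the opcode values are my reading of
<net/bpf.h>, and the 65535 snapshot length is an assumption.

```python
import struct

# Opcode building blocks (values as I read them in <net/bpf.h>).
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06   # instruction classes
BPF_H, BPF_ABS = 0x08, 0x20                   # halfword size, absolute offset
BPF_JEQ, BPF_K = 0x10, 0x00                   # jump-if-equal, constant operand


def insn(code, jt, jf, k):
    """Pack one instruction: 16-bit code, 8-bit jt, 8-bit jf, 32-bit k."""
    return struct.pack("=HBBI", code, jt, jf, k)


# Hypothetical hand-assembled program for the filter "ip" on Ethernet.
IP_ONLY = [
    insn(BPF_LD | BPF_H | BPF_ABS, 0, 0, 12),       # A = Ethernet type
    insn(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x0800),  # IPv4? fall through : skip 1
    insn(BPF_RET | BPF_K, 0, 0, 65535),             # accept, keep up to 65535 bytes
    insn(BPF_RET | BPF_K, 0, 0, 0),                 # reject
]

for raw in IP_ONLY:
    code, jt, jf, k = struct.unpack("=HBBI", raw)
    print(f"code=0x{code:02x} jt={jt} jf={jf} k=0x{k:08x}")
```

The whole filter is just an array of these records; the kernel or libpcap
walks it once per packet.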

> Is the filter used as it is parsed on each layer of the headers
> or used once after the packet is parsed completely?

The filter is compiled into a single program in BPF pseudo-machine code; the 
program does all the checks at all layers.  For example, a filter such as "tcp 
port 80" compiles, for Ethernet packets, into a program such as (with comments 
added by me):

(000) ldh  [12]                       # load Ethernet type - 2 byte "h"alfword at an offset of 12
(001) jeq  #0x86dd     jt 2   jf 8    # if equal to 0x86dd for IPv6, go to 2, else go to 8
(002) ldb  [20]                       # load IPv6 "next header" value - 1 "b"yte at an offset of 20
(003) jeq  #0x6        jt 4   jf 19   # if equal to 6 for TCP, go to 4, else go to 19
(004) ldh  [54]                       # load TCP source port value - 2 byte halfword at an offset of 54
(005) jeq  #0x50       jt 18  jf 6    # if equal to 0x50 = 80, go to 18, else go to 6
(006) ldh  [56]                       # load TCP dest port value - 2 byte halfword at an offset of 56
(007) jeq  #0x50       jt 18  jf 19   # if equal to 0x50 = 80, go to 18, else go to 19

                                      # we got here from (001), so the accumulator has the Ethernet type
(008) jeq  #0x800      jt 9   jf 19   # if equal to 0x0800 for IPv4, go to 9, else go to 19
(009) ldb  [23]                       # load IPv4 protocol value - 1 byte at an offset of 23
(010) jeq  #0x6        jt 11  jf 19   # if equal to 6 for TCP, go to 11, else go to 19
(011) ldh  [20]                       # load fragment offset and flags from IPv4 header (2 bytes at 20)
(012) jset #0x1fff     jt 19  jf 13   # if fragment offset is non-zero, go to 19, else go to 13
(013) ldxb 4*([14]&0xf)               # get offset of TCP header, based on IPv4 header length
(014) ldh  [x + 14]                   # load TCP source port value
(015) jeq  #0x50       jt 18  jf 16   # if equal to 0x50 = 80, go to 18, else go to 16
(016) ldh  [x + 16]                   # load TCP destination port value
(017) jeq  #0x50       jt 18  jf 19   # if equal to 0x50 = 80, go to 18, else go to 19

(018) ret  #65535                     # success - return 65535, so we get up to 65535 bytes of packet

(019) ret  #0                         # failure - return 0, meaning "ignore this packet"

This is output from the OS X 10.8 tcpdump and libpcap; newer versions of
libpcap generate IPv6 code that also checks for fragments other than the first
fragment, just as is done for IPv4.  Only the first fragment carries the TCP
header, so you can't check the TCP ports in those other fragments.
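
To see how a single program does the checks at all layers, here is a minimal
sketch of an interpreter for just the instructions used in the "tcp port 80"
listing, run against a hand-built Ethernet/IPv4/TCP packet. It is written in
Python for brevity and is not the libpcap or kernel interpreter. The tuple
encoding, the opcode strings and the names run_filter and ipv4_tcp_packet are
all made up for illustration, and jump targets are kept as the absolute
instruction numbers shown in the listing rather than as relative offsets.

```python
import struct

# (opcode, k, jt, jf); jt/jf are absolute instruction numbers, as listed.
TCP_PORT_80 = [
    ("ldh", 12, 0, 0),        # 000 Ethernet type
    ("jeq", 0x86DD, 2, 8),    # 001 IPv6?
    ("ldb", 20, 0, 0),        # 002 IPv6 next header
    ("jeq", 0x6, 4, 19),      # 003 TCP?
    ("ldh", 54, 0, 0),        # 004 TCP source port (IPv6)
    ("jeq", 0x50, 18, 6),     # 005
    ("ldh", 56, 0, 0),        # 006 TCP destination port (IPv6)
    ("jeq", 0x50, 18, 19),    # 007
    ("jeq", 0x800, 9, 19),    # 008 IPv4?
    ("ldb", 23, 0, 0),        # 009 IPv4 protocol
    ("jeq", 0x6, 11, 19),     # 010 TCP?
    ("ldh", 20, 0, 0),        # 011 IPv4 flags + fragment offset
    ("jset", 0x1FFF, 19, 13), # 012 reject non-first fragments
    ("ldxb_msh", 14, 0, 0),   # 013 X = 4 * (IPv4 header length field)
    ("ldh_x", 14, 0, 0),      # 014 TCP source port (IPv4)
    ("jeq", 0x50, 18, 16),    # 015
    ("ldh_x", 16, 0, 0),      # 016 TCP destination port (IPv4)
    ("jeq", 0x50, 18, 19),    # 017
    ("ret", 65535, 0, 0),     # 018 accept
    ("ret", 0, 0, 0),         # 019 reject
]


def run_filter(program, packet):
    """Return the number of bytes to keep; 0 means "ignore this packet"."""
    a = x = pc = 0  # accumulator, index register, program counter
    while True:
        op, k, jt, jf = program[pc]
        pc += 1
        try:
            if op == "ldh":
                a = struct.unpack_from("!H", packet, k)[0]
            elif op == "ldb":
                a = packet[k]
            elif op == "ldh_x":
                a = struct.unpack_from("!H", packet, x + k)[0]
            elif op == "ldxb_msh":
                x = 4 * (packet[k] & 0xF)
            elif op == "jeq":
                pc = jt if a == k else jf
            elif op == "jset":
                pc = jt if a & k else jf
            elif op == "ret":
                return k
            else:
                raise ValueError(f"unsupported opcode {op!r}")
        except (struct.error, IndexError):
            return 0  # a load past the end of the packet rejects it


def ipv4_tcp_packet(sport, dport, frag_off=0):
    """Build a minimal Ethernet + IPv4 + TCP packet (checksums left at 0)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, frag_off, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    tcp = struct.pack("!HH", sport, dport) + bytes(16)
    return eth + ip + tcp


print(run_filter(TCP_PORT_80, ipv4_tcp_packet(12345, 80)))   # matches
print(run_filter(TCP_PORT_80, ipv4_tcp_packet(12345, 443)))  # does not match
```

A packet with a non-zero fragment offset is rejected at (012) even if the
bytes at the port offsets happen to contain 80, which is the fragment
behaviour described above.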

> Is there some material about this?

Here's the paper on the Berkeley Packet Filter (BPF) mechanism, as used in *BSD 
and OS X (and, perhaps with some changes, in AIX and, I think, Solaris 11), 
which includes the machine-code interpreter:

http://www.tcpdump.org/papers/bpf-usenix93.pdf

A lot of that only applies to *BSD and OS X, and some might also apply to AIX 
and/or Solaris 11.  The BPF filter language, however, applies to all of them, 
as well as to Tru64 UNIX, Linux (in kernel versions that have the "socket 
filter" mechanism), and WinPcap.


Re: [tcpdump-workers] Wenfei: how does tcpdump filter packets?

2013-01-29 Thread Wenfei Wu
Thanks, this is really helpful.
On Tue, Jan 29, 2013 at 3:21 PM, Guy Harris  wrote:

> er, so you can't check the TCP ports in tho


Re: [tcpdump-workers] Wenfei: how does tcpdump filter packets?

2013-01-29 Thread Guy Harris

On Jan 29, 2013, at 2:24 PM, Wenfei Wu  wrote:

> Thanks, this is really helpful.
> On Tue, Jan 29, 2013 at 3:21 PM, Guy Harris  wrote:
> er, so you can't check the TCP ports in tho

I'm not sure whether you intended to quote that part of my response, but, if 
you did, it may be because handling fragmented IP datagrams is an issue:

If you want to filter based on TCP-level or UDP-level information, *and* you 
want to handle IP fragments, whatever software does the capturing and filtering 
will have to track fragments.  When it sees a fragment that's either not the 
first fragment, or is the first fragment but not the last, it has to check 
whether other fragments of the same datagram have been seen and, if so, 
associate the new fragment with them.  Once all fragments have been seen, it 
checks whether the packet matches the filter (if all the information being 
checked is in the first fragment, you won't need to reassemble the packet to do 
that) and, if it does, treats all the fragments as having passed the filter.
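
The bookkeeping described above might look roughly like the following. This
is only a sketch under several assumptions: fragments are keyed by a
caller-supplied tuple (for example source, destination, IP ID and protocol),
offsets and lengths are in bytes, the filter only needs the first fragment,
and there is no timeout for datagrams that never complete. FragmentTracker
and its callback are hypothetical names, not part of libpcap.

```python
from collections import defaultdict


class FragmentTracker:
    """Hold fragments until the datagram is complete, then filter once."""

    def __init__(self, matches_first_fragment):
        # Hypothetical callback: given the first fragment, return True/False.
        self.matches = matches_first_fragment
        self.pending = defaultdict(list)  # key -> fragments in arrival order

    def add(self, key, offset, length, more_fragments, packet):
        """Record one fragment; return the packets released (maybe none)."""
        frags = self.pending[key]
        frags.append((offset, length, more_fragments, packet))
        if not self._complete(frags):
            return []
        del self.pending[key]
        first = next(f[3] for f in frags if f[0] == 0)
        if self.matches(first):
            # Every fragment is treated as having passed the filter.
            return [f[3] for f in frags]
        return []

    @staticmethod
    def _complete(frags):
        """True if fragments cover the datagram from offset 0 to the end."""
        expected = 0
        saw_last = False
        for offset, length, more, _ in sorted(frags, key=lambda f: f[0]):
            if offset > expected:
                return False  # gap: some fragment has not arrived yet
            expected = max(expected, offset + length)
            if not more:
                saw_last = True
        return saw_last


tracker = FragmentTracker(lambda pkt: b"port80" in pkt)
key = ("10.0.0.1", "10.0.0.2", 7, 6)
print(tracker.add(key, 8, 8, False, b"tail"))        # held back
print(tracker.add(key, 0, 8, True, b"port80 head"))  # both released
```

Fragments are released in arrival order here; nothing is delivered for a
datagram until its last missing fragment shows up.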

That doesn't handle, for example, a case where you have a filter such as

ether src host XX:XX:XX:XX:XX:XX and tcp port 80

and some, but not all, of the fragments are from MAC address XX:XX:XX:XX:XX:XX 
- I'm not sure what the right thing to do in that case would be.

It also makes in-order delivery of link-layer packets complicated, as some 
packets have to wait - if there are any unfinished fragmented packets, *all* 
packets would have to be queued up behind them and released when there are no 
remaining fragments with time stamps before those packets.
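
The queueing described above could be sketched as a hold-back queue: packets
that have passed the filter wait until no still-undecided fragment has an
earlier time stamp. This is an illustration only; HoldBackQueue and its
method names are hypothetical, and it assumes time stamps are unique.

```python
import heapq


class HoldBackQueue:
    """Release accepted packets in time-stamp order behind open fragments."""

    def __init__(self):
        self.undecided = set()  # time stamps of fragments awaiting a verdict
        self.ready = []         # min-heap of (time stamp, packet) that passed

    def hold(self, ts):
        """Note a fragment whose filter verdict is not yet known."""
        self.undecided.add(ts)

    def accept(self, ts, packet):
        """Queue a packet that passed the filter; return what can go out."""
        heapq.heappush(self.ready, (ts, packet))
        return self._drain()

    def resolve(self, ts, packet, passed):
        """Settle a held fragment; return what can go out now."""
        self.undecided.discard(ts)
        if passed:
            heapq.heappush(self.ready, (ts, packet))
        return self._drain()

    def _drain(self):
        # Nothing stamped at or after the oldest undecided fragment may leave.
        barrier = min(self.undecided, default=float("inf"))
        out = []
        while self.ready and self.ready[0][0] < barrier:
            out.append(heapq.heappop(self.ready)[1])
        return out


q = HoldBackQueue()
q.hold(1.0)                            # fragment at t=1.0, verdict unknown
print(q.accept(2.0, b"B"))             # held behind the fragment
print(q.resolve(1.0, b"A", True))      # fragment passes: A then B released
```

The cost is latency and memory: one slow or lost fragment stalls every later
packet until it is resolved or given up on.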