On Jan 29, 2013, at 12:54 PM, Wenfei Wu wrote:
> When using tcpdump capture trace, we can add filter expressions ( in a
> form of primitive [and/or primitive] ).
> I want to know how the packets are parsed and matched to this filter
> expression. Is there some intermediate data structure for the filter
> expression?
Yes.
libpcap/WinPcap compiles filter expressions into machine code for an
accumulator-based pseudo-machine; interpreters (simulators) for that machine
exist in libpcap/WinPcap, in several UN*Xes in kernel-mode code (*BSD, OS X,
AIX, Tru64 UNIX, sufficiently recent Linux kernels), and in the WinPcap kernel
driver. The kernel-mode version means that the capture mechanism
libpcap/WinPcap uses can ignore "uninteresting" packets before copying them
into a kernel-mode buffer or into the user address space.
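
To make the "accumulator-based pseudo-machine" idea concrete, here is a minimal sketch of such an interpreter in Python. It is not libpcap's code: the tuple layout, the name run_filter, and the small ACCEPT_IPV4 program are all made up for illustration, and it handles only a few instruction types. Jump targets are written as absolute instruction indexes for readability; the real BPF encoding stores them as offsets relative to the next instruction.

```python
import struct

def run_filter(program, pkt):
    """Interpret a tiny, illustrative subset of BPF-style instructions.

    Each instruction is a hypothetical (op, k, jt, jf) tuple; jt and jf
    are absolute instruction indexes.  Returns the value of the 'ret'
    instruction reached: 0 means "ignore this packet".
    """
    a = 0    # the accumulator
    pc = 0   # index of the next instruction to execute
    while True:
        op, k, jt, jf = program[pc]
        pc += 1
        if op == "ldh":            # load 2-byte halfword at offset k
            if k + 2 > len(pkt):
                return 0           # load past the end rejects the packet
            a = struct.unpack_from("!H", pkt, k)[0]
        elif op == "ldb":          # load 1 byte at offset k
            if k + 1 > len(pkt):
                return 0
            a = pkt[k]
        elif op == "jeq":          # branch on accumulator == k
            pc = jt if a == k else jf
        elif op == "jset":         # branch on (accumulator & k) != 0
            pc = jt if a & k else jf
        elif op == "ret":          # finish, returning k
            return k
        else:
            raise ValueError("unsupported instruction: %r" % (op,))

# Hypothetical four-instruction program: accept Ethernet frames whose
# type field is 0x0800 (IPv4), reject everything else.
ACCEPT_IPV4 = [
    ("ldh", 12, 0, 0),        # (000) load the Ethernet type
    ("jeq", 0x0800, 2, 3),    # (001) IPv4? go to 2, else go to 3
    ("ret", 65535, 0, 0),     # (002) accept
    ("ret", 0, 0, 0),         # (003) reject
]
```

Given a frame whose bytes 12-13 are 0x08 0x00, run_filter(ACCEPT_IPV4, frame) returns 65535; for any other Ethernet type, or for a packet too short to contain the type field, it returns 0.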
> Is the filter used as it is parsed on each layer of the headers
> or used once after the packet is parsed completely?
The filter is compiled into a single program in BPF pseudo-machine code; the
program does all the checks at all layers. For example, a filter such as "tcp
port 80" compiles, for Ethernet packets, into a program such as (with comments
added by me):
(000) ldh   [12]                   # load Ethernet type - 2-byte "h"alfword at an offset of 12
(001) jeq   #0x86dd  jt 2   jf 8   # if equal to 0x86dd for IPv6, go to 2, else go to 8
(002) ldb   [20]                   # load IPv6 "next header" value - 1 "b"yte at an offset of 20
(003) jeq   #0x6     jt 4   jf 19  # if equal to 6 for TCP, go to 4, else go to 19
(004) ldh   [54]                   # load TCP source port value - 2-byte halfword at an offset of 54
(005) jeq   #0x50    jt 18  jf 6   # if equal to 0x50 = 80, go to 18, else go to 6
(006) ldh   [56]                   # load TCP dest port value - 2-byte halfword at an offset of 56
(007) jeq   #0x50    jt 18  jf 19  # if equal to 0x50 = 80, go to 18, else go to 19
                                   # we got here from (001), so the accumulator has the Ethernet type
(008) jeq   #0x800   jt 9   jf 19  # if equal to 0x0800 for IPv4, go to 9, else go to 19
(009) ldb   [23]                   # load IPv4 protocol value - 1 byte at an offset of 23
(010) jeq   #0x6     jt 11  jf 19  # if equal to 6 for TCP, go to 11, else go to 19
(011) ldh   [20]                   # load fragment offset and flags from IPv4 header (2 bytes at 20)
(012) jset  #0x1fff  jt 19  jf 13  # if fragment offset is non-zero, go to 19, else go to 13
(013) ldxb  4*([14]&0xf)           # get offset of TCP header, based on IPv4 header length
(014) ldh   [x + 14]               # load TCP source port value
(015) jeq   #0x50    jt 18  jf 16  # if equal to 0x50 = 80, go to 18, else go to 16
(016) ldh   [x + 16]               # load TCP destination port value
(017) jeq   #0x50    jt 18  jf 19  # if equal to 0x50 = 80, go to 18, else go to 19
(018) ret   #65535                 # success - return 65535, so we get up to 65535 bytes of packet
(019) ret   #0                     # failure - return 0, meaning "ignore this packet"
This is output from the OS X 10.8 tcpdump and libpcap; newer versions of
libpcap generate IPv6 code that also checks for fragments other than the first
fragment, just as is done for IPv4. Only the first fragment has the TCP header,
so you can't check the TCP ports in the later fragments.
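
For readers who find the pseudo-machine listing hard to follow, the same checks can be sketched as an ordinary Python function. This is only a hand translation for illustration, not something libpcap produces. The names tcp_port_80 and make_ipv4_tcp_frame are made up here, and the IPv6 branch deliberately mirrors the listing above, so it does not check for fragments.

```python
import struct

def tcp_port_80(pkt):
    """Return True if an Ethernet frame passes the same checks as the
    "tcp port 80" program in the listing (hand-translated sketch)."""
    def u16(off):
        # 2-byte big-endian load, like "ldh"
        return struct.unpack_from("!H", pkt, off)[0]

    try:
        ethertype = u16(12)                       # (000)
        if ethertype == 0x86DD:                   # (001) IPv6
            if pkt[20] != 6:                      # (002)-(003) next header is TCP?
                return False
            return u16(54) == 80 or u16(56) == 80     # (004)-(007)
        if ethertype == 0x0800:                   # (008) IPv4
            if pkt[23] != 6:                      # (009)-(010) protocol is TCP?
                return False
            if u16(20) & 0x1FFF:                  # (011)-(012) not the first fragment
                return False
            x = 4 * (pkt[14] & 0x0F)              # (013) IPv4 header length in bytes
            return u16(x + 14) == 80 or u16(x + 16) == 80   # (014)-(017)
        return False
    except (struct.error, IndexError):
        return False    # a load past the end of the packet rejects it

def make_ipv4_tcp_frame(sport, dport, frag_off=0):
    """Hypothetical helper: build a minimal Ethernet/IPv4/TCP frame for testing.
    Addresses, checksums and most other fields are left as zero."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = bytearray(20)
    ip[0] = 0x45                                  # version 4, header length 5 words
    ip[6:8] = struct.pack("!H", frag_off & 0x1FFF)
    ip[9] = 6                                     # protocol = TCP
    tcp = struct.pack("!HH", sport, dport) + bytes(16)
    return eth + bytes(ip) + tcp
```

With a header length field of 5, x is 4*5 = 20, so the source and destination ports are read from offsets 34 and 36. tcp_port_80(make_ipv4_tcp_frame(12345, 80)) is True, while the same frame with frag_off=1 is rejected before the ports are ever examined.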
> Is there some material about this?
Here's the paper on the Berkeley Packet Filter (BPF) mechanism, as used in *BSD
and OS X (and, perhaps with some changes, in AIX and, I think, Solaris 11),
which includes the machine-code interpreter:
http://www.tcpdump.org/papers/bpf-usenix93.pdf
A lot of that only applies to *BSD and OS X, and some might also apply to AIX
and/or Solaris 11. The BPF filter language, however, applies to all of them,
as well as to Tru64 UNIX, Linux (in kernel versions that have the "socket
filter" mechanism), and WinPcap.