On Jan 29, 2013, at 12:54 PM, Wenfei Wu <wenfe...@cs.wisc.edu> wrote:
> When using tcpdump capture trace, we can add filter expressions ( in a
> form of primitive [and/or primitive] ).
> I want to know how the packets are parsed and matched to this filter
> expression. Is there some intermediate data structure for the filter
> expression?

Yes. libpcap/WinPcap compiles filter expressions into machine code for an accumulator-based pseudo-machine. Interpreters (simulators) for that machine exist in libpcap/WinPcap itself, in kernel-mode code in several UN*Xes (*BSD, OS X, AIX, Tru64 UNIX, sufficiently recent Linux kernels), and in the WinPcap kernel driver.

Because those interpreters run in the kernel, the capture mechanism libpcap/WinPcap uses can discard "uninteresting" packets before copying them into a kernel-mode buffer or into the user address space.

> Is the filter used as it is parsed on each layer of the headers
> or used once after the packet is parsed completely?

Neither, exactly. The filter is compiled into a single program in BPF pseudo-machine code, and that one program performs the checks at all layers.
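To make the "accumulator-based pseudo-machine" idea concrete, here is a minimal Python sketch of an interpreter for a handful of BPF-style instructions. The tuple encoding and the name run_filter are hypothetical, chosen for readability: real BPF encodes each instruction as a fixed-size struct (opcode, jt, jf, k) with relative jump offsets, and real interpreters support many more opcodes and do their own bounds checking.

```python
import struct

def run_filter(prog, pkt: bytes) -> int:
    """Interpret a tiny subset of a BPF-style accumulator machine.

    prog is a list of (op, *args) tuples (a hypothetical encoding); jump
    targets are absolute instruction numbers, as in tcpdump -d output.
    Returns the number of bytes to keep, where 0 means "ignore this packet".
    """
    acc = 0  # the single accumulator register
    pc = 0   # program counter
    while True:
        op, *args = prog[pc]
        try:
            if op == "ldh":        # acc <- big-endian 16-bit value at offset k
                acc = struct.unpack_from("!H", pkt, args[0])[0]
            elif op == "ldb":      # acc <- byte at offset k
                acc = pkt[args[0]]
            elif op == "jeq":      # compare acc with k, branch to jt or jf
                k, jt, jf = args
                pc = jt if acc == k else jf
                continue
            elif op == "jset":     # any bits of k set in acc? branch to jt or jf
                k, jt, jf = args
                pc = jt if acc & k else jf
                continue
            elif op == "ret":      # stop: bytes to keep (0 = reject)
                return args[0]
            else:
                raise ValueError(f"unsupported op: {op!r}")
        except (IndexError, struct.error):
            return 0               # load past the end of the packet: reject it
        pc += 1

# Shaped like tcpdump -d output for the filter "ip" on Ethernet; the
# accept value is the snapshot length, which differs between versions.
IP_ONLY = [
    ("ldh", 12),              # (000) load Ethernet type
    ("jeq", 0x0800, 2, 3),    # (001) IPv4? go to 2, else go to 3
    ("ret", 65535),           # (002) accept
    ("ret", 0),               # (003) reject
]

ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
print(run_filter(IP_ONLY, ipv4_frame))  # 65535
print(run_filter(IP_ONLY, arp_frame))   # 0
```

Note that nothing here "parses" the packet layer by layer; the program just loads values at fixed offsets and branches, which is why the whole filter can be a single program.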
For example, a filter such as "tcp port 80" compiles, for Ethernet packets, into a program such as this one (comments added by me):

    (000) ldh [12]                # load Ethernet type: 2-byte "h"alfword at offset 12
    (001) jeq #0x86dd jt 2 jf 8   # if equal to 0x86dd for IPv6, go to 2, else go to 8
    (002) ldb [20]                # load IPv6 "next header" value: 1 "b"yte at offset 20
    (003) jeq #0x6 jt 4 jf 19     # if equal to 6 for TCP, go to 4, else go to 19
    (004) ldh [54]                # load TCP source port: 2-byte halfword at offset 54
    (005) jeq #0x50 jt 18 jf 6    # if equal to 0x50 = 80, go to 18, else go to 6
    (006) ldh [56]                # load TCP destination port: 2-byte halfword at offset 56
    (007) jeq #0x50 jt 18 jf 19   # if equal to 0x50 = 80, go to 18, else go to 19
                                  # we got here from (001), so the accumulator has the Ethernet type
    (008) jeq #0x800 jt 9 jf 19   # if equal to 0x0800 for IPv4, go to 9, else go to 19
    (009) ldb [23]                # load IPv4 protocol value: 1 byte at offset 23
    (010) jeq #0x6 jt 11 jf 19    # if equal to 6 for TCP, go to 11, else go to 19
    (011) ldh [20]                # load fragment offset and flags from IPv4 header (2 bytes at 20)
    (012) jset #0x1fff jt 19 jf 13  # if fragment offset is non-zero, go to 19, else go to 13
    (013) ldxb 4*([14]&0xf)       # get offset of TCP header, based on IPv4 header length
    (014) ldh [x + 14]            # load TCP source port
    (015) jeq #0x50 jt 18 jf 16   # if equal to 0x50 = 80, go to 18, else go to 16
    (016) ldh [x + 16]            # load TCP destination port
    (017) jeq #0x50 jt 18 jf 19   # if equal to 0x50 = 80, go to 18, else go to 19
    (018) ret #65535              # success: return 65535, so we get up to 65535 bytes of packet
    (019) ret #0                  # failure: return 0, meaning "ignore this packet"

This is the OS X 10.8 tcpdump and libpcap. Newer versions of libpcap generate IPv6 code that also checks for fragments other than the first fragment, just as is done for IPv4: only the first fragment contains the TCP header, so the TCP ports can't be checked in the later fragments.
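The listing above can also be read as ordinary straight-line code. Below is a minimal Python sketch that performs the same checks, in the same order, at the same fixed Ethernet offsets. It is a hand translation for illustration only, not something libpcap generates; the names tcp_port_80 and ipv4_tcp_frame are hypothetical, and the 65535 accept value is taken from instruction (018).

```python
import struct

ETHERTYPE_IP = 0x0800
ETHERTYPE_IPV6 = 0x86DD
IPPROTO_TCP = 6
PORT = 80
SNAPLEN = 65535  # value returned by (018) in the listing

def tcp_port_80(pkt: bytes) -> int:
    """Hand translation of the "tcp port 80" listing (Ethernet framing assumed).

    Returns SNAPLEN to accept and 0 to reject. A load past the end of the
    packet rejects it, as a failed load does in a BPF interpreter.
    """
    def u8(off):
        return pkt[off]

    def u16(off):
        return struct.unpack_from("!H", pkt, off)[0]

    try:
        ethertype = u16(12)                        # (000)
        if ethertype == ETHERTYPE_IPV6:            # (001)
            if u8(20) != IPPROTO_TCP:              # (002)-(003): next header
                return 0
            # (004)-(007): fixed 40-byte IPv6 header, so ports are at 54 and 56;
            # "or" short-circuits, so 56 is only loaded if the source port misses
            return SNAPLEN if u16(54) == PORT or u16(56) == PORT else 0
        if ethertype != ETHERTYPE_IP:              # (008)
            return 0
        if u8(23) != IPPROTO_TCP:                  # (009)-(010): protocol field
            return 0
        if u16(20) & 0x1FFF:                       # (011)-(012): not the first fragment
            return 0
        x = 4 * (u8(14) & 0x0F)                    # (013): IPv4 header length in bytes
        if u16(x + 14) == PORT or u16(x + 16) == PORT:  # (014)-(017)
            return SNAPLEN                         # (018)
        return 0                                   # (019)
    except (IndexError, struct.error):
        return 0

def ipv4_tcp_frame(sport: int, dport: int, frag_off: int = 0) -> bytes:
    """Build a minimal Ethernet + IPv4 (no options) + TCP frame for testing."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, frag_off, 64,
                     IPPROTO_TCP, 0, bytes(4), bytes(4))
    tcp = struct.pack("!HH", sport, dport) + bytes(16)
    return eth + ip + tcp

print(tcp_port_80(ipv4_tcp_frame(12345, 80)))                # 65535
print(tcp_port_80(ipv4_tcp_frame(12345, 443)))               # 0
print(tcp_port_80(ipv4_tcp_frame(12345, 80, frag_off=185)))  # 0: later fragment
```

The IPv6 branch can use constant offsets only because this older code assumes no extension headers sit between the IPv6 header and TCP; the IPv4 branch has to compute the TCP offset in the X register from the header-length field, which is what (013) does.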
> Is there some material about this?

Here's the paper on the Berkeley Packet Filter (BPF) mechanism, as used in *BSD and OS X (and, perhaps with some changes, in AIX and, I think, Solaris 11), which includes the machine-code interpreter:

    http://www.tcpdump.org/papers/bpf-usenix93.pdf

Much of what the paper says about the capture mechanism applies only to *BSD and OS X, and some of it might also apply to AIX and/or Solaris 11. The BPF filter language, however, applies to all of them, as well as to Tru64 UNIX, Linux (in kernel versions that have the "socket filter" mechanism), and WinPcap.

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers