On Jul 7, 2025, at 6:42 AM, Denis Ovsienko <de...@ovsienko.info> wrote:
> One thing that can complicate this is that some always-true and
> always-false components are in fact specific to the link-layer type,
> for example, "ip" generates:
> * always-true for DLT_IPV4
> * always-false for DLT_IPV6
> * a load and a comparison for DLT_RAW

Yes.

What I was thinking of was to generate a higher-level intermediate representation in the parser; that IR would be link-layer-independent, and would *not* be a form of cBPF or eBPF machine code, so, for example, it wouldn't know about particular registers, and operations would not necessarily correspond to particular cBPF or eBPF instructions.

There could probably be a bunch of optimizations done to programs in that IR.

A separate pass would, for a given link-layer type, modify the IR code to correspond to code for that link-layer type, e.g. replacing higher-level operations such as "compare the destination MAC address against this value" or "compare the link layer's protocol field against this type" with code that knows where those fields are in the packet (and, in the case of the protocol field, what values correspond to particular protocols), and do further optimizations.

The final pass would generate machine code for a particular target: cBPF for a packet that corresponds to what's on the wire; cBPF for a packet that has the outermost VLAN tag removed and put into special metadata; etc., and possibly eBPF versions of those if there are advantages to directly handing eBPF to the Linux kernel rather than handing it cBPF and letting it translate that to eBPF.

(If we can figure out how to eliminate recursive algorithms in favor of iterative ones, that might be an advantage; sadly, with all these fuzzers out there, "to iterate is human; to recurse is divine" has turned into "to iterate is human; to recurse is to request a ton of 'ZOMG this test gets a stack overflow!!!!111ONE!!!!!!'".
Generating a parse tree in the first pass risks adding a shiny new recursive algorithm to upset fuzzers, although, if it makes certain things easier and we can limit the recursion depth so that a fuzzer would have to *really* go crazy to provoke a stack overflow, that might be OK.)
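
To make the middle pass a bit more concrete, here is a rough sketch, not libpcap code, of lowering a link-layer-independent IR for one link-layer type and folding away the always-true and always-false results. Everything in it is made up for illustration: the node names (Proto, Not, And, Or, Const, LoadCmp), the functions (lower_proto, fold, lower), and the use of Python rather than C. The DLT_RAW case assumes that "a load and a comparison" means checking the IP version nibble in the first byte of the packet. The tree walk uses an explicit work list rather than recursion, so the depth of the expression does not translate into stack depth.

```python
from dataclasses import dataclass

# Hypothetical link-layer-independent IR nodes.
@dataclass(frozen=True)
class Proto:
    name: str            # "ip" or "ip6"

@dataclass(frozen=True)
class Not:
    arg: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

# Hypothetical lowered (link-layer-specific) nodes.
@dataclass(frozen=True)
class Const:
    value: bool          # always-true / always-false

@dataclass(frozen=True)
class LoadCmp:
    offset: int          # test is (packet[offset] & mask) == value
    mask: int
    value: int

def lower_proto(name, dlt):
    """Lower a protocol test for one link-layer type (given by name)."""
    version = {"ip": 4, "ip6": 6}[name]
    if dlt == "DLT_IPV4":
        return Const(version == 4)
    if dlt == "DLT_IPV6":
        return Const(version == 6)
    if dlt == "DLT_RAW":
        # Assumption: test the IP version nibble of the first byte.
        return LoadCmp(offset=0, mask=0xF0, value=version << 4)
    raise ValueError("unsupported link-layer type: " + dlt)

def fold(node):
    """Simplify a node whose children are already lowered and folded."""
    if isinstance(node, Not) and isinstance(node.arg, Const):
        return Const(not node.arg.value)
    if isinstance(node, (And, Or)):
        absorbing = isinstance(node, Or)  # True absorbs Or, False absorbs And
        for a, b in ((node.left, node.right), (node.right, node.left)):
            if isinstance(a, Const):
                return a if a.value == absorbing else b
    return node

def lower(root, dlt):
    """Post-order lowering with an explicit work list instead of recursion."""
    results = []                # lowered subtrees, in post-order
    work = [(root, False)]      # (node, children_already_lowered)
    while work:
        node, ready = work.pop()
        if isinstance(node, Proto):
            results.append(lower_proto(node.name, dlt))
        elif isinstance(node, Not):
            if ready:
                results.append(fold(Not(results.pop())))
            else:
                work += [(node, True), (node.arg, False)]
        elif isinstance(node, (And, Or)):
            if ready:
                right, left = results.pop(), results.pop()
                results.append(fold(type(node)(left, right)))
            else:
                work += [(node, True), (node.right, False), (node.left, False)]
        else:
            raise TypeError("unexpected IR node: %r" % type(node))
    return results.pop()

expr = And(Proto("ip"), Not(Proto("ip6")))   # roughly "ip and not ip6"
for dlt in ("DLT_IPV4", "DLT_IPV6", "DLT_RAW"):
    print(dlt, lower(expr, dlt))
```

With this sketch the expression collapses to a constant for DLT_IPV4 and DLT_IPV6, and only DLT_RAW keeps any loads and comparisons for a later code-generation pass to turn into cBPF or eBPF.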