On Jul 7, 2025, at 6:42 AM, Denis Ovsienko <de...@ovsienko.info> wrote:

> One thing that can complicate this is that some always-true and
> always-false components are in fact specific to the link-layer type,
> for example, "ip" generates:
> * always-true for DLT_IPV4
> * always-false for DLT_IPV6
> * a load and a comparison for DLT_RAW

Yes.

What I was thinking of was to generate a higher-level intermediate 
representation in the parser; that IR would be link-layer-independent, and 
would *not* be a form of cBPF or eBPF machine code, so, for example, it 
wouldn't know about particular registers, and operations would not necessarily 
correspond to particular cBPF or eBPF instructions.  There could probably be a 
bunch of optimizations done to programs in that IR.
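
Here is a minimal sketch, in C, of what such a link-layer-independent IR might look like. Every name in it (ir_op, ir_node, IR_LINK_PROTO_EQ, ir_new(), fold()) is hypothetical and not part of libpcap. The idea it shows is that nodes describe *what* to test rather than *how* to load it. Simple optimizations such as boolean constant folding can already run at this level.

```c
#include <stddef.h>

/* Hypothetical link-layer-independent IR: no registers, no cBPF/eBPF opcodes. */
typedef enum {
    IR_TRUE, IR_FALSE,
    IR_LINK_PROTO_EQ,   /* "the link layer's protocol field says <proto>" */
    IR_DST_MAC_EQ,      /* "the destination MAC address is <mac>" */
    IR_AND, IR_OR, IR_NOT
} ir_op;

typedef enum { PROTO_IPV4, PROTO_IPV6, PROTO_ARP } ir_proto;

typedef struct ir_node {
    ir_op op;
    ir_proto proto;               /* IR_LINK_PROTO_EQ only */
    unsigned char mac[6];         /* IR_DST_MAC_EQ only */
    struct ir_node *left, *right; /* AND/OR operands; NOT uses left */
} ir_node;

/* Tiny bump allocator so the sketch needs no free() bookkeeping. */
static ir_node pool[256];
static size_t pool_used;

ir_node *ir_new(ir_op op, ir_node *left, ir_node *right) {
    ir_node *n = &pool[pool_used++];   /* no overflow check: sketch only */
    *n = (ir_node){ .op = op, .left = left, .right = right };
    return n;
}

/* Constant-fold boolean structure; safe because filter tests have no side effects. */
ir_node *fold(ir_node *n) {
    switch (n->op) {
    case IR_NOT:
        n->left = fold(n->left);
        if (n->left->op == IR_TRUE)  return ir_new(IR_FALSE, NULL, NULL);
        if (n->left->op == IR_FALSE) return ir_new(IR_TRUE, NULL, NULL);
        if (n->left->op == IR_NOT)   return n->left->left;   /* !!x -> x */
        return n;
    case IR_AND:
    case IR_OR: {
        ir_node *l = fold(n->left), *r = fold(n->right);
        /* x AND false = false, x OR true = true */
        ir_op absorbing = (n->op == IR_AND) ? IR_FALSE : IR_TRUE;
        /* x AND true = x, x OR false = x */
        ir_op identity  = (n->op == IR_AND) ? IR_TRUE : IR_FALSE;
        if (l->op == absorbing || r->op == absorbing) return ir_new(absorbing, NULL, NULL);
        if (l->op == identity) return r;
        if (r->op == identity) return l;
        n->left = l;
        n->right = r;
        return n;
    }
    default:
        return n;   /* leaf tests are left for the link-layer-specific pass */
    }
}
```

For example, "ip and not false" folds to just the IR_LINK_PROTO_EQ node, and "ip or true" folds to IR_TRUE. Both results hold without knowing anything about the link-layer type.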

A separate pass would, for a given link-layer type, modify the IR code to 
correspond to code for that link-layer type, e.g. replacing higher-level 
operations such as "compare the destination MAC address against this value" or 
"compare the link layer's protocol field against this type" with code that 
knows where those fields are in the packet (and, in the case of the protocol 
field, what values correspond to particular protocols), and would then do 
further optimizations.
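
Below is a sketch of that per-link-layer pass for the "ip" example quoted above. It assumes the IR test "link-layer protocol is IPv4" gets lowered to either a constant or a single masked load-and-compare. The linktype enum is a hypothetical stand-in for DLT_EN10MB, DLT_RAW, DLT_IPV4 and DLT_IPV6. lower_link_proto() and lowered_match() are also hypothetical helpers. For brevity, the Ethernet case ignores 802.3 length fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DLT_EN10MB, DLT_RAW, DLT_IPV4, DLT_IPV6. */
typedef enum { LT_EN10MB, LT_RAW, LT_IPV4, LT_IPV6 } linktype;
typedef enum { PROTO_IPV4, PROTO_IPV6, PROTO_ARP } proto;

/* Lowered form of "link-layer protocol is <proto>": a constant or one masked load+compare. */
typedef enum { LOW_TRUE, LOW_FALSE, LOW_LOAD_CMP } low_kind;
typedef struct {
    low_kind kind;
    unsigned offset, size;   /* byte offset and width (1 or 2 bytes) of the field */
    uint32_t mask, value;    /* (field & mask) == value */
} lowered;

static const lowered ALWAYS = { LOW_TRUE, 0, 0, 0, 0 };
static const lowered NEVER  = { LOW_FALSE, 0, 0, 0, 0 };

static lowered load_cmp(unsigned off, unsigned size, uint32_t mask, uint32_t value) {
    lowered l = { LOW_LOAD_CMP, off, size, mask, value };
    return l;
}

lowered lower_link_proto(proto p, linktype lt) {
    switch (lt) {
    case LT_IPV4:   /* every packet is IPv4: always-true or always-false */
        return p == PROTO_IPV4 ? ALWAYS : NEVER;
    case LT_IPV6:   /* every packet is IPv6 */
        return p == PROTO_IPV6 ? ALWAYS : NEVER;
    case LT_RAW:    /* raw IP: check the version nibble of the first byte */
        if (p == PROTO_IPV4) return load_cmp(0, 1, 0xf0, 0x40);
        if (p == PROTO_IPV6) return load_cmp(0, 1, 0xf0, 0x60);
        return NEVER;                        /* no ARP on raw IP */
    case LT_EN10MB: /* Ethernet II: EtherType at offset 12 */
        switch (p) {
        case PROTO_IPV4: return load_cmp(12, 2, 0xffff, 0x0800);
        case PROTO_IPV6: return load_cmp(12, 2, 0xffff, 0x86dd);
        case PROTO_ARP:  return load_cmp(12, 2, 0xffff, 0x0806);
        }
    }
    return NEVER;
}

/* Reference evaluator, handy for testing the lowering against packet bytes. */
int lowered_match(lowered l, const uint8_t *pkt, size_t len) {
    uint32_t v;
    if (l.kind != LOW_LOAD_CMP) return l.kind == LOW_TRUE;
    if (l.offset + l.size > len) return 0;   /* short packet: reject, as cBPF does */
    v = (l.size == 1) ? pkt[l.offset]
                      : (uint32_t)((pkt[l.offset] << 8) | pkt[l.offset + 1]);
    return (v & l.mask) == l.value;
}
```

Ordinary boolean simplification can then fold away the always-true and always-false results for DLT_IPV4 and DLT_IPV6. That is one source of the further optimizations mentioned above.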

The final pass would generate machine code for a particular target:

        cBPF for a packet that corresponds to what's on the wire;

        cBPF for a packet that has the outermost VLAN tag removed and put into
        special metadata;

        etc.

and possibly eBPF versions of those, if there are advantages to handing eBPF 
directly to the Linux kernel rather than handing it cBPF and letting it 
translate that to eBPF.
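
The following sketch shows why the last pass is target-specific. It emits cBPF for "outer VLAN tag present and inner EtherType is IPv4" on Ethernet, for the two targets listed above. The opcode values are the classic BPF encoding from <pcap/bpf.h>, and the ancillary offsets are the ones in Linux's <linux/filter.h>; both are redefined locally here. emit_vlan_ipv4() and the target enum are hypothetical. The sketch checks only the 0x8100 TPID, not 0x88a8 or other TPIDs.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic BPF encoding, as in <pcap/bpf.h>; redefined so the sketch is self-contained. */
#define BPF_LD   0x00
#define BPF_JMP  0x05
#define BPF_RET  0x06
#define BPF_H    0x08
#define BPF_B    0x10
#define BPF_ABS  0x20
#define BPF_JEQ  0x10
#define BPF_K    0x00

/* Linux ancillary-data offsets, as in <linux/filter.h>. */
#define SKF_AD_OFF              (-0x1000)
#define SKF_AD_VLAN_TAG_PRESENT 48

struct cbpf_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Hypothetical code-generation targets. */
typedef enum {
    TARGET_WIRE,          /* packet as on the wire, VLAN tag still in place */
    TARGET_VLAN_STRIPPED  /* outer VLAN tag removed and reported as metadata */
} target;

/* Emit "outer VLAN tag present and inner EtherType is IPv4" for Ethernet.
 * Returns the number of instructions written to out (needs room for 6). */
size_t emit_vlan_ipv4(target t, struct cbpf_insn *out) {
    size_t n = 0;
    uint32_t ethertype_off;

    if (t == TARGET_WIRE) {
        /* The tag's TPID sits where the EtherType would normally be. */
        out[n++] = (struct cbpf_insn){ BPF_LD | BPF_H | BPF_ABS, 0, 0, 12 };
        out[n++] = (struct cbpf_insn){ BPF_JMP | BPF_JEQ | BPF_K, 0, 3, 0x8100 };
        ethertype_off = 16;   /* inner EtherType follows the 4-byte tag */
    } else {
        /* The tag is gone from the packet; ask the kernel whether there was one. */
        out[n++] = (struct cbpf_insn){ BPF_LD | BPF_B | BPF_ABS, 0, 0,
                                       (uint32_t)(SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT) };
        out[n++] = (struct cbpf_insn){ BPF_JMP | BPF_JEQ | BPF_K, 0, 3, 1 };
        ethertype_off = 12;   /* inner EtherType is back at the usual offset */
    }
    out[n++] = (struct cbpf_insn){ BPF_LD | BPF_H | BPF_ABS, 0, 0, ethertype_off };
    out[n++] = (struct cbpf_insn){ BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x0800 };
    out[n++] = (struct cbpf_insn){ BPF_RET | BPF_K, 0, 0, 262144 };  /* accept */
    out[n++] = (struct cbpf_insn){ BPF_RET | BPF_K, 0, 0, 0 };       /* reject */
    return n;
}
```

The two targets differ only in the first test and in the offset of the inner EtherType. Everything before this pass could be shared.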

(If we can figure out how to eliminate recursive algorithms in favor of 
iterative ones, that might be an advantage; sadly, with all these fuzzers out 
there, "to iterate is human; to recurse is divine" has turned into "to iterate 
is human; to recurse is to request a ton of 'ZOMG this test gets a stack 
overflow!!!!111ONE!!!!!!'".

Generating a parse tree in the first pass risks adding a shiny new recursive 
algorithm to upset fuzzers. However, if it makes certain things easier, and if 
we can limit the recursion depth so that a fuzzer would have to *really* go 
crazy to provoke a stack overflow, that might be OK.)
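
Here is a minimal sketch of the "limit the recursion depth" option. It is a recursive-descent checker for a toy filter grammar: x is a primitive, ! is not, & and | are and/or, and parentheses group. It counts nesting and reports an error instead of recursing past a fixed cap. The grammar, MAX_DEPTH and check_filter() are all hypothetical. Chains of & and | are handled by loops, so only real nesting consumes stack, and stack use stays bounded by a small multiple of MAX_DEPTH.

```c
#include <stddef.h>

#define MAX_DEPTH 200   /* hypothetical cap on nested '(' and '!' */

typedef struct {
    const char *s;   /* input expression */
    size_t pos;      /* current position in s */
    int depth;       /* current nesting depth */
    int too_deep;    /* set once MAX_DEPTH is exceeded */
} parser;

static int parse_or(parser *p);

/* primary := 'x' | '!' primary | '(' or ')' */
static int parse_primary(parser *p) {
    char c = p->s[p->pos];
    int ok;
    if (c == 'x') { p->pos++; return 1; }
    if (c != '!' && c != '(') return 0;                    /* syntax error */
    if (++p->depth > MAX_DEPTH) { p->too_deep = 1; return 0; }
    p->pos++;
    if (c == '!') {
        ok = parse_primary(p);
    } else {
        ok = parse_or(p) && p->s[p->pos] == ')';
        if (ok) p->pos++;
    }
    p->depth--;
    return ok;
}

/* and := primary ('&' primary)*  (a loop, so long chains cost no stack) */
static int parse_and(parser *p) {
    if (!parse_primary(p)) return 0;
    while (p->s[p->pos] == '&') {
        p->pos++;
        if (!parse_primary(p)) return 0;
    }
    return 1;
}

/* or := and ('|' and)* */
static int parse_or(parser *p) {
    if (!parse_and(p)) return 0;
    while (p->s[p->pos] == '|') {
        p->pos++;
        if (!parse_and(p)) return 0;
    }
    return 1;
}

/* Returns 1 if valid, 0 on a syntax error, -1 if nesting exceeds MAX_DEPTH. */
int check_filter(const char *s) {
    parser p = { s, 0, 0, 0 };
    int ok = parse_or(&p) && p.s[p.pos] == '\0';
    if (p.too_deep) return -1;
    return ok ? 1 : 0;
}
```

With this design, a fuzzer that feeds in a thousand opening parentheses gets a clean error (-1) rather than a stack overflow.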
_______________________________________________
tcpdump-workers mailing list -- tcpdump-workers@lists.tcpdump.org
To unsubscribe send an email to tcpdump-workers-le...@lists.tcpdump.org