On Fri, Dec 2, 2016 at 4:20 PM, Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > On Fri, Dec 02, 2016 at 11:42:15AM -0800, John Fastabend wrote: >> >> As far as pattern search for DNS packets... >> >> it was requested by Cloudflare guys back in March: >> >> https://github.com/iovisor/bcc/issues/471 >> >> and it is useful for several tracing use cases as well. >> >> Unfortunately no one had time to implement it yet. >> > >> > The string operations you proposed on the other hand, which would count >> > as one eBPF instructions, would give a lot more flexibility and allow >> > more cycles to burn, but don't help parsing binary protocols like IPv6 >> > extension headers. > > these are two separate things. we need pattern search regardless > of bounded loops. bpf program shouldn't be doing any complicated > algorithms. The main reasons to have loops are: > - speed up execution (smaller I-cache footprint) > - avoid forcing compiler to unroll loops (easier for users) > - support loops where unroll is not possible (like example below) > >> My rough thinking on this was the verifier had to start looking for loop >> invariants and to guarantee termination. Sounds scary in general but >> LLVM could put these in some normal form for us and the verifier could >> only accept decreasing loops, the invariants could be required to be >> integers, etc. By simplifying the loop enough the problem becomes >> tractable. > > yep. I think what Hannes was proposing earlier is straighforward > to implement for a compiler guy. The following: > for (int i = 0; i < (var & 0xff); i++) > sum += map->value[i]; /* map value_size >= 0xff */ > is obviously bounded and dataflow analysis can easily prove > that all memory operations are valid. > Static analysis tools do way way more than this. > >> I think this would be better than new instructions and/or multiple >> verifiers. > > agree that it's better than new instructions that would have > required JIT changes. Though there are pros to new insns too :) > Has there been any thought to adding a map, or foldl helper a la the tail call helper? Although you'd want to allocate an accumulator of kinds for the foldl, I imagine this could be bounded in size quite small for things like binary parsing operations -- we could reasonably allow the accumulator to be updated, and return a special value to exit the loop. I also started working on a map function a while ago which would call a bpf program for each set cell in an arraymap, and each set key/value in a hash map.
My intent was to intentionally make it so I could do this on the context itself, so I could do encryption in BPF. I wanted to be able to fold over the packet 16, or 32 bytes at a time, and (1) modify the content, and (2) generate the authentication tag. Any opinions on that approach?