[tcpdump-workers] Re: upcoming tcpslice 1.8

2024-09-09 Thread Michael Richardson
Denis Ovsienko  wrote:
> Let me suggest making tcpslice 1.8 release in 1-2 weeks to avoid yet
> another oversized change log section.  If anyone sees a good reason not
> to, please make your point before long.

Who are the users of tcpslice?
Are there any heavy users that would like to identify themselves, and verify
the releases?



[tcpdump-workers] BPF64: proposal of platform-independent hardware-friendly backwards-compatible eBPF alternative

2024-09-09 Thread Vadim Goncharov
Hello!

 We don't need ELF relocations!
   We want better loop control!
   No so little parameters,
Verifier! Leave our code alone!
  -- Ping Floyd

I've recently had some experience with Linux's eBPF in its XDP form, and it
left quite a negative impression. I was following
https://github.com/xdp-project/xdp-tutorial
and after the 3rd lesson tried to write a simple program that finds the TCP
timestamp option and increments it by one. As you know, the eBPF tool stack
consists of at least clang and the eBPF verifier in the kernel, and after two
dozen tries the verifier still didn't accept my code. I dug into the verifier
sources, and the abysses opened in front of me! Going carefully and tediously
through the disassembler and verifier output, I found that the clang optimizer
ignores the register that was just checked - patching one byte in the
assembler sources (and the target .o) did help. I've filed
https://github.com/iovisor/bcc/issues/5062 with details if anyone is curious.

So, looking at the eBPF ecosystem, I must say it's a Frankenstein. Sewn from
good, sometimes brilliant parts, the outcome is a monster. The verifier lives
on its own, the compiler/optimizer lives on its own... And in the end you
don't even have a high-level programming language! You must write in C,
relatively low-level C, and a restricted subset of C at that. This requires
very skilled professionals - it's far from being, if not user-friendly, then
at least sysadmin-friendly, like `ipfw` or `iptables` firewall rules.

Thus I looked at the foundations of the eBPF architecture, at the
presuppositions it was created with. In fact, it tries to be just usual
programming after checks - that is, with all those pointers. It's too
x86-centric and Linux-centric - it was given just ten registers. So if you
look at the GitHub ticket above, when I tried to add debug output to the
program - you know, just specific `printf()`s - it failed the verifier checks
again, because the compiler now had to move some variables between registers
and memory, as there is a limit of just 5 arguments per call due to the limit
of 5 registers! And the verifier, despite being more than 20,000 lines of
code, still was not smart enough to track info between registers and stack.

So, if we started from the beginning, what should we do? Remember classic BPF:
it has a very simple validator thanks to its virtual machine design - only
forward jumps, checks for packet boundaries at runtime, etc. You'd say eBPF
gains performance once the verifier's checks have passed? But in practice you
have to toss in as many packet boundary checks as possible, as near to the
actual access as possible, or the verifier may "forget" them because of the
compiler's optimizer. So this is not much different from checking whether an
access is past the end of the packet in classic BPF - the same CMP/JUMP in JIT
if the buffer is linear, and if your OS has put the packet in several buffers,
like *BSD or DPDK `mbuf`s, the runtime check overhead is
negligible in comparison.
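
To illustrate how small such a validator can be, here is a minimal sketch in
Python. It is not real classic BPF: the three opcodes, the `Insn` tuple and the
function names are made up for illustration, and the packet is assumed to
start at the IPv4 header. It only shows the two ideas named above: jump
offsets that can only point forward, and a boundary check done at load time.

```python
from typing import NamedTuple, Sequence

class Insn(NamedTuple):
    op: str      # hypothetical opcodes: "ld_b", "jeq", "ret"
    k: int = 0   # packet offset, comparison value, or return value
    jt: int = 0  # instructions to skip if the comparison is true
    jf: int = 0  # instructions to skip if the comparison is false

def validate(prog: Sequence[Insn]) -> bool:
    """Accept only programs whose jumps go forward and stay in range."""
    if not prog or prog[-1].op != "ret":
        return False
    for pc, insn in enumerate(prog):
        if insn.op not in ("ld_b", "jeq", "ret") or insn.k < 0:
            return False
        if insn.op == "jeq":
            # Offsets are relative to the next instruction and must be
            # non-negative, so every jump moves strictly forward.
            if insn.jt < 0 or insn.jf < 0:
                return False
            if pc + 1 + max(insn.jt, insn.jf) >= len(prog):
                return False
    return True

def run(prog: Sequence[Insn], packet: bytes) -> int:
    """Interpret a validated program; 0 means 'reject the packet'."""
    a = 0   # accumulator
    pc = 0
    while True:
        insn = prog[pc]
        pc += 1
        if insn.op == "ld_b":
            if insn.k >= len(packet):  # runtime packet boundary check
                return 0
            a = packet[insn.k]
        elif insn.op == "jeq":
            pc += insn.jt if a == insn.k else insn.jf
        else:  # "ret"
            return insn.k

# Accept the packet if byte 9 (the IPv4 protocol field) is 6, i.e. TCP.
tcp_only = [
    Insn("ld_b", 9),
    Insn("jeq", 6, jt=0, jf=1),
    Insn("ret", 65535),
    Insn("ret", 0),
]
```

Since the program counter only ever grows and the last instruction is a
return, a validated program runs at most `len(prog)` steps; no analysis of
register contents is needed to prove that.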

Ensuring kernel stability? Just don't allow arbitrary pointers, like the
original BPF.
Guaranteed termination time? It's possible if you place some restrictions. For
example, don't allow backward jumps but allow function calls - in case of
stack overflow, terminate the program. Really need backward jumps? Let's
analyze for what purpose. You'll find they are needed for loops over packet
contents. Solve that by supporting "hardware"-controlled loops, which can't be
infinite.
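
As a rough illustration of what a loop that can't be infinite might look
like, here is a sketch in Python. The names `bounded_loop` and `find_option`
are hypothetical and this is not the proposed instruction set; the only point
is that the iteration counter belongs to the machine, not to the program, so
even a malformed packet cannot keep the loop running. The example walks TCP
options (kind 0 is End of Option List, kind 1 is NOP, anything else is
followed by a length byte).

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # (current offset, offset of match or -1)

def bounded_loop(limit: int,
                 step: Callable[[State], Tuple[State, bool]],
                 state: State) -> State:
    """Run step() at most `limit` times; the counter is owned by the VM."""
    for _ in range(limit):
        state, done = step(state)
        if done:
            break
    return state

def find_option(options: bytes, wanted: int) -> int:
    """Return the offset of the TCP option of kind `wanted`, or -1."""
    def step(state: State) -> Tuple[State, bool]:
        off, _ = state
        if off >= len(options):
            return (off, -1), True
        kind = options[off]
        if kind == wanted:
            return (off, off), True
        if kind == 0:                      # End of Option List
            return (off, -1), True
        if kind == 1:                      # NOP is a single byte
            return (off + 1, -1), False
        if off + 1 >= len(options):        # length byte is missing
            return (off, -1), True
        length = options[off + 1]
        if length < 2:                     # malformed, would not advance
            return (off, -1), True
        return (off + length, -1), False

    # Each iteration consumes at least one byte, so len(options)
    # iterations are always enough for well-formed input.
    _, found = bounded_loop(len(options), step, (0, -1))
    return found

# NOP, NOP, then a timestamp option (kind 8, length 10).
opts = bytes([1, 1, 8, 10]) + bytes(8)
ts_offset = find_option(opts, 8)
```

The explicit `length < 2` test is only there to give a clean answer early;
without it the loop would still stop after `len(options)` iterations.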

Finally, platforms. The sunset of the x86 era is beginning - RISC is coming.
ARM is now not only in mobiles, but in desktops and servers. Moreover, it's
the era of specialized hardware accelerators - e.g. GPUs, neural processors.
Even general-purpose ARM64 has 31 registers, and specialized hardware can
implement many more. Then, don't tie it to the Linux kernel - BPF helpers are
a very rigid interface, from an ancient era, like syscalls.

So, let's continue the *Berkeley* Packet Filter with the Berkeley RISC design -
having the register window idea, updated by SPARC and then by Itanium (to not
waste registers). Take NetBSD's coprocessor functions, whose set is passed
with a context, instead of hardcoded enums of functions - for example, BPF
maps are not something universal: both NetBSD and FreeBSD have their own
tables in their firewalls.

Add more features actually needed for a *network* processor - e.g. 128-bit
registers for IPv6 (eBPF even axed BPF_MSH!). And do all of this in a fully
backwards-compatible way - the new language should allow older programs
from e.g. `tcpdump` to run without any modifications, binary-compatible
(again, eBPF does not do this - it's incompatible with classic BPF and uses
a translator from it).
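
For readers who don't remember it: BPF_MSH is the classic BPF addressing mode
that loads `4*(P[k]&0xf)` into the X register, i.e. the IPv4 header length in
one instruction. The Python sketch below shows that computation, plus what a
single 128-bit load of an IPv6 address could look like. The function names
are hypothetical and the 128-bit load is only an illustration of the idea,
not a defined instruction.

```python
import ipaddress

def ldx_msh(packet: bytes, k: int) -> int:
    """What classic BPF's BPF_MSH mode computes: 4 * (packet[k] & 0x0f)."""
    return 4 * (packet[k] & 0x0F)

def load128(packet: bytes, off: int) -> int:
    """Hypothetical 128-bit load: 16 bytes, network byte order."""
    if off < 0 or off + 16 > len(packet):
        raise IndexError("load past end of packet")
    return int.from_bytes(packet[off:off + 16], "big")

# IPv4 header starting with version 4, IHL 5 (no options).
ipv4 = bytes([0x45]) + bytes(19)
ihl_bytes = ldx_msh(ipv4, 0)

# Minimal IPv6 header: 8 bytes, then source and destination addresses.
src = ipaddress.IPv6Address("2001:db8::1")
dst = ipaddress.IPv6Address("2001:db8::2")
ipv6 = bytes([0x60]) + bytes(7) + src.packed + dst.packed

# One load and one compare per address, instead of four 32-bit ones.
src_matches = load128(ipv6, 8) == int(src)
dst_matches = load128(ipv6, 24) == int(dst)
```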

Next, eBPF took the "we are masquerading as usual x86 programming" way not
just in assembly language. They have a very complex ELF infrastructure around
it, which may not be suitable