Cloudflare L4LB - UNIMOG - using XDP and TC cls

2020-09-11 Thread Marek Majkowski
Hello, I know the community is looking for examples of eBPF usage. David from Cloudflare wrote a blog post about our Layer 4 Load Balancer called UNIMOG. It's a long read but goes into many architectural details: https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/ We added the tc

Re: [PATCH v6 bpf-next 0/3] Introduce CAP_BPF

2020-05-13 Thread Marek Majkowski
On Wed, May 13, 2020 at 7:54 PM Alexei Starovoitov wrote: > > On Wed, May 13, 2020 at 07:30:05PM +0100, Marek Majkowski wrote: > > On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov > > wrote: > > > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote: >

Re: [PATCH v6 bpf-next 0/3] Introduce CAP_BPF

2020-05-13 Thread Marek Majkowski
On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov wrote: > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote: > > On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov > > wrote: > > > > > > CAP_BPF solves three main goals: > > > 1. provi

Re: [PATCH v6 bpf-next 0/3] Introduce CAP_BPF

2020-05-13 Thread Marek Majkowski
On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov wrote: > > CAP_BPF solves three main goals: > 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and > switch to CAP_BPF. > More on this below. This is the major difference vs v4 set back from Sep > 2019. > 2. makes network

Re: [PATCH net] tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state

2019-09-27 Thread Marek Majkowski
On 9/27/19 10:25 AM, Jonathan Maxwell wrote: > Acked-by: Jon Maxwell > > Thanks for fixing that Eric. > The patch seems to do the job. Tested-by: Marek Majkowski Here's a selftest: ---8<--- From: Marek Majkowski Date: Fri, 27 Sep 2019 13:37:52 +0200 Subject: [

TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries

2019-09-25 Thread Marek Majkowski
Hello my favorite mailing list! Recently I've been looking into TCP_USER_TIMEOUT and noticed some strange behaviour on fresh sockets in SYN-SENT state. Full writeup: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ Here's a reproducer. It does a simple thing: sets TCP_USER_TIMEOUT and

Re: OOM triggered by SCTP

2019-07-17 Thread Marek Majkowski
On Wed, Jul 17, 2019 at 1:59 AM malc wrote: > > On Tue, Jul 16, 2019 at 10:49 PM Marek Majkowski wrote: > > > > Morning, > > > > My poor man's fuzzer found something interesting in SCTP. It seems > > like creating large number of SCTP sockets + s

OOM triggered by SCTP

2019-07-16 Thread Marek Majkowski
Morning, My poor man's fuzzer found something interesting in SCTP. It seems like creating large number of SCTP sockets + some magic dance, upsets a memory subsystem related to SCTP. The sequence: - create SCTP socket - call setsockopts (SCTP_EVENTS) - call bind(::1, port) - call sendmsg(long

Re: IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Marek Majkowski
I can confirm the patch works for the RST case I checked. Thanks! On Tue, Jul 9, 2019 at 3:37 PM Eric Dumazet wrote: > > > > On 7/9/19 3:22 PM, Eric Dumazet wrote: > > > > > > On 7/9/19 2:33 PM, Marek Majkowski wrote: > >> Ha, thanks. I missed that. >

Re: IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Marek Majkowski
Now it seems to work reliably. Tested on net-next under virtme. Marek On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet wrote: > > > > On 7/9/19 1:10 PM, Marek Majkowski wrote: > > Morning, > > > > I'm experimenting with flow label reflection from a server point of >

IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Marek Majkowski
Morning, I'm experimenting with flow label reflection from a server point of view. I'm able to get it working in both supported ways: (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0 (b) with global flowlabel_reflect sysctl However, I was surprised to see that RST

NEIGH: BUG, double timer add, state is 8

2019-07-04 Thread Marek Majkowski
Morning, I found a way to hit an obscure BUG in the net/core/neighbour.c:neigh_add_timer(), by piping two carefully crafted messages into AF_NETLINK socket. https://github.com/torvalds/linux/blob/v5.2-rc7/net/core/neighbour.c#L259 if (unlikely(mod_timer(&n->timer, when))) { printk("N

Re: SOCKET_FILTER regression - eBPF can't subtract when attached from unprivileged user

2019-03-01 Thread Marek Majkowski
Great, appreciated. One more thing (since upgrading kernels takes time): do you think I can amend eBPF on my side to avoid triggering this? Naive stuff like this doesn't work sadly: uint64_t delta = b + ~a + 1; I tried a couple more variants with uint32_t types, but to no avail. Ideas? Marek
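The workaround attempted in the message above relies on the two's-complement identity b - a == b + ~a + 1 for unsigned integers. A minimal userspace sketch of that identity follows; the function names delta_sub and delta_complement are hypothetical, and this only shows that the two forms agree in ordinary C. It does not reproduce the eBPF behaviour discussed in the thread.

```c
#include <stdint.h>

/* Plain unsigned subtraction; wraps modulo 2^64 when a > b. */
uint64_t delta_sub(uint64_t a, uint64_t b)
{
    return b - a;
}

/* The same value computed as b + (~a) + 1, the form tried in the message. */
uint64_t delta_complement(uint64_t a, uint64_t b)
{
    return b + ~a + 1;
}
```

Both functions return the same result for every pair of inputs, including the wrap-around case where a is larger than b.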

SOCKET_FILTER regression - eBPF can't subtract when attached from unprivileged user

2019-02-28 Thread Marek Majkowski
Howdy, After some dramatic debugging, I think I managed to isolate a problem that looks like a funny eBPF runtime regression. It seems to be introduced somewhere after 4.14. The eBPF in question is running on network sockets with SO_ATTACH_BPF. The BPF_PROG_TYPE_SOCKET_FILTER code: uint64_t

Using SOCKMAP as echo TCP server - kernel stack overflow (double-fault)

2019-01-17 Thread Marek Majkowski
Hi, perhaps you can tell me if I'm doing something wrong. I'm playing with BPF_MAP_TYPE_SOCKMAP map with trivial BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT ebpf programs to do basic TCP echo server. The code: https://gist.github.com/majek/a09bcbeb8ab548cde6c18c930895c3f2#file-sockmap

Re: MSG_ZEROCOPY doesn't work on half-open TCP sockets

2019-01-09 Thread Marek Majkowski
must implement a fallback from EINVAL return code on the transmission code. An adversarial client who does shutdown(SHUT_WR), will trigger EINVAL in the sender.. Marek On Wed, Jan 9, 2019 at 1:01 PM Marek Majkowski wrote: > > Hi, > > Current implementation of MSG_ZEROCOPY for TCP requires t
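The EINVAL fallback mentioned in this reply could look roughly like the sketch below. The helper name send_zerocopy_or_copy is hypothetical and not taken from the thread. It assumes the caller has already enabled SO_ZEROCOPY on the socket and separately reaps zerocopy completion notifications from the socket error queue; neither step is shown here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000 /* value from the Linux UAPI headers */
#endif

/*
 * Try a zerocopy send first. If the kernel rejects it with EINVAL
 * (for example because the TCP socket is no longer ESTABLISHED),
 * retry the same buffer as an ordinary copying send.
 * Returns the number of bytes queued, or -1 with errno set.
 */
ssize_t send_zerocopy_or_copy(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_ZEROCOPY);

    if (n < 0 && errno == EINVAL)
        n = send(fd, buf, len, 0); /* fallback: regular copying send */
    return n;
}
```

Like send() itself, the helper may queue fewer than len bytes, so callers still need their usual short-write handling.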

MSG_ZEROCOPY doesn't work on half-open TCP sockets

2019-01-09 Thread Marek Majkowski
Hi, Current implementation of MSG_ZEROCOPY for TCP requires the socket to be ESTABLISHED: https://elixir.bootlin.com/linux/v5.0-rc1/source/net/ipv4/tcp.c#L1188 if (sk->sk_state != TCP_ESTABLISHED) { err = -EINVAL; goto out_err; } In TCP it's totally fine to have half-open sockets, for ex

Re: splice() performance for TCP socket forwarding

2018-12-13 Thread Marek Majkowski
rned 121KiB and the second one 11KiB. The first one can be explained by data+metadata crossing 128KiB threshold. I'm not sure about the second splice. On Thu, Dec 13, 2018 at 2:18 PM Marek Majkowski wrote: > > On Thu, Dec 13, 2018 at 2:17 PM Marek Majkowski wrote: > > > > E

Re: splice() performance for TCP socket forwarding

2018-12-13 Thread Marek Majkowski
On Thu, Dec 13, 2018 at 2:17 PM Marek Majkowski wrote: > > Eric, > > On Thu, Dec 13, 2018 at 1:49 PM Eric Dumazet wrote: > > On 12/13/2018 03:25 AM, Marek Majkowski wrote: > > > Hi! > > > > > > I'm basically trying to do TCP splicing in Linux.

Re: splice() performance for TCP socket forwarding

2018-12-13 Thread Marek Majkowski
Eric, On Thu, Dec 13, 2018 at 1:49 PM Eric Dumazet wrote: > On 12/13/2018 03:25 AM, Marek Majkowski wrote: > > Hi! > > > > I'm basically trying to do TCP splicing in Linux. I'm focusing on > > performance of the simplest case: receive data from one TCP

splice() performance for TCP socket forwarding

2018-12-13 Thread Marek Majkowski
Hi! I'm basically trying to do TCP splicing in Linux. I'm focusing on performance of the simplest case: receive data from one TCP socket, write data to another TCP socket. I get poor performance with splice. First, the naive code, pretty much: while(1){ n = read(rs, buf); write(ws, buf, n); }
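For reference, the naive read/write loop quoted above can be fleshed out into a compilable sketch. The function name forward_naive and the 64 KiB buffer size are assumptions made for illustration; the splice-based variant discussed in the thread is not shown because the snippet is cut off before it.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Copy everything from descriptor rs to descriptor ws until EOF on rs.
 * Handles short writes and EINTR. Returns the total number of bytes
 * forwarded, or -1 on error with errno set.
 */
long forward_naive(int rs, int ws)
{
    char buf[65536];
    long total = 0;

    for (;;) {
        ssize_t n = read(rs, buf, sizeof(buf));
        if (n == 0)
            return total;          /* EOF on the receiving side */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry the read */
            return -1;
        }
        /* write() may accept fewer bytes than requested; loop until done. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(ws, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every byte crosses the user/kernel boundary twice in this version, which is the overhead splice() is meant to avoid.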