splice() performance for TCP socket forwarding

Marek Majkowski Thu, 13 Dec 2018 03:26:09 -0800

Hi!

I'm trying to do TCP splicing on Linux. I'm focusing on the
performance of the simplest case: receive data from one TCP socket,
write data to another TCP socket. I get poor performance with splice().


First, the naive code, pretty much:

while (1) {
    n = read(rs, buf, sizeof(buf));
    write(ws, buf, n);
}

With GRO enabled, this code reaches roughly the 10Gbps line rate, with
the application hovering around 50% CPU (mostly sys).
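
The loop above skips error handling and short writes. For reference,
here is a slightly more careful sketch of the same read/write idea. It
is not the benchmarked code; the helper name forward_copy and the 64 KiB
buffer size are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy bytes from rs to ws until EOF on rs.
 * Returns the total number of bytes forwarded, or -1 on error. */
static long long forward_copy(int rs, int ws)
{
    char buf[64 * 1024];            /* buffer size is an arbitrary choice */
    long long total = 0;

    assert(rs >= 0 && ws >= 0);
    for (;;) {
        ssize_t n = read(rs, buf, sizeof(buf));
        if (n == 0)
            return total;           /* peer closed its side */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the read */
            return -1;
        }
        /* write() may accept fewer bytes than requested; loop until
         * the whole chunk has been handed to the kernel. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(ws, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

It works on any pair of blocking file descriptors, so it can be tried
on pipes before pointing it at real TCP sockets.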

When replaced with a splice version:

pipe(pfd);
fcntl(pfd[0], F_SETPIPE_SZ, 1024 * 1024);
while (1) {
    n = splice(rd, NULL, pfd[1], NULL, 1024 * 1024, SPLICE_F_MOVE);
    splice(pfd[0], NULL, wd, NULL, n, SPLICE_F_MOVE);
}

Full code:
https://gist.github.com/majek/c58a97b9be7d9217fe3ebd6c1328faaa#file-proxy-splice-c-L59

I get 100% CPU (sys) and dramatically worse performance (1.5x slower).

A naive run of perf record ./proxy-splice shows:
   5.73%  [k] queued_spin_lock_slowpath
   5.23%  [k] ipt_do_table
   4.72%  [k] __splice_segment.part.59
   4.72%  [k] do_tcp_sendpages
   3.47%  [k] _raw_spin_lock_bh
   3.36%  [k] __x86_indirect_thunk_rax

(kernel 4.14.71)
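
The splice loop as quoted ignores the return value of the second
splice() and does not check for EOF or errors. Below is a sketch with
those cases spelled out. It is not the code from the gist and is not
expected to change the performance picture; the helper name
forward_splice and its chunk parameter are assumptions for
illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* for splice() and F_SETPIPE_SZ */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: forward rd -> wd through an intermediate pipe
 * until EOF on rd. Returns total bytes forwarded, or -1 on error. */
static long long forward_splice(int rd, int wd, size_t chunk)
{
    int pfd[2];
    long long total = 0;
    int failed = 0;

    assert(chunk > 0);
    if (pipe(pfd) < 0)
        return -1;
    /* Best effort: the kernel may refuse to grow the pipe, in which
     * case the default pipe size is used. */
    (void)fcntl(pfd[1], F_SETPIPE_SZ, (int)chunk);

    while (!failed) {
        ssize_t n = splice(rd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
        if (n == 0)
            break;                  /* EOF on the source */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;
            break;
        }
        /* The second splice() may move fewer than n bytes, so keep
         * going until the pipe is drained. */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, wd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                failed = 1;
                break;
            }
            n -= m;
            total += m;
        }
    }

    int saved = errno;              /* keep errno from the failed call */
    close(pfd[0]);
    close(pfd[1]);
    errno = saved;
    return failed ? -1 : total;
}
```

Calling forward_splice(rd, wd, 1024 * 1024) corresponds to the 1 MiB
pipe used in the loop above.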

Is it possible to squeeze more from splice? Is it possible to force
splice() to block and wait for more data instead of returning quickly?
(SO_RCVLOWAT doesn't work.)
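
For context, SO_RCVLOWAT is a per-socket option set with setsockopt().
A minimal sketch of setting and reading it back follows; the helper
names set_rcvlowat and get_rcvlowat are hypothetical. As noted above,
setting it did not appear to stop splice() from returning early.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request that blocking receives wait until at
 * least `bytes` are queued. Returns 0 on success, -1 on error. */
static int set_rcvlowat(int fd, int bytes)
{
    assert(bytes > 0);
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
}

/* Hypothetical helper: read the current low-water mark back,
 * or return -1 on error. */
static int get_rcvlowat(int fd)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```

Reading the value back only confirms that the option was accepted; it
says nothing about whether a given code path such as splice() honours
it.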

Is there another way of doing TCP splicing? I'm aware of TCP ZEROCOPY
that landed in 4.19.

Cheers,
   Marek
