Hi!

I'm basically trying to do TCP splicing in Linux. I'm focusing on the
performance of the simplest case: receive data from one TCP socket,
write it to another TCP socket. With splice() I get poor performance.

First, the naive code, pretty much:

while (1) {
    n = read(rs, buf, sizeof(buf));
    write(ws, buf, n);
}
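
For reference, a slightly fuller sketch of that loop, assuming blocking
descriptors. The helper name copy_loop and its signature are made up for
illustration and are not taken from the gist; the only additions over the
loop above are handling of EOF, EINTR and short writes:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the gist): copy everything from src to dst
 * through a caller-supplied buffer until read() reports EOF.
 * Returns the total number of bytes copied, or -1 on error.
 */
static ssize_t copy_loop(int src, int dst, char *buf, size_t len)
{
    ssize_t total = 0;

    assert(len > 0);
    for (;;) {
        ssize_t n = read(src, buf, len);
        if (n == 0)
            return total;           /* EOF on the source */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the read */
            return -1;
        }
        /* write() may accept fewer bytes than asked; drain the buffer. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Called as copy_loop(rs, ws, buf, sizeof(buf)), this should behave like the
loop above apart from the error handling.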

With GRO enabled, this code runs at roughly the 10Gbps line rate, with
the application hovering around 50% CPU (mostly sys).

When I replace it with a splice version:

pipe(pfd);
fcntl(pfd[0], F_SETPIPE_SZ, 1024 * 1024);
while (1) {
    n = splice(rd, NULL, pfd[1], NULL, 1024 * 1024,
               SPLICE_F_MOVE);
    splice(pfd[0], NULL, wd, NULL, n, SPLICE_F_MOVE);
}
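
A more defensive sketch of the same splice loop, in case it helps when
reproducing this. The helper name splice_loop and its signature are
hypothetical (not from the gist). It assumes blocking descriptors and a pipe
that the caller has already created and sized, and it only adds return-value
checks plus draining of the pipe when the second splice() moves less than n
bytes:

```c
#define _GNU_SOURCE             /* needed for splice() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the gist): move data from in_fd to out_fd
 * through the pipe pfd until splice() reports EOF on in_fd.
 * Returns the total number of bytes moved, or -1 on error.
 */
static ssize_t splice_loop(int in_fd, int out_fd, int pfd[2], size_t chunk)
{
    ssize_t total = 0;

    assert(chunk > 0);
    for (;;) {
        /* Stage 1: source -> pipe. */
        ssize_t n = splice(in_fd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
        if (n == 0)
            return total;           /* EOF on the source */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Stage 2: pipe -> destination; may take several calls. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t w = splice(pfd[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (w == 0)
                return -1;          /* pipe unexpectedly empty */
            left -= w;
        }
        total += n;
    }
}
```

With the setup above this would be called as
splice_loop(rd, wd, pfd, 1024 * 1024).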

Full code:
https://gist.github.com/majek/c58a97b9be7d9217fe3ebd6c1328faaa#file-proxy-splice-c-L59

I get 100% cpu (sys) and dramatically worse performance (1.5x slower).

A naive run of perf record ./proxy-splice shows:
   5.73%  [k] queued_spin_lock_slowpath
   5.23%  [k] ipt_do_table
   4.72%  [k] __splice_segment.part.59
   4.72%  [k] do_tcp_sendpages
   3.47%  [k] _raw_spin_lock_bh
   3.36%  [k] __x86_indirect_thunk_rax

(kernel 4.14.71)

Is it possible to squeeze more out of splice? Is it possible to force
splice() to hang forever and not return quickly? (SO_RCVLOWAT doesn't
work.)

Is there another way of doing TCP splicing? I'm aware of TCP ZEROCOPY
that landed in 4.19.

Cheers,
   Marek
