Hi all, hopefully I'm reporting this in the right place.
While working on my C++ networking library, I ran into a problem that I have traced back to the kernel itself. The problem is related to setting a socket buffer size (snd/rcv). At first I noticed the problem while trying to test the thread-safety of my lock free stream buffer implementation, with std::streambuf interface, that allows reading and writing by different threads at the same time. In order to test this, I read data from a socket into a buffer and write that back to the same socket from the same buffer. This resulted in a speed of only a few 10kB/s bandwidth, and after investigation turned out to be caused by epoll_pwait() not returning until 205 to 208 ms had passed, even though data was already available. At first I thought it could be a bug in libev that I was using as wrapper around epoll - so I ripped that out and wrote my own code to interface with epoll, but with the same result. Next I changed the test into a one-way stream: just send data from one socket to another (not back); also with the same result. Then I wrote a test application that was standalone, single threaded and in C. Here I discovered that first calling connect() on a socket and then (immediately after) setting the socket buffer sizes caused the stall to occur. Assuming that this is not allowed I thought I had solved it. But, back to sending data back and forth over the TCP connection caused the problem to return. I wrote again a standalone C program, single threaded, that reproduces the problem. You can find this program here (I'm scared to trigger spam filters by attaching it here, but let me know if that is OK): https://github.com/CarloWood/ai-evio-testsuite/blob/e6e36b66256ec5d1760d2b7b96b34d60a12477aa/src/epoll_bug.c With socket snd/rcv buffers smaller than 32kB I always get the problem, but using 64kB (65536 bytes) I don't, on my box. I tested this with kernels 4.15, 5.2.0-rc7 and had someone else confirm it with 5.1.15. When running the program as-is, it will start printing a flood of "epoll_wait() was delayed!" meaning that epoll_wait() didn't return within 10 ms. The actual delay can be seen with for example strace -T. You can also uncomment the commented printf's which will print what is the amount of bytes written to and read from each socket and therefore what is the number of bytes that are still "in the pipe line". What I observe is that the number of bytes in the pipe line is very constant (because this just writes till no more can be written, and then - on the other end - reads till there is nothing to be read anymore - and that single threaded) until suddenly that amount jumps UP to about 2.5 MB - then is SLOWLY eaten till a say 1.5 MB (without that epoll says more can be written) when suddenly more can be written and it fills up the pipe line again. I have no idea where those mega bytes of data are buffered; the socket buffers are WAY smaller. Obviously I have no idea if this is a problem with epoll itself, or that there is a problem with TCP/IP for this case. But I am not a kernel hacker, so all I could to was provide this little test case that allows you to reproduce it. -- Carlo Wood <ca...@alinoe.com>