Hi Simon,

I see there may have been some changes to address the TCP_NODELAY issue on Linux in https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797.
I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am I misunderstanding these changes or how socketOptions is intended to be used?

-Jeff

library(parallel)
library(microbenchmark)
options(socketOptions = "no-delay")
cl <- makeCluster(1)
(x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
# Unit: microseconds
#                    expr  min       lq     mean   median       uq     max neval
#  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100

> On 11/04/2020 5:41 AM Iñaki Ucar <iu...@fedoraproject.org> wrote:
>
> Please, check a tcpdump session on localhost while running the following
> script:
>
> library(parallel)
> library(tictoc)
> cl <- makeCluster(1)
> Sys.sleep(1)
>
> for (i in 1:10) {
>   tic()
>   x <- clusterEvalQ(cl, iris)
>   toc()
> }
>
> The initialization phase comprises 7 packets. Then, the 1-second sleep
> will help you see where the evaluation starts. Each clusterEvalQ
> generates 6 packets:
>
> 1. main -> worker PSH, ACK 1026 bytes
> 2. worker -> main ACK 66 bytes
> 3. worker -> main PSH, ACK 3758 bytes
> 4. main -> worker ACK 66 bytes
> 5. worker -> main PSH, ACK 2484 bytes
> 6. main -> worker ACK 66 bytes
>
> The first two are the command and its ACK, the following are the data
> back and their ACKs. In the first 4-5 iterations, I see no delay at
> all. Then, in the following iterations, a 40 ms delay starts to happen
> between packets 3 and 4, that is: the main process delays the ACK to
> the first packet of the incoming result.
>
> So I'd say Nagle is hardly to blame for this. It would be interesting
> to see how many packets are generated with TCP_NODELAY on. If there
> are still 6 packets, then we are fine. If we suddenly see a gazillion
> packets, then TCP_NODELAY does more harm than good. On the other hand,
> TCP_QUICKACK would surely solve the issue without any drawback. As
> Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
> that makes things worse, let me know."
>
> Iñaki
>
> On Wed, 4 Nov 2020 at 04:34, Simon Urbanek <simon.urba...@r-project.org>
> wrote:
> >
> > I'm not sure the user would know ;). This is a very system-specific issue
> > just because the Linux network stack behaves so differently from other OSes
> > (for purely historical reasons). That makes it hard to abstract as a
> > "feature" for the R sockets that are supposed to be platform-independent.
> > At least TCP_NODELAY is actually part of POSIX so it is on better footing,
> > and disabling delayed ACK is practically only useful to work around the
> > other side having Nagle on, so I would expect it to be rarely used.
> >
> > This is essentially an RFC since we don't have a mechanism for socket options
> > (well, almost, there is timeout and blocking already...) and I don't think
> > we want to expose low-level details so perhaps one idea would be to add
> > something like delay=NA to socketConnection() in order to not touch (NA),
> > enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any
> > other way we could infer the intention of the user to try to choose the
> > right approach...
> >
> > Cheers,
> > Simon
> >
> >
> > > On Nov 3, 2020, at 02:28, Jeff <j...@vtkellers.com> wrote:
> > >
> > > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they
> > > might determine what is best for their potentially latency- or
> > > throughput-sensitive application?
> > >
> > > Best,
> > > Jeff
> > >
> > > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> > >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <simon.urba...@r-project.org>
> > >> wrote:
> > >>> It looks like R sockets on Linux could do with TCP_NODELAY -- without
> > >>> (status quo):
> > >> How many network packets are generated with and without it? If there
> > >> are many small writes and thus setting TCP_NODELAY causes many small
> > >> packets to be sent, it might make more sense to set TCP_QUICKACK
> > >> instead.
> > >> Iñaki
> > >>> Unit: microseconds
> > >>>                    expr      min       lq     mean  median       uq      max neval
> > >>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
> > >>> exactly the same machine + R but with TCP_NODELAY enabled in
> > >>> R_SockConnect():
> > >>> Unit: microseconds
> > >>>                    expr     min     lq     mean  median      uq      max neval
> > >>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
> > >>> Cheers,
> > >>> Simon
> > >>> > On 2/11/2020, at 3:39 AM, Jeff <j...@vtkellers.com> wrote:
> > >>> >
> > >>> > I'm exploring latency overhead of parallel PSOCK workers and noticed
> > >>> > that serializing/unserializing data back to the main R session is
> > >>> > significantly slower on Linux than it is on Windows/MacOS with
> > >>> > similar hardware. Is there a reason for this difference and is there
> > >>> > a way to avoid the apparent additional Linux overhead?
> > >>> >
> > >>> > I attempted to isolate the behavior with a test that simply returns
> > >>> > an existing object from the worker back to the main R session.
> > >>> >
> > >>> > library(parallel)
> > >>> > library(microbenchmark)
> > >>> > gcinfo(TRUE)
> > >>> > cl <- makeCluster(1)
> > >>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> > >>> > # x$time is recorded in nanoseconds, so convert before plotting
> > >>> > plot(x$time / 1000, ylab = "microseconds")
> > >>> > head(x$time / 1000, n = 10)
> > >>> >
> > >>> > On Windows/MacOS, the test runs in 300-500 microseconds depending on
> > >>> > hardware. A few of the 1000 runs are an order of magnitude slower but
> > >>> > this can probably be attributed to garbage collection on the worker.
> > >>> >
> > >>> > On Linux, the first 5 or so executions run at comparable speeds but
> > >>> > all subsequent executions are two orders of magnitude slower (~40
> > >>> > milliseconds).
> > >>> >
> > >>> > I see this behavior across various platforms and hardware
> > >>> > combinations:
> > >>> >
> > >>> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
> > >>> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
> > >>> > Linux Mint 20 (AMD Ryzen 7 3700X)
> > >>> > Windows 10 (AMD Ryzen 7 4800H)
> > >>> > MacOS 10.15.7 (Intel Core i7-8850H)
> > >>> >
> > >>> > ______________________________________________
> > >>> > R-devel@r-project.org mailing list
> > >>> > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >>> >
> > >> --
> > >> Iñaki Úcar
> > >
> >
> --
> Iñaki Úcar
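
For anyone following along, here is a minimal sketch of the two socket options discussed in this thread. It is in Python rather than R because, as far as I know, R's socketOptions only accepts "no-delay" and offers no way to set TCP_QUICKACK. The helper names (set_low_latency, make_low_latency_pair, recv_exact) are hypothetical, and this is not how R's sockets are implemented. TCP_NODELAY disables Nagle's algorithm on the socket that sends; TCP_QUICKACK is Linux-only and, per tcp(7), is not permanent, so the sketch sets it again after each read.

```python
import socket


def set_low_latency(sock):
    """Apply the latency-oriented options from this thread to one TCP socket."""
    # TCP_NODELAY: disable Nagle's algorithm so small writes are sent at once.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK is only defined on Linux; skip it on other platforms.
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)


def make_low_latency_pair():
    """Return (client, server) TCP sockets connected over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    for sock in (client, server):
        set_low_latency(sock)
    return client, server


def recv_exact(sock, n):
    """Read exactly n bytes, re-arming TCP_QUICKACK after every read."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
        # Quick-ack mode can be switched off again by the kernel, so set it
        # again after each read (Linux only).
        if hasattr(socket, "TCP_QUICKACK"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    return b"".join(chunks)


if __name__ == "__main__":
    client, server = make_low_latency_pair()
    client.sendall(b"ping")
    print(recv_exact(server, 4))
    client.close()
    server.close()
```

Both options are per socket, so which process sets them matters. In Iñaki's trace the delayed ACK comes from the main process while the result data comes from the worker, so an option set on only one side may not change the timing.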