Hi all,

Please disregard my previous email as I misread the pasted output. Sorry for the noise.
Best,
~G

On Mon, Nov 1, 2021 at 6:45 PM Jeff <j...@vtkellers.com> wrote:

> Hi Gabriel,
>
> Yes, 40 milliseconds (ms) == 40,000 microseconds (us). My benchmarking
> output is reporting the latter, which is considerably higher than the 40us
> you are seeing. If I benchmark just the serialization round trip as you
> did, I get comparable results: 14us median on my Linux system. So at least
> on Linux, there is something else contributing the remaining 39,986us. The
> conclusion from earlier in this thread was that the culprit was TCP
> behavior unique to the Linux network stack.
>
> Jeff
>
> On Mon, Nov 1 2021 at 05:55:45 PM -0700, Gabriel Becker
> <gabembec...@gmail.com> wrote:
>
> Jeff,
>
> Perhaps I'm just missing something here, but ms is generally milliseconds,
> not microseconds (which are much smaller), right?
>
> Also, this seems to just be how long it takes to roundtrip serialize iris
> (in 4.1.0 on mac osx, as thats what I have handy right this moment):
>
> > microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})
> Unit: microseconds
>                                                      expr    min      lq
>  { x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085
>      mean  median     uq   max neval
>  40.26888 36.4345 43.641 80.39   100
>
> > res <- system.time(replicate(10000, {x <- unserialize(serialize(iris,
> connection = NULL))}))
> > res/10000
>     user   system  elapsed
> 4.58e-05 2.90e-06 4.88e-05
>
> Thus the overhead appears to be extremely minimal in your results above,
> right? In fact it seems to be comparable or lower than replicate.
>
> ~G
>
> On Mon, Nov 1, 2021 at 5:20 PM Jeff Keller <j...@vtkellers.com> wrote:
>
>> Hi Simon,
>>
>> I see there may have been some changes to address the TCP_NODELAY issue
>> on Linux in
>> https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797
>> .
>>
>> I gave this a try with R 4.1.1, but I still see a 40ms compute floor.
>> Am I misunderstanding these changes or how socketOptions is intended to be
>> used?
>>
>> -Jeff
>>
>> library(parallel)
>> library(microbenchmark)
>> options(socketOptions = "no-delay")
>> cl <- makeCluster(1)
>> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
>> # Unit: microseconds
>> #                   expr  min       lq     mean   median       uq     max neval
>> # clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100
>>
>> > On 11/04/2020 5:41 AM Iñaki Ucar <iu...@fedoraproject.org> wrote:
>> >
>> >
>> > Please, check a tcpdump session on localhost while running the
>> > following script:
>> >
>> > library(parallel)
>> > library(tictoc)
>> > cl <- makeCluster(1)
>> > Sys.sleep(1)
>> >
>> > for (i in 1:10) {
>> >   tic()
>> >   x <- clusterEvalQ(cl, iris)
>> >   toc()
>> > }
>> >
>> > The initialization phase comprises 7 packets. Then, the 1-second sleep
>> > will help you see where the evaluation starts. Each clusterEvalQ
>> > generates 6 packets:
>> >
>> > 1. main -> worker PSH, ACK 1026 bytes
>> > 2. worker -> main ACK 66 bytes
>> > 3. worker -> main PSH, ACK 3758 bytes
>> > 4. main -> worker ACK 66 bytes
>> > 5. worker -> main PSH, ACK 2484 bytes
>> > 6. main -> worker ACK 66 bytes
>> >
>> > The first two are the command and its ACK, the following are the data
>> > back and their ACKs. In the first 4-5 iterations, I see no delay at
>> > all. Then, in the following iterations, a 40 ms delay starts to happen
>> > between packets 3 and 4, that is: the main process delays the ACK to
>> > the first packet of the incoming result.
>> >
>> > So I'd say Nagle is hardly to blame for this. It would be interesting
>> > to see how many packets are generated with TCP_NODELAY on. If there
>> > are still 6 packets, then we are fine. If we suddenly see a gazillion
>> > packets, then TCP_NODELAY does more harm than good. On the other hand,
>> > TCP_QUICKACK would surely solve the issue without any drawback. As
>> > Nagle himself put it once, "set TCP_QUICKACK.
>> > If you find a case where that makes things worse, let me know."
>> >
>> > Iñaki
>> >
>> > On Wed, 4 Nov 2020 at 04:34, Simon Urbanek <simon.urba...@r-project.org>
>> > wrote:
>> > >
>> > > I'm not sure the user would know ;). This is very system-specific
>> > > issue just because the Linux network stack behaves so differently
>> > > from other OSes (for purely historical reasons). That makes it hard
>> > > to abstract as a "feature" for the R sockets that are supposed to be
>> > > platform-independent. At least TCP_NODELAY is actually part of POSIX
>> > > so it is on better footing, and disabling delayed ACK is practically
>> > > only useful to work around the other side having Nagle on, so I would
>> > > expect it to be rarely used.
>> > >
>> > > This is essentially RFC since we don't have a mechanism for socket
>> > > options (well, almost, there is timeout and blocking already...) and
>> > > I don't think we want to expose low-level details so perhaps one idea
>> > > would be to add something like delay=NA to socketConnection() in
>> > > order to not touch (NA), enable (TRUE) or disable (FALSE)
>> > > TCP_NODELAY. I wonder if there is any other way we could infer the
>> > > intention of the user to try to choose the right approach...
>> > >
>> > > Cheers,
>> > > Simon
>> > >
>> > >
>> > > > On Nov 3, 2020, at 02:28, Jeff <j...@vtkellers.com> wrote:
>> > > >
>> > > > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
>> > > > they might determine what is best for their potentially latency- or
>> > > > throughput-sensitive application?
>> > > >
>> > > > Best,
>> > > > Jeff
>> > > >
>> > > > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <iu...@fedoraproject.org>
>> > > > wrote:
>> > > >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek
>> > > >> <simon.urba...@r-project.org> wrote:
>> > > >>> It looks like R sockets on Linux could do with TCP_NODELAY --
>> > > >>> without (status quo):
>> > > >> How many network packets are generated with and without it?
>> > > >> If there are many small writes and thus setting TCP_NODELAY causes
>> > > >> many small packets to be sent, it might make more sense to set
>> > > >> TCP_QUICKACK instead.
>> > > >> Iñaki
>> > > >>> Unit: microseconds
>> > > >>>                   expr      min       lq     mean  median       uq      max neval
>> > > >>> clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
>> > > >>> exactly the same machine + R but with TCP_NODELAY enabled in
>> > > >>> R_SockConnect():
>> > > >>> Unit: microseconds
>> > > >>>                   expr     min     lq     mean  median      uq      max neval
>> > > >>> clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
>> > > >>> Cheers,
>> > > >>> Simon
>> > > >>> > On 2/11/2020, at 3:39 AM, Jeff <j...@vtkellers.com> wrote:
>> > > >>> >
>> > > >>> > I'm exploring latency overhead of parallel PSOCK workers and
>> > > >>> > noticed that serializing/unserializing data back to the main R
>> > > >>> > session is significantly slower on Linux than it is on
>> > > >>> > Windows/MacOS with similar hardware. Is there a reason for this
>> > > >>> > difference and is there a way to avoid the apparent additional
>> > > >>> > Linux overhead?
>> > > >>> >
>> > > >>> > I attempted to isolate the behavior with a test that simply
>> > > >>> > returns an existing object from the worker back to the main R
>> > > >>> > session.
>> > > >>> >
>> > > >>> > library(parallel)
>> > > >>> > library(microbenchmark)
>> > > >>> > gcinfo(TRUE)
>> > > >>> > cl <- makeCluster(1)
>> > > >>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
>> > > >>> > plot(x$time, ylab = "microseconds")
>> > > >>> > head(x$time, n = 10)
>> > > >>> >
>> > > >>> > On Windows/MacOS, the test runs in 300-500 microseconds
>> > > >>> > depending on hardware. A few of the 1000 runs are an order of
>> > > >>> > magnitude slower but this can probably be attributed to garbage
>> > > >>> > collection on the worker.
>> > > >>> >
>> > > >>> > On Linux, the first 5 or so executions run at comparable speeds
>> > > >>> > but all subsequent executions are two orders of magnitude slower
>> > > >>> > (~40 milliseconds).
>> > > >>> >
>> > > >>> > I see this behavior across various platforms and hardware
>> > > >>> > combinations:
>> > > >>> >
>> > > >>> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
>> > > >>> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
>> > > >>> > Linux Mint 20 (AMD Ryzen 7 3700X)
>> > > >>> > Windows 10 (AMD Ryzen 7 4800H)
>> > > >>> > MacOS 10.15.7 (Intel Core i7-8850H)
>> > > >>> >
>> > > >>> > ______________________________________________
>> > > >>> > R-devel@r-project.org mailing list
>> > > >>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> > > >>> >
>> > > >> --
>> > > >> Iñaki Úcar
>> >
>> > --
>> > Iñaki Úcar
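
For anyone wanting to reproduce the comparison made in this thread, here is a minimal sketch, assuming only base R and the parallel package. It times the serialization round trip on its own and then the full round trip through one PSOCK worker, so the part attributable to the socket layer can be read off as the difference. The helpers per_call_seconds() and to_us() are hypothetical names introduced for this illustration, and batch timing with system.time() is a rough stand-in for the microbenchmark calls quoted above, not a replacement for them.

```r
library(parallel)

# Hypothetical helper (not from the thread): mean elapsed seconds per call,
# timed as one batch, in the spirit of the system.time(replicate(...)) check.
per_call_seconds <- function(f, n) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]] / n
}

# Hypothetical helper: seconds to microseconds (0.04 s == 40,000 us).
to_us <- function(seconds) seconds * 1e6

n <- 50

# 1. Serialization round trip only; no sockets are involved.
serialize_only <- per_call_seconds(
  function() unserialize(serialize(iris, connection = NULL)), n
)

# 2. Full round trip through a single PSOCK worker on localhost.
cl <- makeCluster(1)
cluster_trip <- per_call_seconds(function() clusterEvalQ(cl, iris), n)
result <- clusterEvalQ(cl, iris)  # keep one result to check what came back
stopCluster(cl)

cat(sprintf("serialize only: %10.1f us per call\n", to_us(serialize_only)))
cat(sprintf("PSOCK cluster:  %10.1f us per call\n", to_us(cluster_trip)))
cat(sprintf("difference:     %10.1f us per call\n",
            to_us(cluster_trip - serialize_only)))
```

If the difference is on the order of 40,000 us on Linux but only a few hundred microseconds elsewhere, that matches the pattern reported in this thread. The numbers will vary by machine and R version, so none are shown here.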