[Rd] R 4.1.2 is released
The build system rolled up R-4.1.2.tar.gz (codename "Bird Hippie") this morning. The list below details the changes in this release. You can get the source code from https://cran.r-project.org/src/base/R-4/R-4.1.2.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

For the R Core Team,
Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = 320967884b547734d6279dedbc739dd4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = ade6a3d38fe5e6a456929cae2b94d568
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 924e68decbf327f538a09afb1838506b
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = a767f7809324c73c49eaff47d14bce81
MD5 (NEWS.3) = e55ed2c8a547b827b46e08eb7137ba23
MD5 (R-latest.tar.gz) = 6e28db9d02c6d3dae51a149b8e261ab1
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = a79b9b338cab09bd665f6b62ac6f455b
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = a72a49578a254b9163f0f10322a3eecc
MD5 (R-4/R-4.1.2.tar.gz) = 6e28db9d02c6d3dae51a149b8e261ab1

60a0d150e6fc1f424be76ad7b645d236b56e747692a4679f81ce6536c550e949  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
e84c67931e9b925abb9142d4a6b4ef03b7605948bbf384d7e3d2401823c7f1fe  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
73d5bfb8711bb7833ce8fe7a1359566d48001d13cd32affbd800d759f0b3232a  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
ba74618bc3f4c0e336dca13d472402a1863d12ba6f7f91a1782bc469ee986f6d  NEWS.2
1910a2405300b9bc7c76beeb0753a5249cf799afe175ce28f8d782fab723e012  NEWS.3
2036225e9f7207d4ce097e54972aecdaa8b40d7d9911cd26491fac5a0fab38af  R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
8b7d3856100220f4555d4d57140829f2e81c27eccec5b441f5dce616e9ec9061  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
1e74ef089b526538bbb658dc189bc3d34d931839e9933415fb2f267fd57b0b69  VERSION-INFO.dcf
2036225e9f7207d4ce097e54972aecdaa8b40d7d9911cd26491fac5a0fab38af  R-4/R-4.1.2.tar.gz

This is the relevant part of the NEWS file:

CHANGES IN R 4.1.2:

C-LEVEL FACILITIES:

* The workaround in headers R.h and Rmath.h (using namespace std;) for the Oracle Developer Studio compiler is no longer needed now C++11 is required so has been removed. A couple more usages of log() (which should have been std::log()) with an int argument are reported on Solaris.

* The undocumented limit of 4095 bytes on messages from the S-compatibility macros PROBLEM and MESSAGE is now documented and longer messages will be silently truncated rather than potentially causing segfaults.

* If the R_NO_SEGV_HANDLER environment variable is non-empty, the signal handler for SEGV/ILL/BUS signals (which offers recovery user interface) is not set. This allows more reliable debugging of crashes that involve the console.

DEPRECATED AND DEFUNCT:

* The legacy S-compatibility macros PROBLEM, MESSAGE, ERROR, WARN, WARNING, RECOVER, ... are deprecated and will be hidden in R 4.2.0. R's native interface of Rf_error and Rf_warning has long been preferred.
BUG FIXES:

* .mapply(F, dots, .) no longer segfaults when dots is not a list and uses match.fun(F) as always documented; reported by Andrew Simmons in PR#18164.

* hist(<Date>, ...) and hist(<POSIXct>, ...) no longer pass arguments for rect() (such as col and density) to axis(). (Thanks to Sebastian Meyer's PR#18171.)

* \Sexpr{ch} now preserves Encoding(ch). (Thanks to report and patch by Jeroen Ooms in PR#18152.)

* Setting the RNG to "Marsaglia-Multicarry" e.g., by RNGkind(), now warns in more places, thanks to André Gillibert's report and patch in PR#18168.

* gray(numeric(), alpha=1/2) no longer segfaults, fixing PR#18183, reported by Till Krenz.

* Fixed dnbinom(x, size=<large>, .., log=TRUE) regression, reported by Martin Morgan.

* as.Date.POSIXlt(x) now keeps names(x), thanks to Davis Vaughan's report and patch in PR#18188.

* model.response() now strips an "AsIs" class typically, thanks to Duncan Murdoch's report and other discussants in PR#18190.

* try() is considerably faster in case of an error and long call, as e.g., from some do.call(). Thanks to Alexander Kaever's suggestion posted to R-devel.

* qqline(y = ) such as y=I(.), now works, see also PR#18190.

* Non-integer mgp par(
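For readers who want to verify a downloaded copy against the checksums above, a minimal sketch in R (assuming R-4.1.2.tar.gz has already been downloaded into the current working directory):

tools::md5sum("R-4.1.2.tar.gz")
## should print 6e28db9d02c6d3dae51a149b8e261ab1, matching the MD5 line above
# For the SHA-256 sums, run the command-line tools sha256sum (Linux) or
# shasum -a 256 (macOS) on the same files and compare with the list above.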
[Rd] Wrong number of names?
The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses a dataframe which has a named numeric column of length 1488 that has 744 names. I don't think this is ever legal, but am I wrong about that?

The `dat.rds` file mentioned in the post is temporarily available online in case anyone else wants to examine it.

Assuming that the file contains a badly formed object, I wonder if readRDS() should do some sanity checks as it reads.

Duncan Murdoch
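As a small aside for readers (not part of Duncan's message): R normally keeps the two lengths in sync by padding a too-short names value with NAs, which is why a length-1488 column carrying only 744 names looks corrupt rather than merely unusual. A quick sketch:

x <- c(a = 1, b = 2, c = 3)
names(x) <- c("p", "q")   # a value that is too short is padded with NA names
length(x)                 # 3
length(names(x))          # also 3: length(names(x)) == length(x) normally holds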
Re: [Rd] Wrong number of names?
> Duncan Murdoch on Mon, 1 Nov 2021 06:36:17 -0400 writes:
>
> The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses a dataframe which has a named numeric column of length 1488 that has 744 names. I don't think this is ever legal, but am I wrong about that?
>
> The `dat.rds` file mentioned in the post is temporarily available online in case anyone else wants to examine it.
>
> Assuming that the file contains a badly formed object, I wonder if readRDS() should do some sanity checks as it reads.
>
> Duncan Murdoch

Good question. In the meantime, I've also added a bit on the SO page above, e.g.

---
d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':  1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date     : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score    : Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488  744

dput(ds)  # ->
##  *** caught segfault ***
## address (nil), cause 'memory not mapped'
---

Hence "proving" that the dat.rds really contains an invalid object, when simple dput(.) directly gives a segmentation fault.

I think we are aware that using C code and say .Call(..) one can create all kinds of invalid objects "easily", and I think it's clear that it's not feasible to check for validity of such objects "everywhere".

Your proposal to have at least our deserialization code used in readRDS() do (at least *some*) validity checks seems good, but maybe we should think of more cases, and/or do such validity checks already during serialization { <-> saveRDS() here }? Such questions then really are for those who understand more than me about (de)serialization in R, its performance bottlenecks etc.

Given the speed impact we should probably have such checks *optional* but have them *on* by default, e.g., at least for saveRDS()?

Martin
Re: [Rd] Wrong number of names?
> On 1 Nov 2021, at 11:36, Duncan Murdoch wrote:
>
> The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses a dataframe which has a named numeric column of length 1488 that has 744 names. I don't think this is ever legal, but am I wrong about that?

It is certainly not easy to create such objects at the R level, e.g.:

> x <- 1:10
> names(x) <- 1:10
> length(names(x)) <- 5
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA>
   1    2    3    4    5    6    7    8    9   10
> names(x)
 [1] "1" "2" "3" "4" "5" NA  NA  NA  NA  NA

or even

> x <- 1:10
> attributes(x)$foo <- 1:5
> x
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"foo")
[1] 1 2 3 4 5
> names(attributes(x)) <- "names"
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA>
   1    2    3    4    5    6    7    8    9   10
> dput(x)
structure(1:10, .Names = c("1", "2", "3", "4", "5", NA, NA, NA, NA, NA))

Of course, at the C level, everything is possible...

> The `dat.rds` file mentioned in the post is temporarily available online in case anyone else wants to examine it.
>
> Assuming that the file contains a badly formed object, I wonder if readRDS() should do some sanity checks as it reads.
>
> Duncan Murdoch

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [Rd] Wrong number of names?
On 01/11/2021 9:10 a.m., Martin Maechler wrote:
> [...]
> Given the speed impact we should probably have such checks *optional* but have them *on* by default, e.g., at least for saveRDS()?

It might make sense to start with a contributed package. It could include lots of checks without worrying about how expensive they are; if some of them prove to be cost-effective, they could be moved into base functions.

Duncan Murdoch
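As a rough sketch of the kind of check such a package could provide (illustrative only, not from the thread; the helper name check_names is made up here), a recursive walk verifying that every names attribute matches its object's length might look like:

check_names <- function(x) {
  ok <- TRUE
  nm <- attr(x, "names")
  if (!is.null(nm) && length(nm) != length(x)) {
    warning("names has length ", length(nm),
            " but the object has length ", length(x))
    ok <- FALSE
  }
  if (is.list(x))  # recurse into list components, including data.frame columns
    ok <- all(vapply(x, check_names, logical(1))) && ok
  ok
}

## hypothetical use on a deserialized object such as the one from the post:
## d <- readRDS("dat.rds"); check_names(d)   # would flag the malformed score column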
Re: [Rd] parallel PSOCK connection latency is greater on Linux?
Hi Simon,

I see there may have been some changes to address the TCP_NODELAY issue on Linux in https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797. I gave this a try with R 4.1.1, but I still see a 40 ms compute floor. Am I misunderstanding these changes or how socketOptions is intended to be used?

-Jeff

library(parallel)
library(microbenchmark)
options(socketOptions = "no-delay")
cl <- makeCluster(1)
(x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
# Unit: microseconds
#                    expr  min       lq     mean   median       uq     max neval
#  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100

> On 11/04/2020 5:41 AM Iñaki Ucar wrote:
>
> Please, check a tcpdump session on localhost while running the following script:
>
> library(parallel)
> library(tictoc)
> cl <- makeCluster(1)
> Sys.sleep(1)
>
> for (i in 1:10) {
>   tic()
>   x <- clusterEvalQ(cl, iris)
>   toc()
> }
>
> The initialization phase comprises 7 packets. Then, the 1-second sleep will help you see where the evaluation starts. Each clusterEvalQ generates 6 packets:
>
> 1. main -> worker PSH, ACK 1026 bytes
> 2. worker -> main ACK 66 bytes
> 3. worker -> main PSH, ACK 3758 bytes
> 4. main -> worker ACK 66 bytes
> 5. worker -> main PSH, ACK 2484 bytes
> 6. main -> worker ACK 66 bytes
>
> The first two are the command and its ACK, the following are the data back and their ACKs. In the first 4-5 iterations, I see no delay at all. Then, in the following iterations, a 40 ms delay starts to happen between packets 3 and 4, that is: the main process delays the ACK to the first packet of the incoming result.
>
> So I'd say Nagle is hardly to blame for this. It would be interesting to see how many packets are generated with TCP_NODELAY on. If there are still 6 packets, then we are fine. If we suddenly see a gazillion packets, then TCP_NODELAY does more harm than good. On the other hand, TCP_QUICKACK would surely solve the issue without any drawback. As Nagle himself put it once, "set TCP_QUICKACK. If you find a case where that makes things worse, let me know."
>
> Iñaki
>
> On Wed, 4 Nov 2020 at 04:34, Simon Urbanek wrote:
> >
> > I'm not sure the user would know ;). This is a very system-specific issue just because the Linux network stack behaves so differently from other OSes (for purely historical reasons). That makes it hard to abstract as a "feature" for the R sockets that are supposed to be platform-independent. At least TCP_NODELAY is actually part of POSIX so it is on better footing, and disabling delayed ACK is practically only useful to work around the other side having Nagle on, so I would expect it to be rarely used.
> >
> > This is essentially an RFC since we don't have a mechanism for socket options (well, almost, there is timeout and blocking already...) and I don't think we want to expose low-level details, so perhaps one idea would be to add something like delay=NA to socketConnection() in order to not touch (NA), enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other way we could infer the intention of the user to try to choose the right approach...
> >
> > Cheers,
> > Simon
> >
> > > On Nov 3, 2020, at 02:28, Jeff wrote:
> > >
> > > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?
> > >
> > > Best,
> > > Jeff
> > >
> > > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar wrote:
> > > > On Mon, 2 Nov 2020 at 02:22, Simon Urbanek wrote:
> > > > > It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):
> > > > How many network packets are generated with and without it? If there are many small writes and thus setting TCP_NODELAY causes many small packets to be sent, it might make more sense to set TCP_QUICKACK instead.
> > > > Iñaki
> > > > > Unit: microseconds
> > > > >                    expr      min       lq     mean  median       uq      max neval
> > > > >  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
> > > > > exactly the same machine + R but with TCP_NODELAY enabled in R_SockConnect():
> > > > > Unit: microseconds
> > > > >                    expr     min     lq     mean  median      uq      max neval
> > > > >  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
> > > > > Cheers,
> > > > > Simon
> > > > > > On 2/11/2020, at 3:39 AM, Jeff wrote:
> > > > > > I'm exploring latency overhead of parallel PSOCK workers and noticed that serializing/unserializing data back to the main R session is significantly slower on Linux than it is on Windows/MacOS with similar hardware. Is there a reason f
Re: [Rd] parallel PSOCK connection latency is greater on Linux?
Jeff,

Perhaps I'm just missing something here, but ms is generally milliseconds, not microseconds (which are much smaller), right?

Also, this seems to just be how long it takes to roundtrip serialize iris (in 4.1.0 on mac osx, as that's what I have handy right this moment):

> microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})
Unit: microseconds
                                                     expr    min      lq     mean  median     uq   max neval
 { x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085 40.26888 36.4345 43.641 80.39   100

> res <- system.time(replicate(1, {x <- unserialize(serialize(iris, connection = NULL))}))
> res/1
    user   system  elapsed
4.58e-05 2.90e-06 4.88e-05

Thus the overhead appears to be extremely minimal in your results above, right? In fact it seems to be comparable or lower than replicate.

~G

On Mon, Nov 1, 2021 at 5:20 PM Jeff Keller wrote:
> Hi Simon,
>
> I see there may have been some changes to address the TCP_NODELAY issue on Linux in https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797. I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am I misunderstanding these changes or how socketOptions is intended to be used?
>
> [...]
[Rd] FLIBS in MacOS M1 binary at odds with documentation for optional libraries/tools
The macOS M1 pre-built binary arrives with a /Library/Frameworks/R.framework/Resources/etc/Makevars containing

FLIBS = -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.2.0/11.0.0 -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc -L/Volumes/Builds/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lm

This is inconsistent with what is said at the top of https://mac.r-project.org/libs-arm64/: that all binaries live in /opt/R/arm64, not /Volumes/Builds/opt/R/arm64.

So no one would be able to build a source package containing Fortran without either modifying Makevars or creating symbolic links.

-Naras
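Purely as an illustration of the symbolic-link workaround mentioned above (not from the original report, and it requires administrator rights to create a directory under /Volumes), one could make the baked-in paths resolve to the real installation from R roughly like this:

# Sketch: let /Volumes/Builds/opt point at /opt so the FLIBS paths resolve.
if (!dir.exists("/Volumes/Builds")) dir.create("/Volumes/Builds")
file.symlink(from = "/opt", to = "/Volumes/Builds/opt")
# afterwards /Volumes/Builds/opt/R/arm64/gfortran/... is the same tree as /opt/R/arm64/gfortran/...
# The other option is to edit FLIBS in etc/Makevars to use the /opt/R/arm64 paths directly.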
Re: [Rd] parallel PSOCK connection latency is greater on Linux?
Hi all,

Please disregard my previous email as I misread the pasted output. Sorry for the noise.

Best,
~G

On Mon, Nov 1, 2021 at 6:45 PM Jeff wrote:
> Hi Gabriel,
>
> Yes, 40 milliseconds (ms) == 40,000 microseconds (us). My benchmarking output is reporting the latter, which is considerably higher than the 40us you are seeing. If I benchmark just the serialization round trip as you did, I get comparable results: 14us median on my Linux system. So at least on Linux, there is something else contributing the remaining 39,986us. The conclusion from earlier in this thread was that the culprit was TCP behavior unique to the Linux network stack.
>
> Jeff
>
> On Mon, Nov 1 2021 at 05:55:45 PM -0700, Gabriel Becker <gabembec...@gmail.com> wrote:
> > [...]
Re: [Rd] parallel PSOCK connection latency is greater on Linux?
Jeff,

you are not setting the option on the server side, only on the client side, so the worker will still wait (which is where it matters). If you set it on the server (worker) side then it works as expected:

> cl <- makeCluster(1, rscript_args="-e 'options(socketOptions=\"no-delay\")'")
> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
Unit: microseconds
                   expr    min     lq     mean   median       uq     max neval
 clusterEvalQ(cl, iris) 112.41 115.33 128.9988 117.3065 120.5385 348.702   100

Cheers,
Simon

> On Nov 2, 2021, at 12:45 PM, Jeff Keller wrote:
>
> Hi Simon,
>
> I see there may have been some changes to address the TCP_NODELAY issue on Linux in https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797. I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am I misunderstanding these changes or how socketOptions is intended to be used?
>
> [...]
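Pulling the thread together, a minimal sketch of the working setup (assuming R >= 4.1, where the socketOptions option and makeCluster()'s rscript_args argument are available) sets the option on both sides; per Simon's reply, the worker side is the part that matters:

library(parallel)
options(socketOptions = "no-delay")   # master side
# worker side: pass the option to the worker's Rscript, as in Simon's example above
cl <- makeCluster(1, rscript_args = "-e 'options(socketOptions=\"no-delay\")'")
system.time(for (i in 1:100) clusterEvalQ(cl, iris))  # should no longer show a ~40 ms floor per call
stopCluster(cl)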
Re: [Rd] FLIBS in MacOS M1 binary at odds with documentation for optional libraries/tools
Naras,

thanks. It seems that the FLIBS check resolves symlinks, unfortunately (all others are fine). I would like to remind people that reports are a lot more useful *before* the release - that's why we publish RCs.

Thanks,
Simon

> On Nov 2, 2021, at 3:03 PM, Balasubramanian Narasimhan wrote:
>
> The macOS M1 pre-built binary arrives with a /Library/Frameworks/R.framework/Resources/etc/Makevars containing
>
> FLIBS = -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.2.0/11.0.0 -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc -L/Volumes/Builds/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lm
>
> [...]