[Rd] R 4.1.2 is released

2021-11-01 Thread Peter Dalgaard
The build system rolled up R-4.1.2.tar.gz (codename "Bird Hippie") this morning.

The list below details the changes in this release. 

You can get the source code from

https://cran.r-project.org/src/base/R-4/R-4.1.2.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.


For the R Core Team,

Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files,
in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = 320967884b547734d6279dedbc739dd4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = ade6a3d38fe5e6a456929cae2b94d568
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 924e68decbf327f538a09afb1838506b
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = a767f7809324c73c49eaff47d14bce81
MD5 (NEWS.3) = e55ed2c8a547b827b46e08eb7137ba23
MD5 (R-latest.tar.gz) = 6e28db9d02c6d3dae51a149b8e261ab1
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = a79b9b338cab09bd665f6b62ac6f455b
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = a72a49578a254b9163f0f10322a3eecc
MD5 (R-4/R-4.1.2.tar.gz) = 6e28db9d02c6d3dae51a149b8e261ab1

60a0d150e6fc1f424be76ad7b645d236b56e747692a4679f81ce6536c550e949  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
e84c67931e9b925abb9142d4a6b4ef03b7605948bbf384d7e3d2401823c7f1fe  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
73d5bfb8711bb7833ce8fe7a1359566d48001d13cd32affbd800d759f0b3232a  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
ba74618bc3f4c0e336dca13d472402a1863d12ba6f7f91a1782bc469ee986f6d  NEWS.2
1910a2405300b9bc7c76beeb0753a5249cf799afe175ce28f8d782fab723e012  NEWS.3
2036225e9f7207d4ce097e54972aecdaa8b40d7d9911cd26491fac5a0fab38af  R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
8b7d3856100220f4555d4d57140829f2e81c27eccec5b441f5dce616e9ec9061  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
1e74ef089b526538bbb658dc189bc3d34d931839e9933415fb2f267fd57b0b69  VERSION-INFO.dcf
2036225e9f7207d4ce097e54972aecdaa8b40d7d9911cd26491fac5a0fab38af  R-4/R-4.1.2.tar.gz
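
As a rough sketch (not part of the announcement), the MD5 values above could be checked from within R using tools::md5sum(). The helper name verify_md5 is an assumption made for illustration, and the demonstration runs on a small temporary file rather than on the real tarball.

```r
# Hypothetical helper: compare a file's MD5 digest with a published value.
verify_md5 <- function(path, expected) {
  stopifnot(file.exists(path))
  actual <- unname(tools::md5sum(path))   # md5sum() returns a named character vector
  identical(tolower(actual), tolower(expected))
}

# Self-contained demonstration on a small temporary file.
tmp <- tempfile()
writeBin(charToRaw("hello\n"), tmp)       # exact bytes, independent of platform line endings
demo_ok <- verify_md5(tmp, "b1946ac92492d2347c6235b8d2611184")
```

After downloading the tarball, the same helper would be called with the local file name and the MD5 value listed above for R-4/R-4.1.2.tar.gz.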

This is the relevant part of the NEWS file

CHANGES IN R 4.1.2:

  C-LEVEL FACILITIES:

* The workaround in headers R.h and Rmath.h (using namespace std;)
  for the Oracle Developer Studio compiler is no longer needed now
  C++11 is required so has been removed.  A couple more usages of
  log() (which should have been std::log()) with an int argument
  are reported on Solaris.

* The undocumented limit of 4095 bytes on messages from the
  S-compatibility macros PROBLEM and MESSAGE is now documented and
  longer messages will be silently truncated rather than
  potentially causing segfaults.

* If the R_NO_SEGV_HANDLER environment variable is non-empty, the
  signal handler for SEGV/ILL/BUS signals (which offers recovery
  user interface) is not set. This allows more reliable debugging
  of crashes that involve the console.

  DEPRECATED AND DEFUNCT:

* The legacy S-compatibility macros PROBLEM, MESSAGE, ERROR, WARN,
  WARNING, RECOVER, ... are deprecated and will be hidden in R
  4.2.0. R's native interface of Rf_error and Rf_warning has long
  been preferred.

  BUG FIXES:

* .mapply(F, dots, .) no longer segfaults when dots is not a list
  and uses match.fun(F) as always documented; reported by Andrew
  Simmons in PR#18164.

* hist(, ...) and hist(, ...)  no longer pass
  arguments for rect() (such as col and density) to axis().
  (Thanks to Sebastian Meyer's PR#18171.)

* \Sexpr{ch} now preserves Encoding(ch). (Thanks to report and
  patch by Jeroen Ooms in PR#18152.)

* Setting the RNG to "Marsaglia-Multicarry" e.g., by RNGkind(), now
  warns in more places, thanks to André Gillibert's report and
  patch in PR#18168.

* gray(numeric(), alpha=1/2) no longer segfaults, fixing PR#18183,
  reported by Till Krenz.

* Fixed dnbinom(x, size=, .., log=TRUE) regression,
  reported by Martin Morgan.

* as.Date.POSIXlt(x) now keeps names(x), thanks to Davis Vaughan's
  report and patch in PR#18188.

* model.response() now strips an "AsIs" class typically, thanks to
  Duncan Murdoch's report and other discussants in PR#18190.

* try() is considerably faster in case of an error and long call,
  as e.g., from some do.call().  Thanks to Alexander Kaever's
  suggestion posted to R-devel.

* qqline(y = ) such as y=I(.), now works, see also
  PR#18190.

* Non-integer mgp par(

[Rd] Wrong number of names?

2021-11-01 Thread Duncan Murdoch
The StackOverflow post https://stackoverflow.com/a/69767361/2554330 
discusses a dataframe which has a named numeric column of length 1488 
that has 744 names. I don't think this is ever legal, but am I wrong 
about that?


The `dat.rds` file mentioned in the post is temporarily available online 
in case anyone else wants to examine it.


Assuming that the file contains a badly formed object, I wonder if 
readRDS() should do some sanity checks as it reads.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrong number of names?

2021-11-01 Thread Martin Maechler
> Duncan Murdoch 
> on Mon, 1 Nov 2021 06:36:17 -0400 writes:

> The StackOverflow post
> https://stackoverflow.com/a/69767361/2554330 discusses a
> dataframe which has a named numeric column of length 1488
> that has 744 names. I don't think this is ever legal, but
> am I wrong about that?

> The `dat.rds` file mentioned in the post is temporarily
> available online in case anyone else wants to examine it.

> Assuming that the file contains a badly formed object, I
> wonder if readRDS() should do some sanity checks as it
> reads.

> Duncan Murdoch

Good question.

In the meantime, I've also added a bit on the SO page
above, e.g.

---

d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score: Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488   744

dput(ds) # -> 

##  *** caught segfault ***
## address (nil), cause 'memory not mapped'

---

Hence  "proving" that the dat.rds  really contains an invalid object,
when simple  dput(.) directly gives a segmentation fault.

I think we are aware that using C code and say .Call(..)  one
can create all kinds of invalid objects "easily".. and I think
it's clear that it's not feasible to check for validity of such
objects "everywhere".

Your proposal to have at least our deserialization code used in
readRDS() do (at least *some*) validity checks seems good, but
maybe we should think of more cases, and / or  do such validity
checks already during serialization { <-> saveRDS() here } ?

.. Such questions then really are for those who understand more than
me about (de)serialization in R, its performance bottlenecks etc.
Given the speed impact we should probably have such checks *optional*
but have them *on* by default e.g., at least for saveRDS() ?

Martin



Re: [Rd] Wrong number of names?

2021-11-01 Thread peter dalgaard



> On 1 Nov 2021, at 11:36 , Duncan Murdoch  wrote:
> 
> The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses 
> a dataframe which has a named numeric column of length 1488 that has 744 
> names. I don't think this is ever legal, but am I wrong about that?
> 

It is certainly not easy to create such objects at the R level, e.g.:

> x <- 1:10 
> names(x) <- 1:10 
> length(names(x)) <- 5
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA> 
   1    2    3    4    5    6    7    8    9   10 
> names(x)
 [1] "1" "2" "3" "4" "5" NA  NA  NA  NA  NA 

or even

> x <- 1:10 
> attributes(x)$foo <- 1:5
> x
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"foo")
[1] 1 2 3 4 5
> names(attributes(x)) <- "names"
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA> 
   1    2    3    4    5    6    7    8    9   10 
> dput(x)
structure(1:10, .Names = c("1", "2", "3", "4", "5", NA, NA, NA, 
NA, NA))

of course, at the C level, everything is possible...
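
The R-level behaviour shown above can be condensed into a small check. This is only a sketch with made-up variable names: too few names are padded with NA, and too many names are rejected, so the names attribute should never end up with a different length than the vector.

```r
# Assigning too few names pads with NA rather than leaving a short names attribute.
x <- 1:4
names(x) <- c("a", "b")
padded <- names(x)

# Assigning too many names is rejected with an error.
too_many <- tryCatch({
  y <- 1:4
  names(y) <- letters[1:5]
  "accepted"
}, error = function(e) "error")
```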




> The `dat.rds` file mentioned in the post is temporarily available online in 
> case anyone else wants to examine it.
> 
> Assuming that the file contains a badly formed object, I wonder if readRDS() 
> should do some sanity checks as it reads.
> 
> Duncan Murdoch
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Wrong number of names?

2021-11-01 Thread Duncan Murdoch

On 01/11/2021 9:10 a.m., Martin Maechler wrote:

> .. Such questions then really are for those who understand more than
> me about (de)serialization in R, its performance bottlenecks etc.
> Given the speed impact we should probably have such checks *optional*
> but have them *on* by default e.g., at least for saveRDS() ?


It might make sense to start with a contributed package.  It could 
include lots of checks without worrying about how expensive they are; if 
some of them prove to be cost-effective, they could be moved into base 
functions.


Duncan Murdoch
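
One check such a package might contain is sketched below. The function name find_bad_names and its return format are assumptions for illustration, not an existing API. It walks a (possibly nested) object and reports elements whose names attribute differs in length from the element itself, which is the inconsistency discussed in this thread.

```r
# Hypothetical check: find elements whose names attribute has the wrong length.
# Returns a character vector of paths to offending elements (empty if none).
find_bad_names <- function(x, path = "x") {
  # Only vectors and lists are inspected; environments, functions etc. are skipped.
  if (!(is.atomic(x) || is.list(x))) return(character())
  x <- unclass(x)                       # avoid class-specific length() / [[ methods
  bad <- character()
  nms <- attr(x, "names")
  if (!is.null(nms) && length(nms) != length(x)) bad <- c(bad, path)
  if (is.list(x)) {
    for (i in seq_along(x)) {
      bad <- c(bad, find_bad_names(x[[i]], sprintf("%s[[%d]]", path, i)))
    }
  }
  bad
}
```

For a well-formed object such as iris this should return an empty vector. Whether it behaves safely on a genuinely corrupt object depends on how that object was damaged.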



Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Jeff Keller
Hi Simon,

I see there may have been some changes to address the TCP_NODELAY issue on 
Linux in 
https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797.

I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am I 
misunderstanding these changes or how socketOptions is intended to be used?

-Jeff

library(parallel)
library(microbenchmark)
options(socketOptions = "no-delay")
cl <- makeCluster(1)
(x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
# Unit: microseconds
#                    expr  min       lq     mean   median       uq     max neval
#  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100

> On 11/04/2020 5:41 AM Iñaki Ucar  wrote:
> 
>  
> Please, check a tcpdump session on localhost while running the following 
> script:
> 
> library(parallel)
> library(tictoc)
> cl <- makeCluster(1)
> Sys.sleep(1)
> 
> for (i in 1:10) {
>   tic()
>   x <- clusterEvalQ(cl, iris)
>   toc()
> }
> 
> The initialization phase comprises 7 packets. Then, the 1-second sleep
> will help you see where the evaluation starts. Each clusterEvalQ
> generates 6 packets:
> 
> 1. main -> worker PSH, ACK 1026 bytes
> 2. worker -> main ACK 66 bytes
> 3. worker -> main PSH, ACK 3758 bytes
> 4. main -> worker ACK 66 bytes
> 5. worker -> main PSH, ACK 2484 bytes
> 6. main -> worker ACK 66 bytes
> 
> The first two are the command and its ACK, the following are the data
> back and their ACKs. In the first 4-5 iterations, I see no delay at
> all. Then, in the following iterations, a 40 ms delay starts to happen
> between packets 3 and 4, that is: the main process delays the ACK to
> the first packet of the incoming result.
> 
> So I'd say Nagle is hardly to blame for this. It would be interesting
> to see how many packets are generated with TCP_NODELAY on. If there
> are still 6 packets, then we are fine. If we suddenly see a gazillion
> packets, then TCP_NODELAY does more harm than good. On the other hand,
> TCP_QUICKACK would surely solve the issue without any drawback. As
> Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
> that makes things worse, let me know."
> 
> Iñaki
> 
> On Wed, 4 Nov 2020 at 04:34, Simon Urbanek  
> wrote:
> >
> > I'm not sure the user would know ;). This is very system-specific issue 
> > just because the Linux network stack behaves so differently from other OSes 
> > (for purely historical reasons). That makes it hard to abstract as a 
> > "feature" for the R sockets that are supposed to be platform-independent. 
> > At least TCP_NODELAY is actually part of POSIX so it is on better footing, 
> > and disabling delayed ACK is practically only useful to work around the 
> > other side having Nagle on, so I would expect it to be rarely used.
> >
> > This is essentially RFC since we don't have a mechanism for socket options 
> > (well, almost, there is timeout and blocking already...) and I don't think 
> > we want to expose low-level details so perhaps one idea would be to add 
> > something like delay=NA to socketConnection() in order to not touch (NA), 
> > enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any 
> > other way we could infer the intention of the user to try to choose the 
> > right approach...
> >
> > Cheers,
> > Simon
> >
> >
> > > On Nov 3, 2020, at 02:28, Jeff  wrote:
> > >
> > > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they 
> > > might determine what is best for their potentially latency- or 
> > > throughput-sensitive application?
> > >
> > > Best,
> > > Jeff
> > >
> > > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar  wrote:
> > >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek  
> > >> wrote:
> > >>> It looks like R sockets on Linux could do with TCP_NODELAY -- without 
> > >>> (status quo):
> > >> How many network packets are generated with and without it? If there
> > >> are many small writes and thus setting TCP_NODELAY causes many small
> > >> packets to be sent, it might make more sense to set TCP_QUICKACK
> > >> instead.
> > >> Iñaki
> > >>> Unit: microseconds
> > >>>                    expr      min       lq     mean  median       uq      max neval
> > >>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
> > >>> exactly the same machine + R but with TCP_NODELAY enabled in 
> > >>> R_SockConnect():
> > >>> Unit: microseconds
> > >>>                    expr     min     lq     mean  median      uq      max neval
> > >>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
> > >>> Cheers,
> > >>> Simon
> > >>> > On 2/11/2020, at 3:39 AM, Jeff  wrote:
> > >>> >
> > >>> > I'm exploring latency overhead of parallel PSOCK workers and noticed 
> > >>> > that serializing/unserializing data back to the main R session is 
> > >>> > significantly slower on Linux than it is on Windows/MacOS with 
> > >>> > similar hardware. Is there a reason f

Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Gabriel Becker
Jeff,

Perhaps I'm just missing something here, but ms is generally milliseconds,
not microseconds (which are much smaller), right?

Also, this seems to just be how long it takes to roundtrip serialize iris
(in 4.1.0 on Mac OS X, as that's what I have handy right this moment):

> microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})
Unit: microseconds
                                                     expr    min      lq
 { x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085
     mean  median     uq   max neval
 40.26888 36.4345 43.641 80.39   100

> res <- system.time(replicate(1, {x <- unserialize(serialize(iris, connection = NULL))}))
> res/1
    user   system  elapsed
4.58e-05 2.90e-06 4.88e-05


Thus the overhead appears to be extremely minimal in your results above,
right? In fact it seems to be comparable or lower than replicate.

~G
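
For comparison with the clusterEvalQ() timings, the serialization round trip can also be timed on its own in base R, without the microbenchmark package and without any sockets involved. This is only a sketch: the repetition count n and the variable names are arbitrary choices, and the result is converted to microseconds so that it uses the same unit as the benchmark output in this thread.

```r
# Time the serialize/unserialize round trip alone (no sockets involved).
n <- 1000L
elapsed <- system.time(
  for (i in seq_len(n)) x <- unserialize(serialize(iris, connection = NULL))
)[["elapsed"]]

# Seconds per call converted to microseconds (1 s = 1e6 us).
per_call_us <- elapsed / n * 1e6
```

If per_call_us is in the tens of microseconds while clusterEvalQ(cl, iris) takes tens of thousands, the difference has to come from somewhere other than serialization.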






[Rd] FLIBS in MacOS M1 binary at odds with documentation for optional libraries/tools

2021-11-01 Thread Balasubramanian Narasimhan
The Mac OS M1 pre-built binary arrives with a 
/Library/Frameworks/R.framework/Resources/etc/Makevars containing

FLIBS =  -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.2.0/11.0.0 -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc -L/Volumes/Builds/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lm

This is inconsistent with what is said at the top of 
https://mac.r-project.org/libs-arm64/: that all binaries live in 
/opt/R/arm64, not /Volumes/Builds/opt/R/arm64.

So no one would be able to build a source package containing Fortran 
without either modifying Makevars or creating symbolic links.

-Naras
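
A quick way to see whether a given FLIBS value points at directories that exist on the local machine is sketched below. The helper name missing_lib_dirs is hypothetical, and the parsing is deliberately naive: it splits on whitespace, so paths containing spaces are not handled.

```r
# Hypothetical helper: list the -L directories in a FLIBS string that do not exist.
missing_lib_dirs <- function(flibs) {
  tokens <- strsplit(trimws(flibs), "[[:space:]]+")[[1]]
  dirs <- sub("^-L", "", tokens[startsWith(tokens, "-L")])
  dirs[!dir.exists(dirs)]
}

# Example with one directory that is assumed not to exist.
missing_demo <- missing_lib_dirs("-L/no/such/dir/for/flibs -lgfortran -lm")
```

Applied to the FLIBS line above on a machine without /Volumes/Builds, all three -L directories would presumably be reported as missing.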




Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Gabriel Becker
Hi all,

Please disregard my previous email as I misread the pasted output. Sorry
for the noise.

Best,
~G

On Mon, Nov 1, 2021 at 6:45 PM Jeff  wrote:

> Hi Gabriel,
>
> Yes, 40 milliseconds (ms) == 40,000 microseconds (us). My benchmarking
> output is reporting the latter, which is considerably higher than the 40us
> you are seeing. If I benchmark just the serialization round trip as you
> did, I get comparable results: 14us median on my Linux system. So at least
> on Linux, there is something else contributing the remaining 39,986us. The
> conclusion from earlier in this thread was that the culprit was TCP
> behavior unique to the Linux network stack.
>
> Jeff

Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Simon Urbanek


Jeff,

you are not setting the option on the server side, only on the client side, so 
the worker will still wait (which is where it matters). If you set it on the 
server (worker) side then it works as expected:

> cl <- makeCluster(1, rscript_args="-e 'options(socketOptions=\"no-delay\")'")
> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
Unit: microseconds
                   expr    min     lq     mean   median       uq     max neval
 clusterEvalQ(cl, iris) 112.41 115.33 128.9988 117.3065 120.5385 348.702   100

Cheers,
Simon
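
To confirm that the option really reaches the worker, one could ask the worker for its own setting. The sketch below reuses the rscript_args string from the message above, whose quoting assumes a Unix-like shell. The variable name worker_opt is made up for illustration.

```r
library(parallel)

# Pass the option to the worker's Rscript so that it should be in effect
# when the worker opens its connection. Quoting assumes a Unix-like shell.
cl <- makeCluster(1, rscript_args = "-e 'options(socketOptions=\"no-delay\")'")

# Ask the worker what it sees; options() set in the main session
# are not carried over to a freshly started worker process.
worker_opt <- clusterEvalQ(cl, getOption("socketOptions"))[[1]]

stopCluster(cl)
```

If worker_opt comes back as NULL, the option did not reach the worker, and the roughly 40 ms floor would be expected to remain.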




Re: [Rd] FLIBS in MacOS M1 binary at odds with documentation for optional libraries/tools

2021-11-01 Thread Simon Urbanek


Naras,

thanks. It seems that the FLIBS check resolves symlinks, unfortunately (all 
others are fine).

I would like to remind people that reports are a lot more useful *before* the 
release - that's why we publish RCs.

Thanks,
Simon


