Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-02 Thread Jeff
Hi Gabriel,

Yes, 40 milliseconds (ms) == 40,000 microseconds (us). My benchmarking 
output is reporting the latter, which is considerably higher than the 
40us you are seeing. If I benchmark just the serialization round trip 
as you did, I get comparable results: 14us median on my Linux system. 
So at least on Linux, there is something else contributing the 
remaining 39,986us. The conclusion from earlier in this thread was that 
the culprit was TCP behavior unique to the Linux network stack.
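To separate the serialization cost from everything else without the microbenchmark package, a base-R sketch like the one below times many serialize()/unserialize() round trips of iris and reports the mean cost per iteration in microseconds. Whatever a clusterEvalQ() timing shows beyond this figure comes from something other than serialization. The iteration count n is an arbitrary assumption. Many iterations are timed together because system.time() has only about millisecond resolution.

```r
# Time n serialize()/unserialize() round trips of iris using base R only.
# n is an arbitrary choice: system.time() has roughly millisecond resolution,
# so many iterations are timed together and then averaged.
obj <- datasets::iris
n <- 2000L
elapsed <- system.time(
  for (i in seq_len(n)) {
    x <- unserialize(serialize(obj, connection = NULL))
  }
)[["elapsed"]]

# Mean cost of one round trip in microseconds (1 s = 1e6 us).
per_iter_us <- elapsed / n * 1e6
cat(sprintf("serialization round trip: %.1f us per iteration\n", per_iter_us))

# The round trip reproduces the original object.
stopifnot(identical(x, obj))

# Arithmetic from the message: a 40 ms floor is 40,000 us, so a 14 us
# serialization cost leaves about 39,986 us unexplained.
unexplained_us <- 40 * 1000 - 14
```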

Jeff

On Mon, Nov 1 2021 at 05:55:45 PM -0700, Gabriel Becker 
 wrote:
> Jeff,
> 
> Perhaps I'm just missing something here, but ms is generally 
> milliseconds, not microseconds (which are much smaller), right?
> 
> Also, this seems to be just how long it takes to round-trip serialize
> iris (in R 4.1.0 on macOS, as that's what I have handy right this
> moment):
> 
>> > microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})
>> Unit: microseconds
>>                                                     expr    min      lq     mean  median     uq   max neval
>>  {x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085 40.26888 36.4345 43.641 80.39   100
>>
>> > res <- system.time(replicate(1, {x <- unserialize(serialize(iris, connection = NULL))}))
>> > res/1
>>     user   system  elapsed
>> 4.58e-05 2.90e-06 4.88e-05
>> 
> 
> Thus the overhead appears to be extremely minimal in your results
> above, right? In fact, it seems to be comparable to or lower than
> replicate.
> 
> ~G
> 
> 
> 
> 
> 
> On Mon, Nov 1, 2021 at 5:20 PM Jeff Keller wrote:
>> Hi Simon,
>> 
>>  I see there may have been some changes to address the TCP_NODELAY 
>> issue on Linux in 
>> .
>> 
>>  I gave this a try with R 4.1.1, but I still see a 40ms compute 
>> floor. Am I misunderstanding these changes or how socketOptions is 
>> intended to be used?
>> 
>>  -Jeff
>> 
>>  library(parallel)
>>  library(microbenchmark)
>>  options(socketOptions = "no-delay")
>>  cl <- makeCluster(1)
>>  (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = 
>> "us"))
>>  # Unit: microseconds
>>  #                    expr  min       lq     mean   median       uq     max neval
>>  #  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100
>> 
>>  > On 11/04/2020 5:41 AM Iñaki Ucar wrote:
>>  >
>>  >
>>  > Please check a tcpdump session on localhost while running the
>>  > following script:
>>  >
>>  > library(parallel)
>>  > library(tictoc)
>>  > cl <- makeCluster(1)
>>  > Sys.sleep(1)
>>  >
>>  > for (i in 1:10) {
>>  >   tic()
>>  >   x <- clusterEvalQ(cl, iris)
>>  >   toc()
>>  > }
>>  >
>>  > The initialization phase comprises 7 packets. Then, the 1-second sleep
>>  > will help you see where the evaluation starts. Each clusterEvalQ
>>  > generates 6 packets:
>>  >
>>  > 1. main -> worker PSH, ACK 1026 bytes
>>  > 2. worker -> main ACK 66 bytes
>>  > 3. worker -> main PSH, ACK 3758 bytes
>>  > 4. main -> worker ACK 66 bytes
>>  > 5. worker -> main PSH, ACK 2484 bytes
>>  > 6. main -> worker ACK 66 bytes
>>  >
>>  > The first two are the command and its ACK, the following are the data
>>  > back and their ACKs. In the first 4-5 iterations, I see no delay at
>>  > all. Then, in the following iterations, a 40 ms delay starts to happen
>>  > between packets 3 and 4, that is: the main process delays the ACK to
>>  > the first packet of the incoming result.
>>  >
>>  > So I'd say Nagle is hardly to blame for this. It would be interesting
>>  > to see how many packets are generated with TCP_NODELAY on. If there
>>  > are still 6 packets, then we are fine. If we suddenly see a gazillion
>>  > packets, then TCP_NODELAY does more harm than good. On the other hand,
>>  > TCP_QUICKACK would surely solve the issue without any drawback. As
>>  > Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
>>  > that makes things worse, let me know."
>>  >
>>  > Iñaki
>>  >
>>  > On Wed, 4 Nov 2020 at 04:34, Simon Urbanek wrote:
>>  > >
>>  > > I'm not sure the user would know ;). This is a very system-specific
>>  > > issue, just because the Linux network stack behaves so differently
>>  > > from other OSes (for purely historical reasons). That makes it hard
>>  > > to abstract as a "feature" for the R sockets that are supposed to be
>>  > > platform-independent. At least TCP_NODELAY is actually part of POSIX,
>>  > > so it is on better footing, and disabling delayed ACK is practically
>>  > > only useful to work around the other side having Nagle on, so I would
>>  > > expect it to be rarely used.
>>  > >
>>  > > This is essentially RFC since we don't have a mechanism for 
>> socket options (well, almost, there is timeout and blocking 
>

Re: [Rd] FLIBS in MacOS M1 binary at odds with documentation for optional libraries/tools

2021-11-02 Thread Balasubramanian Narasimhan
Thanks, Simon. I only had sporadic access to an M1 laptop but now
actually have one. Will try to do my part.


Best,

-Naras

On 11/1/21 8:22 PM, Simon Urbanek wrote:

Naras,

thanks. It seems that the FLIBS check resolves symlinks, unfortunately (all 
others are fine).

I would like to remind people that reports are a lot more useful *before* the 
release - that's why we publish RCs.

Thanks,
Simon



On Nov 2, 2021, at 3:03 PM, Balasubramanian Narasimhan  
wrote:

The Mac OS M1 pre-built binary arrives with a
/Library/Frameworks/R.framework/Resources/etc/Makevars containing

FLIBS = -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.2.0/11.0.0 -L/Volumes/Builds/opt/R/arm64/gfortran/lib/gcc -L/Volumes/Builds/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lm

This is inconsistent with what is said at the top of
https://mac.r-project.org/libs-arm64/: that all binaries live in
/opt/R/arm64, not /Volumes/Builds/opt/R/arm64.

So no one would be able to build a source package containing Fortran
without either modifying Makevars or creating symbolic links.
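A quick way to see whether an installation is affected is to list the -L directories in the FLIBS entry and report those that do not exist. The helper below, missing_flibs_dirs(), is a hypothetical sketch and not part of R. It assumes the whole FLIBS value sits on one line of the file and that the paths contain no spaces.

```r
# Hypothetical helper (not part of R): return the -L directories named in a
# FLIBS line of a Makevars-style file that do not exist on disk.
# Assumes the FLIBS value is on a single line and paths contain no spaces.
missing_flibs_dirs <- function(makevars) {
  lines <- readLines(makevars, warn = FALSE)
  flibs <- grep("^FLIBS[[:space:]]*=", lines, value = TRUE)
  if (length(flibs) == 0L) return(character(0))
  # Extract every "-L<dir>" token, then strip the "-L" prefix.
  tokens <- unlist(regmatches(flibs, gregexpr("-L[^[:space:]]+", flibs)))
  dirs <- sub("^-L", "", tokens)
  dirs[!dir.exists(dirs)]
}

# Usage on the file named in the report:
# missing_flibs_dirs("/Library/Frameworks/R.framework/Resources/etc/Makevars")
```

On an affected machine this would list the /Volumes/Builds/... directories unless matching symbolic links have been created.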

-Naras



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] [External] Re: Wrong number of names?

2021-11-02 Thread luke-tierney

On Mon, 1 Nov 2021, Martin Maechler wrote:


Duncan Murdoch
on Mon, 1 Nov 2021 06:36:17 -0400 writes:


   > The StackOverflow post
   > https://stackoverflow.com/a/69767361/2554330 discusses a
   > dataframe which has a named numeric column of length 1488
   > that has 744 names. I don't think this is ever legal, but
   > am I wrong about that?

   > The `dat.rds` file mentioned in the post is temporarily
   > available online in case anyone else wants to examine it.

   > Assuming that the file contains a badly formed object, I
   > wonder if readRDS() should do some sanity checks as it
   > reads.

   > Duncan Murdoch

Good question.

In the meantime, I've also added a bit on the SO page
above, e.g.

---

d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score: Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488   744

dput(ds) # ->

##  *** caught segfault ***
## address (nil), cause 'memory not mapped'


If I'm reading this right then dput is where the segfault is
happening, so that could use some more bulletproofing.

Best,

luke




---

Hence "proving" that dat.rds really contains an invalid object,
when a simple dput(.) directly gives a segmentation fault.

I think we are aware that, using C code and, say, .Call(..), one
can create all kinds of invalid objects "easily", and I think
it's clear that it's not feasible to check for validity of such
objects "everywhere".

Your proposal to have at least our deserialization code used in
readRDS() do (at least *some*) validity checks seems good, but
maybe we should think of more cases, and / or  do such validity
checks already during serialization { <-> saveRDS() here } ?

.. Such questions then really are for those who understand more than
me about (de)serialization in R, its performance bottlenecks etc.
Given the speed impact we should probably have such checks *optional*
but have them *on* by default e.g., at least for saveRDS() ?

Martin
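Until readRDS() performs such checks, the inconsistency above can be detected from R without calling dput(), which is what segfaults here. The sketch below generalizes the c(length(ds), length(names(ds))) comparison: it walks an object recursively and reports every component whose names vector has a different length from the component itself. check_names_length() is a hypothetical helper, not an existing R function. Because pure R code cannot create such an object, the test uses a names() S3 method to simulate a bad one.

```r
# Hypothetical helper (not an R function): recursively find components whose
# names vector length differs from their own length. Returns a character
# vector describing each offending path; character(0) if none are found.
check_names_length <- function(x, path = "(top)") {
  bad <- character(0)
  nms <- names(x)
  if (!is.null(nms) && length(nms) != length(x)) {
    bad <- sprintf("%s: length %d but %d names", path, length(x), length(nms))
  }
  # Recurse into lists (data frames are lists of columns).
  if (is.list(x)) {
    for (i in seq_along(x)) {
      label <- if (i <= length(nms) && !is.na(nms[i]) && nzchar(nms[i])) {
        nms[i]
      } else {
        as.character(i)
      }
      bad <- c(bad, check_names_length(x[[i]], paste0(path, "$", label)))
    }
  }
  bad
}

# Usage on the file from the SO post (not run):
# check_names_length(readRDS("dat.rds"))
```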




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and   Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



Re: [Rd] Fwd: Using existing envars in Renviron on friendly Windows

2021-11-02 Thread Henrik Bengtsson
Oh, I see, I misunderstood.  Thanks for clarifying.

One more thing: you can mix and match environment variables and strings
with escaped characters, mimicking how POSIX shells do it, by combining
double- and single-quoted strings. For example, with:

$ cat .Renviron
APPDATA='C:\Users\foobar\AppData\Roaming'
R_LIBS_USER="${APPDATA}"'\R-library'

we get:

$ Rscript --no-init-file --quiet -e 'cat(sprintf("R_LIBS_USER=[%s]\n", Sys.getenv("R_LIBS_USER")))'
R_LIBS_USER=[C:\Users\foobar\AppData\Roaming\R-library]

and

$ source .Renviron
$ echo "R_LIBS_USER=[${R_LIBS_USER}]"
R_LIBS_USER=[C:\Users\foobar\AppData\Roaming\R-library]
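The same behaviour can be checked from within R with readRenviron(), which parses a file in Renviron syntax and sets the variables in the current session. The sketch below writes a temporary file and reads it back. The DEMO_* variable names are made up for illustration. Raw strings r"(...)" (R >= 4.0.0) keep the backslashes literal in the file. The last line reproduces the unquoted ${APPDATA}/... pitfall that started this thread.

```r
# Write a throwaway Renviron-style file; DEMO_* names are illustrative only.
renv <- tempfile(fileext = ".Renviron")
writeLines(c(
  # Single quotes preserve backslashes in a literal path.
  r"(DEMO_APPDATA='C:\Users\foobar\AppData\Roaming')",
  # Quoted reference to an existing variable plus a forward-slash suffix.
  r"(DEMO_LIB_FWD="${DEMO_APPDATA}/R-library")",
  # Double-quoted reference concatenated with a single-quoted literal.
  r"(DEMO_LIB_BSL="${DEMO_APPDATA}"'\R-library')",
  # Unquoted reference: backslashes in the expanded value are dropped.
  r"(DEMO_LIB_BAD=${DEMO_APPDATA}/R-library)"
), renv)

readRenviron(renv)
print(Sys.getenv(c("DEMO_APPDATA", "DEMO_LIB_FWD", "DEMO_LIB_BSL", "DEMO_LIB_BAD")))
```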

/Henrik

On Sun, Oct 31, 2021 at 2:59 AM Tomas Kalibera  wrote:
>
>
> On 10/31/21 2:55 AM, Henrik Bengtsson wrote:
> >> ... If one still needed backslashes,
> >> they could then be entered in single quotes, e.g. VAR='c:\users'.
> > I don't think it matters whether you use single or double quotes -
> > both will work.  Here's a proof of concept on Linux with R 4.1.1:
> >
> > $ cat ./.Renviron
> > A=C:\users
> > B='C:\users'
> > C="C:\users"
> >
> > $ Rscript -e "Sys.getenv(c('A', 'B', 'C'))"
> >A   B   C
> >"C:users" "C:\\users" "C:\\users"
>
> Yes, but as I wrote "I think the Renviron files should be written in a
> way so that they would work the same in a POSIX shell". This is why
> single quotes. With double quotes, backslashes are interpreted
> differently from a POSIX shell.
>
> Tomas
>
>
> >
> > /Henrik
> >
> > On Wed, Oct 27, 2021 at 11:45 AM Tomas Kalibera wrote:
> >>
> >> On 10/21/21 5:18 PM, Martin Maechler wrote:
>  Michał Bojanowski
>    on Wed, 20 Oct 2021 16:31:08 +0200 writes:
> >>>   > Hello Tomas,
> >>>   > Yes, that's accurate although rather terse, which is perhaps the
> >>>   > reason why I did not realize it applies to my case.
> >>>
> >>>   > How about adding something in the direction of:
> >>>
> >>>   > 1. Continuing the cited paragraph with:
> >>>   > In particular, on Windows it may be necessary to quote references to
> >>>   > existing environment variables, especially those containing file paths
> >>>   > (which include backslashes). For example: `"${WINVAR}"`.
> >>>
> >>>   > 2. Add an example (not run):
> >>>
> >>>   > # On Windows do quote references to variables containing paths, e.g.:
> >>>   > # If APPDATA=C:\Users\foobar\AppData\Roaming
> >>>   > # to point to a library tree inside APPDATA in .Renviron use
> >>>   > R_LIBS_USER="${APPDATA}"/R-library
> >>>
> >>>   > Incidentally the last example is on backslashes too.
> >>>
> >>>
> >>>   > What do you think?
> >>>
> >>> I agree that adding an example really helps a lot in such cases,
> >>> in my experience, notably if it's precise enough to be used +/- directly.
> >> Yes, I agree as well. I think the Renviron files should be written in a
> >> way so that they would work the same in a POSIX shell, so e.g.
> >> VAR="${VAR0}" or VAR="${VAR0}/subdir" are the recommended ways to
> >> preserve backslashes in VAR0. It is better to use forward slashes in
> >> string literals, e.g. VAR="c:/users". If one still needed backslashes,
> >> they could then be entered in single quotes, e.g. VAR='c:\users'.
> >>
> >> The currently implemented parsing of Renviron files differs in a number
> >> of details from POSIX shells, some are documented and some are not.
> >> Relying only on the documented behavior that is the same as in POSIX
> >> shells is the best choice for future compatibility.
> >>
> >> Tomas
> >>
> >>>
> >>>   > On Mon, Oct 18, 2021 at 5:02 PM Tomas Kalibera wrote:
> >>>   >>
> >>>   >>
> >>>   >> On 10/15/21 6:44 PM, Michał Bojanowski wrote:
> >>>   >> > Perhaps a small update to ?.Renviron would be in order to mention that...
> >>>   >>
> >>>   >> Would you have a more specific suggestion how to update the
> >>>   >> documentation? Please note that it already says
> >>>   >>
> >>>   >> "‘value’ is then processed in a similar way to a Unix shell: in
> >>>   >> particular the outermost level of (single or double) quotes is stripped,
> >>>   >> and backslashes are removed except inside quotes."
> >>>   >>
> >>>   >> Thanks,
> >>>   >> Tomas
> >>>   >>
> >>>   >> > On Fri, Oct 15, 2021 at 6:43 PM Michał Bojanowski wrote:
> >>>   >> >> Indeed quoting works! Kevin suggested the same, but he didn't reply to the list.
> >>>   >> >> Thank you all!
> >>>   >> >> Michal
> >>>   >> >>
> >>>   >> >> On Fri, Oct 15, 2021 at 6:40 PM Ivan Krylov wrote:
> >>>   >> >>> Sorry for the noise! I wasn't supposed to send my previous message.
> >>>   >> >>>
> >>>   >> >>> On Fri, 15 Oct 2021 16:44:28 +0200
> >>>   >> >>> Michał Bojanowski  wrote:
> >>>   >> >>>
> >>>   >>  AVAR=${APPDATA}/foo/bar
> >>>   >> 
> >>>  

Re: [Rd] BUG?: R CMD check with --as-cran *disables* checks for unused imports otherwise performed

2021-11-02 Thread Henrik Bengtsson
I've just posted this to BugZilla as PR18229
(https://bugs.r-project.org/show_bug.cgi?id=18229) to make sure it's
tracked.

/Henrik

On Wed, Oct 20, 2021 at 8:08 PM Jeffrey Dick  wrote:
>
> FWIW, I also encountered this issue and posted on R-pkg-devel about it, with 
> no resolution at the time (May 2020). See "Dependencies NOTE lost with 
> --as-cran" (https://stat.ethz.ch/pipermail/r-package-devel/2020q2/005467.html)
>
> On Wed, Oct 20, 2021 at 11:55 PM Henrik Bengtsson wrote:
>>
>> ISSUE:
>>
>> Using 'R CMD check' with --as-cran sets
>> _R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_=TRUE, whereas the
>> default is FALSE, which is what you get if you don't add --as-cran.
>> I would expect --as-cran to check more things and to be more conservative
>> than without it.  So, is this behavior a mistake?  Could it be a thinko
>> around the negating "IGNORE", and the behavior is meant to be vice
>> versa?
>>
>> Example:
>>
>> $ R CMD check QDNAseq_1.29.4.tar.gz
>> ...
>> * using R version 4.1.1 (2021-08-10)
>> * using platform: x86_64-pc-linux-gnu (64-bit)
>> ...
>> * checking dependencies in R code ... NOTE
>> Namespace in Imports field not imported from: ‘future’
>>   All declared Imports should be used.
>>
>> whereas, if I run with --as-cran, I don't get that NOTE:
>>
>> $ R CMD check --as-cran QDNAseq_1.29.4.tar.gz
>> ...
>> * checking dependencies in R code ... OK
>>
>>
>> TROUBLESHOOTING:
>>
>> In src/library/tools/R/check.R [1], the following is set if --as-cran is used:
>>
>>   Sys.setenv("_R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_" = "TRUE")
>>
>> whereas, if it is not set, the default in src/library/tools/R/QC.R [2] is:
>>
>>   ignore_unused_imports <-
>>     config_val_to_logical(Sys.getenv("_R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_",
>>                                      "FALSE"))
>>
>> [1] https://github.com/wch/r-source/blob/b50e3f755674cbb697a4a7395b766647a5cfeea2/src/library/tools/R/check.R#L6335
>> [2] https://github.com/wch/r-source/blob/b50e3f755674cbb697a4a7395b766647a5cfeea2/src/library/tools/R/QC.R#L5954-L5956
>>
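The double negative is easier to follow in a small model of the two code paths. The sketch below is hypothetical: effective_value() and note_emitted() are not R functions, and base as.logical() stands in for the internal config_val_to_logical(), which also accepts values such as "yes" and "no". Following the check.R line quoted above, --as-cran forces the variable to "TRUE". That makes ignore_unused_imports TRUE and suppresses the NOTE, matching the output shown earlier.

```r
# Hypothetical model of the logic quoted above; not R's own code.
# as.logical() stands in for the internal config_val_to_logical().

# Value the dependency check sees: per the quoted check.R line, --as-cran
# sets the variable to "TRUE"; otherwise the user's setting is used,
# with "FALSE" as the default.
effective_value <- function(as_cran, user_value = NA_character_) {
  if (as_cran) "TRUE" else if (is.na(user_value)) "FALSE" else user_value
}

# The "Namespace in Imports field not imported from" NOTE is reported
# only when unused imports are not being ignored.
note_emitted <- function(as_cran, user_value = NA_character_) {
  ignore_unused_imports <- as.logical(effective_value(as_cran, user_value))
  !isTRUE(ignore_unused_imports)
}

note_emitted(as_cran = FALSE)  # TRUE: plain 'R CMD check' gives the NOTE
note_emitted(as_cran = TRUE)   # FALSE: --as-cran suppresses it
```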
>> /Henrik
>>
