Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled
Dear Tomas, thank you very much. I installed r-devel r75201 and tested.

The machine with 88 cores has NUMA disabled. It therefore has 2 processor
groups with 64 and 24 processors each.

require(parallel)
detectCores()
# [1] 88

This is great!

Then I went on to test with a simple 'foreach()' loop. I started with 64
processors (max limit of 1 processor group). I ran with a simple function of
0.5s sleep.

require(snow)
require(doSNOW)
require(foreach)

cl <- makeCluster(64L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
#  user  system elapsed
#  0.06    0.00    0.64
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
#  user  system elapsed
#  0.03    0.01    1.04
stopCluster(cl)

With a cluster of 64 processors and a loop running 64 iterations, it completed
in ~0.5s (0.64), and with 65 iterations it took ~1s, as expected.

cl <- makeCluster(65L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
#  user  system elapsed
#  0.03    0.02    0.61
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
# Timing stopped at: 0.08 0 293
stopCluster(cl)

However, when I increased the cluster to 65 processors, a loop with 64
iterations seems to complete as expected, but using all 65 processors to loop
over 65 iterations didn't seem to complete. I stopped it after ~5 mins. The
same happens with a cluster started with any number between 65 and 88. It
seems to me that we are still not able to use >64 processors all at the same
time, even though detectCores() now returns the right count.

I'd appreciate your thoughts on this.

Best,
Arun.

-----Original Message-----
From: Tomas Kalibera
Sent: 27 August 2018 19:43
To: Srinivasan, Arunkumar; r-devel@r-project.org
Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled

Dear Arun,

thank you for checking the workaround scripts.

I've modified detectCores() to use GetLogicalProcessorInformationEx. It is in
revision 75198 of R-devel, could you please test it on your machines? For a
binary, you can wait until the R-devel snapshot build gets to at least this
svn revision.

Thanks for the link to the processor groups documentation. I don't have a
machine to test this on, but I would hope that snow clusters (e.g. PSOCK)
should work fine on systems with >64 logical processors as they spawn new
processes (not just threads). Note that FORK clusters are not supported on
Windows.

Thanks
Tomas

On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:
> Dear Tomas, thank you for looking into this. Here's the output:
>
> # number of logical processors - what detectCores() should return
> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
> [1] "NumberOfLogicalProcessors \r" "22 \r" "22 \r"
> [4] "20 \r" "22 \r" "\r"
> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
> # [1] 86
>
> [I've asked the IT team to understand why one of the values is 20 instead of
> 22].
>
> # number of cores - what detectCores(FALSE) should return
> out <- system("wmic cpu get numberofcores", intern=TRUE)
> [1] "NumberOfCores \r" "22 \r" "22 \r" "20 \r" "22 \r"
> [6] "\r"
> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
> # [1] 86
>
> [Currently hyperthreading is disabled. So this output being identical to the
> previous output makes sense].
>
> system("wmic computersystem get numberofprocessors")
> NumberOfProcessors
> 4
>
> In addition, I'd also bring to your attention this documentation:
> https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups
> on processor groups, which explains how one should go about getting a process
> to run on multiple groups (which seems to be different to NUMA). All this
> seems overly complicated to allow a process to use all cores by default TBH.
>
> Here's a project on Github, 'fio', where the issue of running a process on
> more than 1 processor group has come up
> (https://github.com/axboe/fio/issues/527) and is addressed
> (https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h).
> I am not sure though if this is entirely relevant since we would be
> forking new processes in R instead of allowing a single process to use all
> cores. Apologies if this is utterly irrelevant.
>
> Thank you,
> Arun.
>
> From: Tomas Kalibera
> Sent: 21 August 2018 11:50
> To: Srinivasan, Arunkumar; r-devel@r-project.org
> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA
> is enabled or disabled
>
> Dear Arun,
>
> thank you for the report. I agree with the analysis, detectCores() will only
> report logical processors in the NUMA group in which R is running. I don't
> have a system to test on, could you please check these workarounds for
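
[Editorial sketch] The wmic-based counts quoted in this thread can be checked
without access to a Windows machine by parsing a captured copy of the output.
The following is a minimal sketch, assuming the output has the shape shown in
the message; `parse_wmic_count` is a hypothetical helper name and is not part
of R or of the parallel package.

```r
# Hypothetical helper: sum the per-socket counts printed by
# `wmic cpu get numberoflogicalprocessors` (or `numberofcores`).
parse_wmic_count <- function(lines) {
  values <- trimws(lines)                      # drop padding and trailing "\r"
  values <- values[grepl("^[0-9]+$", values)]  # keep the numeric rows only
  sum(as.integer(values))
}

# Captured output, copied from the message (assumed to be representative).
out <- c("NumberOfLogicalProcessors  \r", "22  \r", "22  \r",
         "20  \r", "22  \r", "\r")
total <- parse_wmic_count(out)  # 22 + 22 + 20 + 22 = 86
```

Anchoring the pattern with `^` and `$` after trimming keeps the header row and
the empty trailing row out of the sum, which the original one-liner achieved
with a grep() followed by a gsub().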
Re: [Rd] conflicted: an alternative conflict resolution strategy
>> conflicted applies a few heuristics to minimise false positives (at the
>> cost of introducing a few false negatives). The overarching goal is to
>> ensure that code behaves identically regardless of the order in which
>> packages are attached.
>>
>> - A number of packages provide a function that appears to conflict
>>   with a function in a base package, but they follow the superset
>>   principle (i.e. they only extend the API, as explained to me by
>>   Hervé Pagès).
>>
>>   conflicted assumes that packages adhere to the superset principle,
>>   which appears to be true in most of the cases that I’ve seen.
>
> It seems that you may be able to strengthen this heuristic from a blanket
> assumption to something more narrowly targeted by looking for one or more of
> the following to confirm likely-superset adherence:
>
> - matching or purely extending formals (i.e. all the named arguments of
>   base::fun match, including order, and there are new arguments in pkg::fun
>   only if base::fun takes ...)
> - explicit call to base::fun in the body of pkg::fun
> - UseMethod(funname) and at least one provided S3 method calls base::fun
> - S4 generic creation using fun or base::fun as the seeding/default method
>   body, or called from at least one method

Oooh, nice idea, I'll definitely try it out.

>>   For
>>   example, the lubridate package provides `as.difftime()` and `date()`
>>   which extend the behaviour of base functions, and provides S4
>>   generics for the set operators.
>>
>>       conflict_scout(c("lubridate", "base"))
>>       #> 5 conflicts:
>>       #> * `as.difftime`: [lubridate]
>>       #> * `date`       : [lubridate]
>>       #> * `intersect`  : [lubridate]
>>       #> * `setdiff`    : [lubridate]
>>       #> * `union`      : [lubridate]
>>
>>   There are two popular functions that don’t adhere to this principle:
>>   `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>>   special cases so they correctly generate conflicts. (I sure wish I’d
>>   known about the superset principle when creating dplyr!)
>>
>>       conflict_scout(c("dplyr", "stats"))
>>       #> 2 conflicts:
>>       #> * `filter`: dplyr, stats
>>       #> * `lag`   : dplyr, stats
>>
>> - Deprecated functions should never win a conflict, so conflicted
>>   checks for use of `.Deprecated()`. This rule is very useful when
>>   moving functions from one package to another. For example, many
>>   devtools functions were moved to usethis, and conflicted ensures
>>   that you always get the non-deprecated version, regardless of package
>>   attach order:
>
> I would completely believe this rule is useful for refactoring as you
> describe, but that is the "same function" case. For an end-user in the
> "different function same symbol" case it's not at all clear to me that the
> deprecated function should always lose.
>
> People sometimes use deprecated functions. It's not great, and eventually
> they'll need to fix that for any given case, but imagine if you deprecated
> the filter verb in dplyr (I know this will never happen, but I think it's
> illustrative nonetheless).
>
> Consider a piece of code someone wrote before this hypothetical deprecation
> of filter. The fact that it's now deprecated certainly doesn't mean that they
> secretly wanted stats::filter all along, right? Conflicted acting as if it
> does will lead to them getting the exact kind of error you're looking to
> protect them from, and with even less ability to understand why, because they
> are already doing "the right thing" to protect themselves by using conflicted
> in the first place...

Ah yes, good point. I'll add some heuristic to check that the function name
appears in the first argument of the .Deprecated call (assuming that the call
looks something like `.Deprecated("pkg::foo")`).

>> Finally, as mentioned above, the user can declare preferences:
>>
>>     conflict_prefer("select", "MASS")
>>     #> [conflicted] Will prefer MASS::select over any other package
>>     conflict_scout(c("dplyr", "MASS"))
>>     #> 1 conflict:
>>     #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just
> library(conflicted), in their .Rprofile and thus making their scripts
> substantially less reproducible. Is that a consequence of this kind of
> functionality that you have thought about?

Yes, and I've already recommended against it in two places :) I'm not sure if
there's any more I can do - people already put (e.g.) `library(ggplot2)` in
their .Rprofile, which is just as bad from a reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

--
http://hadley.nz

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
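
[Editorial sketch] The first check suggested in this thread, "matching or
purely extending formals", could be approximated as below. This is only a
sketch under stated assumptions: `extends_formals` is a hypothetical name and
is not part of conflicted, "matching" is read as "the base arguments form a
prefix of the package arguments", and only closures are handled (formals()
returns NULL for primitives, so those would need separate treatment).

```r
# Hypothetical helper, not part of conflicted: TRUE when pkg_fun's formals
# start with base_fun's formals (same names, same order) and any additional
# arguments appear only if base_fun itself accepts `...`.
extends_formals <- function(base_fun, pkg_fun) {
  base_args <- as.character(names(formals(base_fun)))
  pkg_args <- as.character(names(formals(pkg_fun)))
  n <- length(base_args)
  if (length(pkg_args) < n) return(FALSE)
  if (!identical(pkg_args[seq_len(n)], base_args)) return(FALSE)
  # Arguments beyond the shared prefix (logical index is safe when n == 0).
  extra <- pkg_args[seq_along(pkg_args) > n]
  length(extra) == 0L || "..." %in% base_args
}

# Toy functions standing in for base::fun and pkg::fun.
base_like <- function(x, ...) NULL
extended <- function(x, ..., quiet = FALSE) NULL
reordered <- function(..., x) NULL

ok <- extends_formals(base_like, extended)        # extra arg allowed via `...`
not_ok <- extends_formals(base_like, reordered)   # argument order differs
```

A check like this only looks at signatures, so it would complement rather than
replace the body-based checks (calls to base::fun, UseMethod) listed in the
thread.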
Re: [Rd] Where does L come from?
Thanks for the great discussion everyone!

Hadley

On Sat, Aug 25, 2018 at 8:26 AM Hadley Wickham wrote:
>
> Hi all,
>
> Would someone mind pointing me to the inspiration for the use of
> the L suffix to mean "integer"? This is obviously hard to google for,
> and the R language definition
> (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Constants)
> is silent.
>
> Hadley
>
> --
> http://hadley.nz

--
http://hadley.nz
Re: [Rd] "utils::file.edit" does not understand "editor" with additional arguments
We do not have the 'at a minimum' information requested by the posting guide,
and I cannot reproduce anything like this on a Unix-alike.

Both file.edit and edit.default call the same underlying C code, and that
single-quotes the 'editor' argument to allow for spaces in its path/name, so I
would not expect this to work.

Two workarounds:

1) Set an alias in your shell (e.g. in .bashrc) for 'subl -n'. This is
   something widely needed on macOS where many editors are invoked by
   'open -a', and I also use it for 'emacsclient -n'.

2) Make use of the ability to specify editor as an R function, invoking the
   external program by system2() etc.

On 28/08/2018 20:07, Randy Lai wrote:
> I am using Sublime Text as my editor. If I run `subl -n .Rprofile` in bash,
> a file would be opened in a new window.
>
> Back in R, if I run this
>
> file.edit(".Rprofile", editor="'subl -n'")
> sh: 'subl -n': command not found
> Warning message:
> error in running command
>
> However, the interesting bit happens when I run
>
> edit(1:10, editor="'subl -n'")
>
> It does open Sublime Text.
>
> It seems that `file.edit` and `edit` are behaving differently when "editor"
> has additional arguments.
>
> Randy

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
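
[Editorial sketch] Workaround 2 could look roughly like the code below. It
assumes that file.edit() calls a function-valued `editor` with `file` and
`title` arguments (worth confirming in ?file.edit for your R version) and that
`subl` is on the PATH. `make_editor` and its `run` parameter are hypothetical;
`run` exists only so the command can be inspected without launching anything.
The sketch targets file.edit() only: edit() would also need the edited object
returned.

```r
# Hypothetical factory: build an editor function that launches an external
# program with fixed extra arguments. `run` defaults to system2().
make_editor <- function(command, args = character(), run = system2) {
  function(file, title = file, ...) {
    # Quote each file name so that paths containing spaces survive the shell.
    run(command, c(args, shQuote(file)))
    invisible(NULL)
  }
}

# An editor equivalent to typing `subl -n <file>` in the shell.
subl_new_window <- make_editor("subl", "-n")

# Intended use (not run here, since it needs Sublime Text installed):
# file.edit(".Rprofile", editor = subl_new_window)
```

Passing the extra flag through the `args` vector of system2() avoids the
quoting problem described in the reply, because the program name and its
arguments are never combined into a single quoted string.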
[Rd] build package with unicode (farsi) strings
Hi,

I have an R script file with Persian letters in it defined as a variable:

#' @export
letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')

I have specified the encoding field in the DESCRIPTION file of my package.

...
Encoding: UTF-8
...

I also included Sys.setlocale(locale="Persian") in my .Rprofile, so it is
executed when RCMD is called.

However, after a BUILD and INSTALL, when I access the variable from the
package, the characters are not printed correctly:

> futils::letters_fa
 [1] "<84><81>" "" "" "" ""
 [6] "" "<86>" "" "" ""
[11] "" ""

thanks
Farid
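
[Editorial sketch] The digest records no reply to this message. One approach
often suggested for this kind of problem, offered here as an assumption rather
than as the resolution of this thread, is to keep the package source
ASCII-only by writing the letters as \uxxxx escapes, which the R parser turns
into UTF-8 strings. The helper below, `to_unicode_escapes`, is a hypothetical
name; it generates such escapes from existing strings and only covers code
points up to U+FFFF.

```r
# Hypothetical helper: turn UTF-8 strings into ASCII-only "\uXXXX" escape
# text that can be pasted into package source code.
to_unicode_escapes <- function(x) {
  vapply(x, function(s) {
    paste0(sprintf("\\u%04x", utf8ToInt(enc2utf8(s))), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# "\u0628" is the letter beh and "\u067e" is peh, written as escapes so that
# this file itself stays ASCII.
letters_fa <- c("\u0628", "\u067e")
escaped <- to_unicode_escapes(letters_fa)  # the text \u0628 and \u067e
```

The generated escape text would then replace the literal Persian characters
inside the c(...) call in the package's R file.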