Re: [Rd] [parallel] fixes load balancing of parLapplyLB
Dear Tomas,

Thanks for your commitment to fix this issue and also to add the chunk size as an argument. If you want our input, let us know ;)

Best Regards

On 02/26/2018 04:01 PM, Tomas Kalibera wrote:
> Dear Christian and Henrik,
>
> thank you for spotting the problem and suggestions for a fix. We'll
> probably add a chunk.size argument to parLapplyLB and parLapply to follow
> OpenMP terminology, which has already been an inspiration for the present
> code (parLapply already implements static scheduling via internal function
> staticClusterApply, yet with a fixed chunk size; parLapplyLB already
> implements dynamic scheduling via internal function dynamicClusterApply,
> but with a fixed chunk size set to an unlucky value so that it behaves
> like static scheduling). The default chunk size for parLapplyLB will be
> set so that there is some dynamism in the schedule even by default. I am
> now testing a patch with these changes.
>
> Best
> Tomas
>
> On 02/20/2018 11:45 AM, Christian Krause wrote:
>> Dear Henrik,
>>
>> The rationale is just that it is within these extremes and that it is
>> really simple to calculate, without making any assumptions and knowing
>> that it won't be perfect.
>>
>> The extremes A and B you are mentioning are special cases based on
>> assumptions. Case A is based on the assumption that the function has a
>> long or varying runtime; then you are likely to get the best load
>> balancing with really small chunks. Case B is based on the assumption
>> that the function runtime is the same for each list element, i.e. that
>> you don't actually need load balancing and can just use `parLapply`
>> without it.
>>
>> This new default is **not the best one**. It's just a better one than we
>> had before. There is no best default, because **we don't know the
>> function runtime and how it varies**. The user needs to decide, because
>> only he/she has all the information to choose the best chunk size. As
>> mentioned before, I will write a patch that makes the chunk size an
>> optional argument, just like you did with the `future.scheduling`
>> parameter.
>>
>> Best Regards
>>
>> On February 19, 2018 10:11:04 PM GMT+01:00, Henrik Bengtsson wrote:
>>> Hi, I'm trying to understand the rationale for your proposed amount of
>>> splitting and, more precisely, why that one is THE one.
>>>
>>> If I put labels on your example numbers in one of your previous posts:
>>>
>>> nbrOfElements <- 97
>>> nbrOfWorkers <- 5
>>>
>>> With these, there are two extremes in how you can split up the
>>> processing in chunks such that all workers are utilized:
>>>
>>> (A) Each worker, called multiple times, processes one element each
>>> time:
>>>
>>> nbrOfChunks <- nbrOfElements
>>> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
>>> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>> [30] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>> [59] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>> [88] 1 1 1 1 1 1 1 1 1 1
>>>
>>> (B) Each worker, called once, processes multiple elements:
>>>
>>> nbrOfChunks <- nbrOfWorkers
>>> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
>>> [1] 20 19 19 19 20
>>>
>>> I understand that neither of these two extremes may be the best when
>>> it comes to orchestration overhead and load balancing. Instead, the
>>> best might be somewhere in between, e.g.
>>>
>>> (C) Each worker, called multiple times, processes multiple elements:
>>>
>>> nbrOfChunks <- nbrOfElements / nbrOfWorkers
>>> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
>>> [1] 5 5 5 5 4 5 5 5 5 5 4 5 5 5 5 4 5 5 5 5
>>>
>>> However, there are multiple alternatives between the two extremes, e.g.
>>>
>>> nbrOfChunks <- scale * nbrOfElements / nbrOfWorkers
>>>
>>> So, is there a reason why you argue for scale = 1.0 to be the optimal?
>>>
>>> FYI, in future.apply::future_lapply(X, FUN, ...) there is a
>>> 'future.scheduling' scale factor(*) argument where the default
>>> future.scheduling = 1 corresponds to (B) and future.scheduling = +Inf
>>> to (A). Using future.scheduling = 4 achieves the amount of
>>> load balancing you propose in (C). (*) Different definition from the
>>> above 'scale'. (Disclaimer: I'm the author)
>>>
>>> /Henrik
>>>
>>> On Mon, Feb 19, 2018 at 10:21 AM, Christian Krause wrote:
>>>> Dear R-Devel List,
>>>>
>>>> I have installed R 3.4.3 with the patch applied on our cluster and ran
>>>> a *real-world* job of one of our users to confirm that the patch works
>>>> to my satisfaction. Here are the results.
>>>>
>>>> The original was a series of jobs, all essentially doing the same
>>>> stuff using bootstrapped data, so for the original there
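To make the trade-off discussed in this thread concrete, here is a minimal sketch that tabulates chunk sizes for a few values of Henrik's 'scale' factor. It reuses the internal parallel:::splitList helper from the examples above (an internal function, not part of the API, so it may change between R versions); the chunkSizes helper name is invented for illustration:

    nbrOfElements <- 97
    nbrOfWorkers  <- 5

    chunkSizes <- function(scale) {
      ## Henrik's parameterization: scale = 1 is alternative (C)
      nbrOfChunks <- scale * nbrOfElements / nbrOfWorkers
      lengths(parallel:::splitList(seq_len(nbrOfElements), nbrOfChunks))
    }

    chunkSizes(1)       # ~20 chunks of 4-5 elements each: alternative (C)
    chunkSizes(5)       # 97 chunks of 1 element each:     extreme (A)
    chunkSizes(25/97)   # 5 chunks of 19-20 elements each: extreme (B)

For reference, parLapply() and parLapplyLB() did later gain an optional chunk.size argument exposing exactly this knob, in R 3.5.0.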
[Rd] Small program embedding R crashes in 64 bits
Hi everyone,

I'm trying to create a small C++ program which embeds R, but I'm having problems when I try to do it on Windows 64 bits. I have created a minimal reproducible example, which is just the src/gnuwin32/front-ends/rtest.c file with the R_ReplDLLdo1() loop; the only difference is that I set the interactive mode to TRUE. Here is the cpp file:

https://gist.github.com/anonymous/08b42e83c949e250f60b068d58a3ec51

When compiled in 32 bits, everything works: I enter R commands and no crash. When compiled in 64 bits (mingw64 and R x64 libs, and executed with R x64 in the PATH), everything works except when there is an error in R with a command entered by the user. Typically, entering "a" shows "Error: object 'a' not found" and then the program immediately crashes. Typing a stop() also triggers a crash. The exit code returned by the program is 0xC0000028 (STATUS_BAD_STACK): "An invalid or unaligned stack was encountered during an unwind operation".

I'm not really good at C++ or makefile/compiler stuff, but I can't get it to work. I'm guessing this has to do with the longjmp used to return to the prompt when there is an error, but I don't know how to fix it.

Compiling in 32 bits:

P:/Rtools/mingw_32/bin/g++ -O3 -Wall -pedantic -IP:/R/R-3.4.3/include -c testr.cpp -o testr.o
P:/Rtools/mingw_32/bin/g++ -o ./32.exe ./testr.o -LP:/R/R-3.4.3/bin/i386 -lR -lRgraphapp

results in:

C:\test> 32.exe
> a
Error: object 'a' not found
> # it works!

But compiling in 64 bits:

P:/Rtools/mingw_64/bin/g++ -O3 -Wall -pedantic -IP:/R/R-3.4.3/include -c testr.cpp -o testr.o
P:/Rtools/mingw_64/bin/g++ -o ./64.exe ./testr.o -LP:/R/R-3.4.3/bin/x64 -lR -lRgraphapp

fails like this:

C:\test> 64.exe
> b <- 1
> b
[1] 1
> a
Error: object 'a' not found

I've tried lots of -std= flags, -DWIN64, -D_WIN64 and lots of other defines I could find or think of, but with no luck. What is missing?

Thanks,
William.
[Rd] Bug report - duplicate row names with as.data.frame()
Hello,

I'd like to report what I think is a bug: using as.data.frame() we can create duplicate row names in a data frame. This is with R version 3.4.3 (the current stable release).

Rather than paste code in an email, please see the example formatted code here:

https://stackoverflow.com/questions/49031523/duplicate-row-names-in-r-using-as-data-frame

I posted to StackOverflow, and the consensus was that we should proceed with this as a bug report.

Thanks,
Ron
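The StackOverflow example is not reproduced in the email. A minimal reproduction along the same lines (the exact code in the linked post may differ) goes through the matrix method, which under R 3.4.3 copies duplicated dimnames into the data frame unchecked:

    ## A matrix may legitimately carry duplicated row names ...
    m <- matrix(1:4, nrow = 2,
                dimnames = list(c("dup", "dup"), c("a", "b")))

    ## ... and as.data.frame() in R 3.4.3 lets them through:
    d <- as.data.frame(m)
    rownames(d)   # "dup" "dup" -- duplicate row names in a data frame

    ## The check that should have fired, via the replacement method:
    try(row.names(d) <- c("dup", "dup"))
    ## Error : duplicate 'row.names' are not allowed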
Re: [Rd] Bug report - duplicate row names with as.data.frame()
On Thu, 2018-03-01 at 09:36 -0500, Ron wrote:
> Hello,
>
> I'd like to report what I think is a bug: using as.data.frame() we can
> create duplicate row names in a data frame. R version 3.4.3 (current
> stable release).
>
> Rather than paste code in an email, please see the example formatted code
> here:
> https://stackoverflow.com/questions/49031523/duplicate-row-names-in-r-using-as-data-frame
>
> I posted to StackOverflow, and consensus was that we should proceed with
> this as a bug report.

Yes, that is definitely a bug. The end of the as.data.frame.matrix method has:

attr(value, "row.names") <- row.names
class(value) <- "data.frame"
value

Changing this to:

class(value) <- "data.frame"
row.names(value) <- row.names
value

ensures that the row.names<-.data.frame method is called, with its built-in check for duplicate names.

There are quite a few as.data.frame methods, so this could be a recurring problem. I will check.

Martyn
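Assuming the patched method above is in place, the reproduction from the original report should now stop early, because the duplicate check in row.names<-.data.frame runs during the coercion (a sketch, not run here against a patched build; the error text comes from that replacement method):

    m <- matrix(1:4, nrow = 2,
                dimnames = list(c("dup", "dup"), c("a", "b")))
    as.data.frame(m)
    ## Error : duplicate 'row.names' are not allowed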
Re: [Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix
> Michael Chirico
>     on Tue, 27 Feb 2018 20:18:34 +0800 writes:

Slightly amended 'Subject': (unimportant mistake: a dgeMatrix is *not* sparse)

MM: modified to commented R code, slightly changed from your post:

## I am attempting to use the lars package with a sparse input feature
## matrix, but the following fails:

library(Matrix)
library(lars)
data(diabetes) # from 'lars'
##UA agghh! not like this -- both attach() *and* as.data.frame() are horrific!
##UA attach(diabetes)
##UA x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
x <- as(unclass(diabetes$x), "dgCMatrix")
y <- diabetes$y # needed since we avoid attach()
lars(x, y, intercept = FALSE)
## Error in scale.default(x, FALSE, normx) :
##   length of 'scale' must equal the number of columns of 'x'

## More specifically, scale.default fails as called from lars():

normx <- new("dgeMatrix",
             x = c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6)*1e-14, Dim = c(1L, 10L),
             Dimnames = list(NULL,
                             c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                               "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")))
scale.default(x, center = FALSE, scale = normx)
## Error in scale.default(x, center = FALSE, scale = normx) :
##   length of 'scale' must equal the number of columns of 'x'

> The problem is that this check fails because is.numeric(normx) is FALSE:
>
>     if (is.numeric(scale) && length(scale) == nc)
>
> So, the error message is misleading. In fact length(scale) is the same
> as nc.

Correct, twice.

> At a minimum, the error message needs to be repaired; do we also want to
> attempt as.numeric(normx) (which I believe would have allowed scale to
> work in this case)?

It seems sensible to allow both 'center' and 'scale' to only have to *obey* as.numeric(.) rather than fulfill is.numeric(.).

Though that is not a bug in scale(), as its help page has always said that 'center' and 'scale' should either be a logical value or a numeric vector.

For that reason, one can really claim a bug in 'lars', which should not use

    scale(x, FALSE, normx)

but rather

    scale(x, FALSE, scale = as.numeric(normx))

and then all would work.

> (I'm aware that there are some import issues in lars, as the offending
> line to create normx *should* work, since
> is.numeric(sqrt(drop(rep(1, nrow(x)) %*% (x^2)))) is TRUE -- it's simply
> that lars doesn't import the appropriate S4 methods)
>
> Michael Chirico

Yes, 'lars' has _not_ been updated since spring 2013, notably because its authors have been saying (for rather more than 5 years, I think) that one should really use require("glmnet") instead.

Your point is still valid that it would be easy to enhance base::scale.default() so it'd work in more cases. Thank you for that. I do plan to consider such a change in R-devel (planned to become R 3.5.0 in April).

Martin Maechler,
ETH Zurich
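The failing check is easy to see in isolation: a dense Matrix object is an S4 object, not a base numeric vector, even though its length is right. A small illustration (assuming the Matrix package, as in the thread):

    library(Matrix)
    normx <- Matrix(c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6) * 1e-14, nrow = 1)
    class(normx)                   # "dgeMatrix"
    is.numeric(normx)              # FALSE -- so scale.default's test fails
    length(normx)                  # 10   -- the length itself matches nc
    is.numeric(as.numeric(normx))  # TRUE -- the coercion suggested above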
Re: [Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix
Thanks. I know the setup code is a mess; I just duct-taped something together from the examples in lars (which are a mess in turn). In fact, when I messaged Prof. Hastie he recommended using glmnet. I wonder why lars is kept on CRAN if they've no intention of maintaining it... but I digress...

On Mar 2, 2018 1:52 AM, "Martin Maechler" wrote:
> [...]
[Rd] Repeated use of dyn.load().
I sent this enquiry to r-help and received several sympathetic replies, none of which were definitive. It was kindly suggested to me that I might get better mileage out of r-devel, so I'm trying here. I hope that this is not inappropriate. My original enquiry to r-help:

==

I am working with a function "foo" that explicitly dynamically loads a shared object library or "DLL", doing something like dyn.load("bar.so"). This is a debugging exercise, so I make changes to the underlying Fortran code (yes, I acknowledge that I am a dinosaur), remake the DLL "bar.so", and then run foo again. This is all *without* quitting and restarting R. (I'm going to have to do this a few brazillion times, and I want the iterations to be as quick as possible.)

This seems to work --- i.e. foo seems to obtain the latest version of bar.so. But have I just been lucky so far? (I have not experimented heavily.) Am I running risks of leading myself down the garden path? Are there Traps for Young (or even Old) Players lurking about? I would appreciate Wise Counsel.

==

One of the replies that I received from r-help indicated that it might be safer if I were to apply dyn.unload() on each iteration. So I thought I might put in the line of code

on.exit(dyn.unload("bar.so"))

immediately after my call to dyn.load(). Comments?

Another reply pointed out that "Writing R Extensions" indicates that there could be problems under Solaris, but does not single out any other OS for comment. Might I infer that I am "safe" as long as I don't use Solaris? (Which I certainly *won't* be doing.)

Thanks.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
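For what it is worth, a minimal sketch of the unload-then-reload pattern suggested on r-help. "bar.so" is the placeholder name from the question; the Fortran entry point "barsub" and its arguments are invented for illustration:

    foo <- function() {
      ## drop any stale copy of the DLL before loading the fresh build
      dlls <- getLoadedDLLs()
      if ("bar" %in% names(dlls))
        dyn.unload(dlls[["bar"]][["path"]])
      dyn.load("bar.so")
      on.exit(dyn.unload("bar.so"))  # unload again when foo() exits
      ## hypothetical call into the Fortran code:
      .Fortran("barsub", n = 5L, x = double(5))
    }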