Re: [Rd] Bug in the "reformulate" function in stats package
> Ben Bolker > on Thu, 4 Apr 2019 12:46:37 -0400 writes: > Proposed patch Thank you Ben! [the rest is technical nit-picking .. but hopefully interesting to the smart R-devel reader base:] There was a very subtle thinko in your patch which is not easily diagnosed from R's parse_Rd(): Error in parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", : Unexpected end of input (in " quoted string opened at delete.response.Rd:78:63) In addition: Warning message: In parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", : newline within quoted string at delete.response.Rd:74 and even I needed more than a minute to find out that the culprit was that reformulate(sprintf("`%s`", x)) is not ok in *.Rd and must be reformulate(sprintf("`\%s`", x)) - > (I think .txt files work OK as attachments to the list?) yes, typically -- what really counts is if your e-mail program marks them with MIME-type 'text/plain' and most E-mail programs are very "silly" / "safe" nowadays and don't expect to have smart users and hence mark (and sometimes encode) everything unknown as non-text. Using very old flexible e-mail interfaces such as Emacs VM allow you to specify the MIME-type in addition to the file *and* it also proposes smart defaults, I think by using something like unix 'file' to determine that your 'foo.diff' file is plain text. {{ .. and we all know that Windows is sillily using file extensions to determine file type and only knows Windows-extensions plus those added explicitly by software installed; so nowadays *.rda is marked as an Rstudio file ... [argh]. }} Martin > On 2019-04-04 2:21 a.m., Martin Maechler wrote: >>> Ben Bolker >>> on Fri, 29 Mar 2019 12:34:50 -0400 writes: >> >> > I suspect that the issue is addressed (obliquely) in the examples, >> > which shows that variables with spaces in them (or otherwise >> > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols) >> > can be handled by protecting them with backticks (``) >> >> > ## using non-syntactic names: >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-")) >> >> > It seems to me there could be room for a *documentation* patch (stating >> > explicitly that if termlabels has length > 1 its elements are >> > concatenated with "+", and explicitly stating that non-syntactic names >> > must be protected with back-ticks). (There is a little bit of obscurity >> > in the fact that the elements of termlabels don't have to be >> > syntactically valid names: many will be included in formulas if they can >> > be interpreted as *parseable* expressions, e.g. reformulate("x<2")) >> >> > I would be happy to give it a shot if the consensus is that it would >> > be worthwhile. >> >> I think it would be worthwhile to add to the docs a bit. >> >> [With currently just your and my vote, we have a 100% consensus >> ;-)] >> >> Martin >> >> > One workaround to the OP's problem is below (may be worth including >> > as an example in docs) >> >> >> z <- c("a variable","another variable") >> >> reformulate(z) >> > Error in parse(text = termtext, keep.source = FALSE) : >> > :1:6: unexpected symbol >> > 1: ~ a variable >> > ^ >> >> reformulate(sprintf("`%s`",z)) >> > ~`a variable` + `another variable` >> >> >> >> >> > On 2019-03-29 11:54 a.m., J C Nash wrote: >> >> The main thing is to post the "small reproducible example". >> >> >> >> My (rather long term experience) can be written >> >> >> >> if (exists("reproducible example") ) { >> >> DeveloperFixHappens() >> >> } else { >> >> NULL >> >> } >> >> >> >> JN >> >> >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote: >> >>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at >> >>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back, >> >>> whenever I have more time to properly report the bug. >> >>> >> >>> I didn't find the existing bug report, sorry for that. >> >>> >> >>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a >> >>> solution, which works for me and maybe also for others. >> >>> >> >>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I >> >>> don't mind whichever is preferred here. >> >>> >> >>> Best, >> >>> >> >>> Saren >> >>> >> >>> >> >>> On 29.03.19 09:29, Martin Maechler wrote: >> > Saren Tasciyan >> > on Thu, 28 Mar 2019 17:02:10 +0100
Re: [Rd] subscripting a terms object
Dear Terry, > Therneau, Terry M , Ph D via R-devel > on Thu, 4 Apr 2019 22:48:49 -0400 writes: > Someone sent me a bug report for survival2.44.1-1 that involves a model with both cluster > and offset. It turns out to be a 3 part issue with [.terms and my own untangle.specials > routine. I've spent an evening sorting out the details. > 1. The delete.response() function doesn't remove the response from the dataClasses > attribute, which leads to a later failure in [.terms for no-response models. Is there a > reason for this, or can I make my patch include this oversight as well? > 2. [.terms messes up predvars and dataClasses if the model has an offset term in it. > (In select cases 1 and 2 can cancel out and give the correct dataClasses attribute.) The above two seem interesting and relevant to R itself. As we've recently just fixed a buglet in reformulate() -- probably unrelated to your problem -- I'd really be interested to see a repr.ex. (reproducible example) for the above two statements. ... and if you want also a proposal on how to address the problem in delete.response() and/or `[.terms`() Martin > 3. The survival::untangle.specials routine assumed that you can use the same > subscripting for the terms of a model and the term() object itself, which turns out to be > almost always true, but only almost. > The failure turns out to have probably been there since the Splus days, which tells one > just how often such a model is used. (One of two edge case bugs sent to me in the first > days after I pushed it to CRAN: a new release seems to attact them.) I'm willing to put > together a patch, but given the rarity of these would folks prefer to wait until after the > April release? I'm fine with that. I need the answer to 1 though. > Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Parsing code with newlines
Hello! This is my first post here. I came across the very same problem. It can be reproduced within modified tests/Embedding/RParseEval.c Actually this example has another issue, namely it doesn't wrap everything in R_ToplevelExec . This is a major show stopper for newcomers as that function is barely mentioned anywhere and longjmp into terminated setuploop function followed by R_suicide look like a mystery. Error: bad value Fatal error: unable to initialize the JIT That aside, here is the code with newlines that fails to parse. I hope it will paste alright here. #include "embeddedRCall.h" #include int main(int argc, char *argv[]) { SEXP e, tmp; int hadError; ParseStatus status; init_R(argc, argv); PROTECT(tmp = mkString("\n\r ls()")); PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue)); if (status != PARSE_OK) { printf("boo boo\n"); } else { PrintValue(e); R_tryEval(VECTOR_ELT(e,0), R_GlobalEnv, &hadError); } UNPROTECT(2); end_R(); return(0); } -- Mikhail __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Deep Replicable Bug With AMD Threadripper MultiCore
The following program is whittled down from a much larger program that always works on Intel, and always works on AMD's threadripper with lapply but not mclappy. With mclapply on AMD, all processes go into "suspend" mode and the program then hangs. This bug is replicable on an AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running latest ubuntu 18.04. The R version 3.5.3 (2019-03-11) -- "Great Truth" , invoked with --vanilla. I hope this helps...it took quite a while to get it to this stage. I sure hope that I am not reporting an old bug... options("mc.cores"=4) library(data.table) library(parallel) if (!file.exists("bugsample.csv")) { NR <- 6480 notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR), x2=log(1:NR)) fwrite(notused, file="bugsample.csv") stop("you can quit now and restart the program") } if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf) ## needed! Inf cannot be replaced by actual NR sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 ) ) ) sample$yvar <- sin(1:nrow(sample)) sample$xvar <- 1:nrow(sample) testfun <- function(dl) { with(dl, message("Working: ", first(groupidentifier), " with ", nrow(dl))) lapply( 1:nrow(dl), FUN=function(onedayindex) { if ((onedayindex %% 500) != 0) return(NULL) with(dl[1:onedayindex,], c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,] ))[2], error = function(e) NA ) ) ) }) } message("starting --- replicable hang with mclapply, but not lapply") o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ), FUN=function(.index) testfun( sample[.index, , drop=FALSE] )) message("never gets here with mclapply") print( do.call("c", o[[1]]) ) print( do.call("c", o[[2]]) ) -- Ivo Welch (ivo.we...@ucla.edu) [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore
On 4 April 2019 at 17:28, ivo welch wrote: | The following program is whittled down from a much larger program that | always works on Intel, and always works on AMD's threadripper with | lapply but not mclappy. With mclapply on AMD, all processes go into | "suspend" mode and the program then hangs. This bug is replicable on an | AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running | latest ubuntu 18.04. The R version 3.5.3 (2019-03-11) -- "Great Truth" , | invoked with --vanilla. I hope this helps...it took quite a while to get | it to this stage. I sure hope that I am not reporting an old bug... | | options("mc.cores"=4) | library(data.table) | library(parallel) Just how you set mc.cores to 4 for parallel::mclapply I would try throttling data.table which in its current version goes for all cores. So do, say, setDTthreads(4) and see if that helps. Try lower and lower values to see if you get by. While there may well be a different race condition in mclapply, it may help to not overschedule. (FWIW, the next version of data.table, in queue at CRAN, is less aggressive and has additional options for fine tuning.) Dirk | if (!file.exists("bugsample.csv")) { | NR <- 6480 | notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR), | x2=log(1:NR)) | fwrite(notused, file="bugsample.csv") | stop("you can quit now and restart the program") | } | | if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf) ## | needed! Inf cannot be replaced by actual NR | | | sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 ) | ) ) | sample$yvar <- sin(1:nrow(sample)) | sample$xvar <- 1:nrow(sample) | | | testfun <- function(dl) { | with(dl, message("Working: ", first(groupidentifier), " with ", | nrow(dl))) | | lapply( 1:nrow(dl), FUN=function(onedayindex) { | if ((onedayindex %% 500) != 0) return(NULL) | with(dl[1:onedayindex,], | c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,] | ))[2], error = function(e) NA ) ) ) | }) | } | | | message("starting --- replicable hang with mclapply, but not lapply") | | o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ), | FUN=function(.index) testfun( sample[.index, , drop=FALSE] )) | | message("never gets here with mclapply") | | print( do.call("c", o[[1]]) ) | print( do.call("c", o[[2]]) ) | | | | -- | Ivo Welch (ivo.we...@ucla.edu) | | [[alternative HTML version deleted]] | | __ | R-devel@r-project.org mailing list | https://stat.ethz.ch/mailman/listinfo/r-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore
In addition you can also try to use a PSOCK cluster (see makeCluster, parLapply) to avoid the problem - it should help if the problem is somehow related to forking in mclapply(). The problem you are seeing may be in base R, in data.table, or in interaction between the two (mclapply() from base R uses forking directly, data.table uses OpenMP). If you think the bug is in base R, it would be much better if you could find a reproducible example that would only use packages shipped directly with R, otherwise it might be best to contact the maintainer of data.table. Please also make sure to use the latest version of R 3.5 (or R-devel). The implementation of forking in parallel packages, and hence also in mclapply, has been rewritten since R 3.4. Best Tomas On 4/5/19 1:28 PM, Dirk Eddelbuettel wrote: On 4 April 2019 at 17:28, ivo welch wrote: | The following program is whittled down from a much larger program that | always works on Intel, and always works on AMD's threadripper with | lapply but not mclappy. With mclapply on AMD, all processes go into | "suspend" mode and the program then hangs. This bug is replicable on an | AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running | latest ubuntu 18.04. The R version 3.5.3 (2019-03-11) -- "Great Truth" , | invoked with --vanilla. I hope this helps...it took quite a while to get | it to this stage. I sure hope that I am not reporting an old bug... | | options("mc.cores"=4) | library(data.table) | library(parallel) Just how you set mc.cores to 4 for parallel::mclapply I would try throttling data.table which in its current version goes for all cores. So do, say, setDTthreads(4) and see if that helps. Try lower and lower values to see if you get by. While there may well be a different race condition in mclapply, it may help to not overschedule. (FWIW, the next version of data.table, in queue at CRAN, is less aggressive and has additional options for fine tuning.) Dirk | if (!file.exists("bugsample.csv")) { | NR <- 6480 | notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR), | x2=log(1:NR)) | fwrite(notused, file="bugsample.csv") | stop("you can quit now and restart the program") | } | | if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf) ## | needed! Inf cannot be replaced by actual NR | | | sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 ) | ) ) | sample$yvar <- sin(1:nrow(sample)) | sample$xvar <- 1:nrow(sample) | | | testfun <- function(dl) { | with(dl, message("Working: ", first(groupidentifier), " with ", | nrow(dl))) | | lapply( 1:nrow(dl), FUN=function(onedayindex) { | if ((onedayindex %% 500) != 0) return(NULL) | with(dl[1:onedayindex,], | c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,] | ))[2], error = function(e) NA ) ) ) | }) | } | | | message("starting --- replicable hang with mclapply, but not lapply") | | o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ), | FUN=function(.index) testfun( sample[.index, , drop=FALSE] )) | | message("never gets here with mclapply") | | print( do.call("c", o[[1]]) ) | print( do.call("c", o[[2]]) ) | | | | -- | Ivo Welch (ivo.we...@ucla.edu) | | [[alternative HTML version deleted]] | | __ | R-devel@r-project.org mailing list | https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] all.equal failure
This arose in testing [.terms and has me confused. data(esoph) # use a standard data set t0x <- terms(model.frame( ~ tobgp, data=esoph)) t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) t1x <- (delete.response(t1))[-1] > all.equal(t0x, t1x) [1] TRUE # the above is wrong, because they actually are not the same > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) [1] "Names: 1 string mismatch" [2] "Lengths (1, 2) differ (string compare on first 1)" > sessionInfo() R Under development (unstable) (2019-04-05 r76323) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/local/src/R-devel/lib/libRblas.so LAPACK: /usr/local/src/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.7.0 tools_3.7.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] [.terms issue
As footnote, the error in survival:::untangle.specials is that it assumed that if attr(myterms, 'specials')[['strata']] was = to 4, then one could use myterms[-4] to remove the strata term. Not so. In the model y ~ x1 + x2 + strata(x3) the attritube will be 4 -- the response counts --- but to remove it I need to use [-3] since the response does not count in [.terms or drop.terms. Is this an inconsistency that should be documented and/or repaired? What would break if we did? I don't know. By chance, all the usages in the survival package happened after a call to delete.response so it would be immune to such a change. Terry T. [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] all.equal failure
On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote: This arose in testing [.terms and has me confused. data(esoph) # use a standard data set t0x <- terms(model.frame( ~ tobgp, data=esoph)) t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) t1x <- (delete.response(t1))[-1] > all.equal(t0x, t1x) [1] TRUE # the above is wrong, because they actually are not the same > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) [1] "Names: 1 string mismatch" [2] "Lengths (1, 2) differ (string compare on first 1)" As documented, all.equal() is generic, with methods for different classes. The classes of both t0x and t1x are c("terms","formula") with no all.equal.terms method, so all.equal.formula is called. That method isn't specifically documented, but you can see its definition as function (target, current, ...) { if (length(target) != length(current)) return(paste0("target, current differ in having response: ", length(target) == 3L, ", ", length(current) == 3L)) if (!identical(deparse(target), deparse(current))) "formulas differ in contents" else TRUE } So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no attributes shown, even though "showAttributes" is set by default. I haven't traced through the C code to see where things are going wrong. Duncan Murdoch > sessionInfo() R Under development (unstable) (2019-04-05 r76323) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/local/src/R-devel/lib/libRblas.so LAPACK: /usr/local/src/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.7.0 tools_3.7.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: all.equal failure
Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. I'll change my original original title to "all.equal was not a good tool for testing certain code issues". Thanks for the pointer, Terry On 4/5/19 9:00 AM, Duncan Murdoch wrote: > On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote: >> This arose in testing [.terms and has me confused. >> >> data(esoph) # use a standard data set >> >> t0x <- terms(model.frame( ~ tobgp, data=esoph)) >> t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) >> t1x <- (delete.response(t1))[-1] >> >> > all.equal(t0x, t1x) >> [1] TRUE >> >> # the above is wrong, because they actually are not the same >> >> > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) >> [1] "Names: 1 string mismatch" >> [2] "Lengths (1, 2) differ (string compare on first 1)" > > As documented, all.equal() is generic, with methods for different classes. > The classes > of both t0x and t1x are > > c("terms","formula") > > with no all.equal.terms method, so all.equal.formula is called. That method > isn't > specifically documented, but you can see its definition as > > function (target, current, ...) > { > if (length(target) != length(current)) > return(paste0("target, current differ in having response: ", > length(target) == 3L, ", ", length(current) == 3L)) > if (!identical(deparse(target), deparse(current))) > "formulas differ in contents" > else TRUE > } > > So the issue is that deparse(t0x) and deparse(t1x) give the same strings with > no > attributes shown, even though "showAttributes" is set by default. I haven't > traced > through the C code to see where things are going wrong. > > Duncan Murdoch > >> >> > sessionInfo() >> R Under development (unstable) (2019-04-05 r76323) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: Ubuntu 18.04.2 LTS >> >> Matrix products: default >> BLAS: /usr/local/src/R-devel/lib/libRblas.so >> LAPACK: /usr/local/src/R-devel/lib/libRlapack.so >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> loaded via a namespace (and not attached): >> [1] compiler_3.7.0 tools_3.7.0 >> >> >> [[alternative HTML version deleted]] >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: all.equal failure
On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. Which R version are you using? I see deparse(target) and deparse(current) in all.equal.language(), and those should not be ignoring attributes according to the documentation. Duncan Murdoch I'll change my original original title to "all.equal was not a good tool for testing certain code issues". Thanks for the pointer, Terry On 4/5/19 9:00 AM, Duncan Murdoch wrote: On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote: This arose in testing [.terms and has me confused. data(esoph) # use a standard data set t0x <- terms(model.frame( ~ tobgp, data=esoph)) t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) t1x <- (delete.response(t1))[-1] > all.equal(t0x, t1x) [1] TRUE # the above is wrong, because they actually are not the same > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) [1] "Names: 1 string mismatch" [2] "Lengths (1, 2) differ (string compare on first 1)" As documented, all.equal() is generic, with methods for different classes. The classes of both t0x and t1x are c("terms","formula") with no all.equal.terms method, so all.equal.formula is called. That method isn't specifically documented, but you can see its definition as function (target, current, ...) { if (length(target) != length(current)) return(paste0("target, current differ in having response: ", length(target) == 3L, ", ", length(current) == 3L)) if (!identical(deparse(target), deparse(current))) "formulas differ in contents" else TRUE } So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no attributes shown, even though "showAttributes" is set by default. I haven't traced through the C code to see where things are going wrong. Duncan Murdoch > sessionInfo() R Under development (unstable) (2019-04-05 r76323) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/local/src/R-devel/lib/libRblas.so LAPACK: /usr/local/src/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.7.0 tools_3.7.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: Re: all.equal failure
On 4/5/19 9:39 AM, Duncan Murdoch wrote: On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. Which R version are you using? I see deparse(target) and deparse(current) in all.equal.language(), and those should not be ignoring attributes according to the documentation. I'm using today's version of R-devel on Ubuntu. (svn up this AM) But I agree, both target and current appear. Duncan Murdoch I'll change my original original title to "all.equal was not a good tool for testing certain code issues". Thanks for the pointer, Terry On 4/5/19 9:00 AM, Duncan Murdoch wrote: On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote: This arose in testing [.terms and has me confused. data(esoph) # use a standard data set t0x <- terms(model.frame( ~ tobgp, data=esoph)) t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) t1x <- (delete.response(t1))[-1] > all.equal(t0x, t1x) [1] TRUE # the above is wrong, because they actually are not the same > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) [1] "Names: 1 string mismatch" [2] "Lengths (1, 2) differ (string compare on first 1)" As documented, all.equal() is generic, with methods for different classes. The classes of both t0x and t1x are c("terms","formula") with no all.equal.terms method, so all.equal.formula is called. That method isn't specifically documented, but you can see its definition as function (target, current, ...) { if (length(target) != length(current)) return(paste0("target, current differ in having response: ", length(target) == 3L, ", ", length(current) == 3L)) if (!identical(deparse(target), deparse(current))) "formulas differ in contents" else TRUE } So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no attributes shown, even though "showAttributes" is set by default. I haven't traced through the C code to see where things are going wrong. Duncan Murdoch > sessionInfo() R Under development (unstable) (2019-04-05 r76323) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/local/src/R-devel/lib/libRblas.so LAPACK: /usr/local/src/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.7.0 tools_3.7.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: Re: all.equal failure
On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote: On 4/5/19 9:39 AM, Duncan Murdoch wrote: On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. Which R version are you using? I see deparse(target) and deparse(current) in all.equal.language(), and those should not be ignoring attributes according to the documentation. I'm using today's version of R-devel on Ubuntu. (svn up this AM) But I agree, both target and current appear. That's not what I said. I said that the attributes should not be ignored in that function. I don't see anything in the R-devel version of it that ignores attributes: > all.equal.language function (target, current, ...) { mt <- mode(target) mc <- mode(current) if (mt == "expression" && mc == "expression") return(all.equal.list(target, current, ...)) ttxt <- paste(deparse(target), collapse = "\n") ctxt <- paste(deparse(current), collapse = "\n") msg <- c(if (mt != mc) paste0("Modes of target, current: ", mt, ", ", mc), if (ttxt != ctxt) { if (pmatch(ttxt, ctxt, 0L)) "target is a subset of current" else if (pmatch(ctxt, ttxt, 0L)) "current is a subset of target" else "target, current do not match when deparsed" }) if (is.null(msg)) TRUE else msg } Duncan Murdoch Duncan Murdoch I'll change my original original title to "all.equal was not a good tool for testing certain code issues". Thanks for the pointer, Terry On 4/5/19 9:00 AM, Duncan Murdoch wrote: On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote: This arose in testing [.terms and has me confused. data(esoph) # use a standard data set t0x <- terms(model.frame( ~ tobgp, data=esoph)) t1 <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph)) t1x <- (delete.response(t1))[-1] > all.equal(t0x, t1x) [1] TRUE # the above is wrong, because they actually are not the same > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses')) [1] "Names: 1 string mismatch" [2] "Lengths (1, 2) differ (string compare on first 1)" As documented, all.equal() is generic, with methods for different classes. The classes of both t0x and t1x are c("terms","formula") with no all.equal.terms method, so all.equal.formula is called. That method isn't specifically documented, but you can see its definition as function (target, current, ...) { if (length(target) != length(current)) return(paste0("target, current differ in having response: ", length(target) == 3L, ", ", length(current) == 3L)) if (!identical(deparse(target), deparse(current))) "formulas differ in contents" else TRUE } So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no attributes shown, even though "showAttributes" is set by default. I haven't traced through the C code to see where things are going wrong. Duncan Murdoch > sessionInfo() R Under development (unstable) (2019-04-05 r76323) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/local/src/R-devel/lib/libRblas.so LAPACK: /usr/local/src/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.7.0 tools_3.7.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: Re: all.equal failure
> Duncan Murdoch > on Fri, 5 Apr 2019 11:12:48 -0400 writes: > On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote: >> >> >> On 4/5/19 9:39 AM, Duncan Murdoch wrote: >>> On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. >>> >>> Which R version are you using? I see deparse(target) and deparse(current) in >>> all.equal.language(), and those should not be ignoring attributes according to the >>> documentation. But the problem is that indeed "of course" all.equal.formula() and not all.equal.language() is called for the terms since as you yourself remarked, their class is c("terms", "formula"), and so what Terry reported is indeed correct *and* a bug and in "all versions" of R (I did not look far back, but these things haven't changed much). The cleanest would probably be to define an all.equal.terms() method, as I think there may be more code relying on the behavior of all.equal.formula() to only look at the formulas themselves and not their attributes... but you (Duncan) and others may have a different opinion. Martin Maechler ETH Zurich and R Core Team __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: Re: all.equal failure
> Martin Maechler > on Fri, 5 Apr 2019 17:33:54 +0200 writes: > Duncan Murdoch > on Fri, 5 Apr 2019 11:12:48 -0400 writes: >> On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote: >>> >>> >>> On 4/5/19 9:39 AM, Duncan Murdoch wrote: On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: > Duncan, > I should have included it in my original note, but > > all.equal(unclass(t0x), unclass(t1x)) > > returns TRUE as well. I had tried that as well. But a further look at > all.equal.default shows the following line right near the top: > if (is.language(target) || is.function(target)) > return(all.equal.language(target, current, ...)) > > and that path explicitly ignores attributes. Which R version are you using? I see deparse(target) and deparse(current) in all.equal.language(), and those should not be ignoring attributes according to the documentation. > But the problem is that indeed "of course" all.equal.formula() > and not all.equal.language() is called for the terms since as > you yourself remarked, their class is c("terms", "formula"), > and so what Terry reported is indeed correct *and* a bug > and in "all versions" of R (I did not look far back, but these things > haven't changed much). > The cleanest would probably be to define an all.equal.terms() > method, as I think there may be more code relying on the > behavior of all.equal.formula() to only look at the formulas > themselves and not their attributes... > but you (Duncan) and others may have a different opinion. and I do agree with Duncan even more now that indeed it's very unsatisfactory that deparse() {and dput(), dump() ..} of a terms object would only reproduce the formula and nothing else; and yes that's all in the C code: --> src/main/deparse.c --> in function deparse2buff() --> inside the (350 lines large) 'case LANGSXP'. Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [EXTERNAL] Re: Re: all.equal failure
On 05/04/2019 11:33 a.m., Martin Maechler wrote: Duncan Murdoch on Fri, 5 Apr 2019 11:12:48 -0400 writes: > On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote: >> >> >> On 4/5/19 9:39 AM, Duncan Murdoch wrote: >>> On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote: Duncan, I should have included it in my original note, but all.equal(unclass(t0x), unclass(t1x)) returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top: if (is.language(target) || is.function(target)) return(all.equal.language(target, current, ...)) and that path explicitly ignores attributes. >>> >>> Which R version are you using? I see deparse(target) and deparse(current) in >>> all.equal.language(), and those should not be ignoring attributes according to the >>> documentation. But the problem is that indeed "of course" all.equal.formula() and not all.equal.language() is called for the terms since as you yourself remarked, their class is c("terms", "formula"), and so what Terry reported is indeed correct *and* a bug and in "all versions" of R (I did not look far back, but these things haven't changed much). The cleanest would probably be to define an all.equal.terms() method, as I think there may be more code relying on the behavior of all.equal.formula() to only look at the formulas themselves and not their attributes... but you (Duncan) and others may have a different opinion. I don't know if that would be easy -- it seems to me there is a bug in deparse(), which won't show attributes on language objects even if you ask it to: # This is fine: deparse(structure(1, attrib=2)) # [1] "structure(1, attrib = 2)" # This doesn't show the attributes deparse(structure(quote(f(1)), attrib=2)) # [1] "f(1)" But as you mention, if this isn't a new bug fixing it will likely cause problems for people who assume it is intentional... Duncan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] all.equal failure and [.terms
The all.equal was a side issue for me; I don't have strong opinions one way or the other. You are welcome to leave me out of the loop on that. (Or leave me on the cc, whatever is easiest). I will update the survival package once the [.terms issues are addressed. One debatable issues is the choice of change vs document for the offset() issue. With my proposed fix or without it, offsets are completely ignored by [.terms and dropterms. So with a formula of z <- terms(y~ x1 + offset(x2) + x3) the 2 in drop.terms(z, 2) or z[-2] refers to x3, and the result will drop both the offset and x3. For the use cases that I can think of the two functions are used at the 'build the X matrix' stage, offsets have already been accounted for, and the present behavior is fine. My vote would be to document it with a few lines in the help file since that is the easiest. Offsets don't count as a 'term' in the assign attribute either so the current behavior is consistent in that respect. Terry T. [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel