I knew I had seen this before but couldn't remember where until now: https://github.com/lme4/lme4/issues/441 ... I initially fixed it with gsub(), but (pushed by Martin Maechler to do better) I eventually fixed it by storing the original names of the model frame (without backticks) as an attribute for later retrieval: https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.
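For concreteness, here is a minimal sketch of the two approaches in base R, reusing Bill's example data from the quoted message below. The attribute name "orig_names" is made up for illustration and is not necessarily what the lme4 commit uses, and the retrieval step assumes a formula with main effects only (no interactions), so treat it as a rough outline rather than the actual fix.

```r
# Data with non-syntactic column names (same as Bill Dunlap's example).
d <- data.frame(check.names = FALSE,
                y = 1 / (1:5),
                `b$a$d` = sin(1:5) + 2,
                `x y z` = cos(1:5) + 2)

Terms <- terms(y ~ log(`b$a$d`) + `x y z`)
m <- model.frame(Terms, data = d)

# The term labels keep the backticks; the model frame columns do not.
ll <- attr(Terms, "term.labels")

# Approach 1 (quick fix): strip a leading/trailing backtick from each label.
ll_stripped <- gsub("^`|`$", "", ll)

# Approach 2: remember the model frame's own column names on the terms
# object. "orig_names" is a hypothetical attribute name for this sketch.
attr(Terms, "orig_names") <- names(m)

# Later retrieval: drop the response (if any) and subscript the model
# frame with the stored names instead of the term labels.
# Assumes main effects only, so each remaining name is one term.
resp <- attr(Terms, "response")   # position of the response, 0 if none
rhs <- attr(Terms, "orig_names")
if (resp > 0) rhs <- rhs[-resp]
X <- m[rhs]
```

With the stored names, m[rhs] selects the right-hand-side columns without any string surgery on the labels, which is the reason the attribute approach is less fragile than gsub().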

On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
<r-devel@r-project.org> wrote:
> Thanks to Bill Dunlap for the clarification. On follow-up it turns out that
> this will be an issue for many if not most of the routines in the survival
> package: a lot of them look at the terms structure and make use of the
> dimnames of attr(terms, 'factors'), which also keeps the unneeded
> backquotes. Others use the term.labels attribute. To dodge this I will
> need to create a fixterms() routine which I call at the top of every single
> routine in the library.
>
> Is there a chance for a fix at a higher level?
>
> Terry T.
>
> On 03/05/2018 03:55 PM, William Dunlap wrote:
>>
>> I believe this has to do with terms() making "term.labels" (hence the
>> dimnames of "factors") with deparse(), so that the backquotes are
>> included for non-syntactic names. The backquotes are not in the column
>> names of the input data.frame (nor model frame) so you get a mismatch
>> when subscripting the data.frame or model.frame with elements of
>> terms()$term.labels.
>>
>> I think you can avoid the problem by adding right after
>>     ll <- attr(Terms, "term.labels")
>> the line
>>     ll <- gsub("^`|`$", "", ll)
>>
>> E.g.,
>>
>> > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y z`=cos(1:5)+2)
>> > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
>> > m <- model.frame(Terms, data=d)
>> > colnames(m)
>> [1] "y" "log(`b$a$d`)" "x y z"
>> > attr(Terms, "term.labels")
>> [1] "log(`b$a$d`)" "`x y z`"
>> > ll <- attr(Terms, "term.labels")
>> > gsub("^`|`$", "", ll)
>> [1] "log(`b$a$d`)" "x y z"
>>
>> It is a bit of a mess.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
>> <r-devel@r-project.org> wrote:
>>
>>     A user reported a problem with the survdiff function and the use of
>>     variables that contain a space. Here is a simple example. The same
>>     issue occurs in survfit for the same reason.
>>
>>     lung2 <- lung
>>     names(lung2)[1] <- "in st"   # old name is inst
>>     survdiff(Surv(time, status) ~ `in st`, data=lung2)
>>     Error in `[.data.frame`(m, ll) : undefined columns selected
>>
>>     In the body of the code the program wants to send all of the
>>     right-hand side variables forward to the strata() function. The code
>>     looks more or less like this, where m is the model frame
>>
>>       Terms <- terms(m)
>>       index <- attr(Terms, "term.labels")
>>       if (length(index) == 0) X <- rep(1L, n)   # no covariates
>>       else X <- strata(m[index])
>>
>>     For the variable with a space in the name the term.label is
>>     "`in st`", and the subscript fails.
>>
>>     Is this intended behaviour or a bug? The issue is that the name of
>>     this column in the model frame does not have the backticks, while the
>>     terms structure does have them.
>>
>>     Terry T.
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel