Thanks to Bill Dunlap for the clarification. On following up, it turns out that this will be
an issue for many if not most of the routines in the survival package: a lot of them look
at the terms structure and make use of the dimnames of attr(terms, 'factors'), which also
keeps the unneeded backquotes. Others use the term.labels attribute. To work around this I
will need to write a fixterms() routine and call it at the top of every single routine
in the library.
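A minimal sketch of what such a fixterms() routine might look like, assuming the only problem is the pair of backquotes that terms() puts around a whole non-syntactic name (the same gsub() pattern that Bill Dunlap suggests in the quoted reply). The function name and body are hypothetical, not part of the survival package; the sketch rewrites the term.labels attribute and both dimnames of the factors attribute so that they match the model frame column names.

```r
# Hypothetical helper (not part of the survival package): strip the outer
# backquotes that terms() keeps around non-syntactic variable names, so that
# the labels match the column names of the model frame.
fixterms <- function(Terms) {
    # Only a backquote at the very start or end of a label is removed, so a
    # label such as "log(`b$a$d`)" is left untouched.
    unquote <- function(x) gsub("^`|`$", "", x)

    attr(Terms, "term.labels") <- unquote(attr(Terms, "term.labels"))

    # The factors attribute is a variables-by-terms matrix; it has length 0
    # for an intercept-only model, in which case there is nothing to fix.
    fac <- attr(Terms, "factors")
    if (length(fac) > 0) {
        dimnames(fac) <- lapply(dimnames(fac), unquote)
        attr(Terms, "factors") <- fac
    }
    Terms
}

# Usage: call it right after building the terms object (made-up data).
d <- data.frame(check.names = FALSE, time = 1:4, `in st` = c(1, 2, 1, 2))
Terms <- fixterms(terms(time ~ `in st`))
m <- model.frame(Terms, data = d)
m[attr(Terms, "term.labels")]   # no "undefined columns selected" error
```

Note that an interaction label such as "`a b`:`c d`" would only lose its outermost backquotes, so a routine like this is not a complete answer for every formula.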
Is there a chance for a fix at a higher level?
Terry T.
On 03/05/2018 03:55 PM, William Dunlap wrote:
I believe this has to do with terms() making "term.labels" (hence the dimnames of
"factors") with deparse(), so that the backquotes are included for non-syntactic names.
The backquotes are not in the column names of the input data.frame (nor the model frame),
so you get a mismatch when subscripting the data.frame or model.frame with elements of
attr(terms(), "term.labels").
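The difference can be seen on a single symbol: deparsing with backquoting switched on quotes a non-syntactic name, while as.character() returns the bare name that appears as a data frame column. This is only an illustration, assuming (as described above) that term.labels comes from a backquoting deparse; it does not reproduce the internals of terms() or model.frame().

```r
# A non-syntactic name and an ordinary one, as symbols.
odd <- as.name("in st")
ok  <- as.name("inst")

# Deparsing with backtick = TRUE wraps only the non-syntactic name.
deparse(odd, backtick = TRUE)   # "`in st`"
deparse(ok,  backtick = TRUE)   # "inst"

# Converting the symbol to a string keeps the bare name, which is how the
# column is named in the data frame.
as.character(odd)               # "in st"
```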
I think you can avoid the problem by adding right after
ll <- attr(Terms, "term.labels")
the line
ll <- gsub("^`|`$", "", ll)
E.g.,
> d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y z`=cos(1:5)+2)
> Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
> m <- model.frame(Terms, data=d)
> colnames(m)
[1] "y" "log(`b$a$d`)" "x y z"
> attr(Terms, "term.labels")
[1] "log(`b$a$d`)" "`x y z`"
> ll <- attr(Terms, "term.labels")
> gsub("^`|`$", "", ll)
[1] "log(`b$a$d`)" "x y z"
It is a bit of a mess.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
<r-devel@r-project.org> wrote:
A user reported a problem with the survdiff function and the use of variables that
contain a space. Here is a simple example. The same issue occurs in survfit for the
same reason.
library(survival)  # provides lung, Surv() and survdiff()
lung2 <- lung
names(lung2)[1] <- "in st"  # old name is inst
survdiff(Surv(time, status) ~ `in st`, data=lung2)
Error in `[.data.frame`(m, ll) : undefined columns selected
In the body of the code, the program wants to send all of the right-hand side variables
forward to the strata() function. The code looks more or less like this, where m is
the model frame:
Terms <- terms(m)
index <- attr(Terms, "term.labels")
if (length(index) == 0) X <- rep(1L, n)  # no covariates; n = number of observations
else X <- strata(m[index])
For the variable with a space in the name, the term label is "`in st`", and the
subscript fails.
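A self-contained reproduction of the failing subscript, without the survival package, may make the mismatch easier to see. The data frame is made up for illustration, and the final step applies the gsub() workaround from Bill Dunlap's reply; this is a sketch, not the survdiff code itself.

```r
# Made-up data with a non-syntactic column name, mimicking lung2 above.
d <- data.frame(check.names = FALSE,
                time = c(5, 8, 3, 9),
                `in st` = c(1, 2, 1, 2))
m <- model.frame(time ~ `in st`, data = d)
Terms <- terms(m)
index <- attr(Terms, "term.labels")

colnames(m)   # the model frame column is named without backquotes
index         # the term label carries the backquotes

# Subscripting with the raw label fails; the error message is captured in res.
res <- tryCatch(m[index], error = function(e) conditionMessage(e))

# Stripping the outer backquotes first makes the subscript succeed.
index <- gsub("^`|`$", "", index)
X <- m[index]
```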
Is this intended behaviour or a bug? The issue is that the name of this column in the
model frame does not have the backticks, while the terms structure does have them.
Terry T.
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel