Small follow-up: (1) in order for lm() to actually work you need
keep.response=TRUE in the drop.terms() call (I realize that this is
*not* the problem in your example)
test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt )
check4 <- drop.terms(test4, 3, keep.response = TRUE)
formula(check4)
lm( check4, data=mtcars)
(2) I'm ambivalent about your "We can argue that the user should have
used I(cyl==4), but very many won't." argument. This is the ever-present
"document precisely and require users to know and follow the
documentation" vs. "try to protect users from themselves" debate -
taking either side to an extreme is (IMO) unproductive. I don't know how
hard it would be to make drop.terms() **not** drop parentheses, but it
seems like it may be very hard/low-level. My vote would be to see if
there is a reasonably robust way to detect these constructions and
**warn** about them.
I have probably asked about this before, but if anyone knows of
useful materials that go into more details about the definitions and
implementation of model matrix/terms/etc. machinery, *beyond* the
appropriate chapter of "Statistical Models in S" (Becker/Chambers white
book), *or* the source code itself, I would love some pointers ...
Ben Bolker
On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
This is a follow-up to my earlier note on [.terms. Based on a couple days'
work getting
the survival package to work around issues, this will hopefully be more
concise and
better expressed than the prior note.
1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2
The documentation for the dropx argument is "vector of positions of variables
to drop from
the right hand side of the model", but it is not clear what "positions" is. I
originally
assumed "the order in the formula as typed", but was wrong. I suggest adding
a line
"Position refers to the order of terms in the term.labels attribute of the
terms object,
which is also the order they will appear in a coefficient vector (not counting
the
intercept).
2.
library(splines)
test2 <- terms(model.frame(mpg ~ offset(cyl) + ns(hp, df=3) + disp + wt,
data=mtcars))
check2 <- drop.terms(test2, dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt
One side effect of how drop.terms is implemented, and one that I suspect was
not intended,
is that offsets are completly ignored. The above drops both the offset and
the disp
term from the formula The dataClasses and predvars attributes of the result
are also
incorrect: they have lost the ns() term rather than the disp term;
the results of predict will be incorrect.
attr(check2, "predvars")
## list(offset(cyl), disp, wt)
Question: should the function be updated to not drop offsets? If not a line
needs to be
added to the help file. The handling of predvars needs to be fixed regardless.
3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars) # fails
The drop.terms action has lost the () around the logical expression, which
leads to an
invalid formula. We can argue that the user should have used I(cyc==4), but
very many won't.
4. As a footnote, more confusion (for me) is generated by the fact that the
"specials"
attribute of a formula does not use the numbering discussed in 1 above. I had
solved
this issue long ago in the untangle.specials function; long enough ago that I
forgot I had
solved it, and just wasted a day rediscovering that fact.
---
I can create a patch for 1 and 2 (once we answer my question), but a fix for 3
is not
clear to me. It currently leads to failure in a coxph call that includes a
strata so I am
directly interested in a solution; e.g., coxph(Surv(time, status) ~ age +
(ph.ecog==2) +
strata(inst), data=lung)
Terry T
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel