Small follow-up: (1) in order for lm() to actually work you need keep.response=TRUE in the drop.terms() call (I realize that this is *not* the problem in your example)

test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt )
check4 <- drop.terms(test4, 3, keep.response = TRUE)
formula(check4)
lm( check4, data=mtcars)

(2) I'm ambivalent about your "We can argue that the user should have used I(cyl==4), but very many won't." argument. This is the ever-present "document precisely and require users to know and follow the documentation" vs. "try to protect users from themselves" debate - taking either side to an extreme is (IMO) unproductive. I don't know how hard it would be to make drop.terms() **not** drop parentheses, but it seems like it may be very hard/low-level. My vote would be to see if there is a reasonably robust way to detect these constructions and **warn** about them.

I have probably asked about this before, but if anyone knows of useful materials that go into more details about the definitions and implementation of model matrix/terms/etc. machinery, *beyond* the appropriate chapter of "Statistical Models in S" (Becker/Chambers white book), *or* the source code itself, I would love some pointers ...

 Ben Bolker


On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
This is a follow-up to my earlier note on [.terms.   Based on a couple days' 
work getting
the survival package to work around  issues, this will hopefully be more 
concise and
better expressed than the prior note.

1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2

The documentation for the dropx argument is "vector of positions of variables 
to drop from
the right hand side of the model", but it is not clear what "positions" is.   I 
originally
assumed "the order in the formula as typed", but was wrong.   I suggest adding 
a line
"Position refers to the order of terms in the term.labels attribute of the 
terms object,
which is also the order they will appear in a coefficient vector (not counting 
the
intercept).

2.
library(splines)
test2 <- terms(model.frame(mpg ~  offset(cyl) + ns(hp, df=3) + disp + wt, 
data=mtcars))
check2 <- drop.terms(test2,  dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt

One side effect of how drop.terms is implemented, and one that I suspect was 
not intended,
is that offsets are completly ignored.    The above drops both the offset and 
the disp
term from the formula   The dataClasses and predvars attributes of the result 
are also
incorrect: they have lost the ns() term rather than the disp term;
the results of predict will be incorrect.

attr(check2, "predvars")
##    list(offset(cyl), disp, wt)

Question: should the function be updated to not drop offsets? If not a line 
needs to be
added to the help file.   The handling of predvars needs to be fixed regardless.

3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars)   # fails

The drop.terms action has lost the () around the logical expression, which 
leads to an
invalid formula.  We can argue that the user should have used I(cyc==4), but 
very many won't.

4. As a footnote, more confusion (for me) is generated by the fact that the 
"specials"
attribute of a formula does not use the numbering discussed in 1 above.   I had 
solved
this issue long ago in the untangle.specials function; long enough ago that I 
forgot I had
solved it, and just wasted a day rediscovering that fact.

---

I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 
is not
clear to me.  It currently leads to failure in a coxph call that includes a 
strata so I am
directly interested in a solution; e.g.,  coxph(Surv(time, status) ~ age + 
(ph.ecog==2) +
strata(inst), data=lung)

Terry T


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to