>>>>> Duncan Murdoch <murdoch.dun...@gmail.com> >>>>> on Fri, 27 Apr 2018 10:28:16 -0400 writes:
> On 27/04/2018 9:25 AM, Hadley Wickham wrote: >> Hi all, >> >> Very surprising (to me!) and mystifying result from predict.glm(): the >> predictions vary depending on whether or not I use ns() or >> splines::ns(). Reprex follows: > >> library(splines) >> >> set.seed(12345) >> dat <- data.frame(claim = rbinom(1000, 1, 0.5)) >> mns <- c(3.4, 3.6) >> sds <- c(0.24, 0.35) >> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = >> sds[dat$claim + 1])) >> dat <- dat[order(dat$wind), ] >> >> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial) >> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial) >> >> # The model coefficients are the same >> unname(coef(m1)) >> #> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128 >> unname(coef(m2)) >> #> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128 >> >> # But the predictions are not! >> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5)) >> unname(predict(m1, newdata = newdf)) >> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266 >> unname(predict(m2, newdata = newdf)) >> #> [1] 0.5194712 -0.5666554 -0.1731268 2.8134844 3.9295814 >> >> Is this a bug? > The two objects m1 and m2 differ more than they should, so this looks > like a problem in glm, not just in predict.glm. >> attr(m1$terms, "predvars") > list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796, > 35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783, > 108.071583734075), intercept = FALSE)) >> attr(m2$terms, "predvars") > list(claim, splines::ns(wind, df = 5)) > This appears to be due to a bug in the splines package. There, the > function splines:::makepredictcall.ns looks like this: > makepredictcall.ns <- function(var, call) > { > if(as.character(call)[1L] != "ns") return(call) > at <- attributes(var)[c("knots", "Boundary.knots", "intercept")] > xxx <- call[1L:2L] > xxx[names(at)] <- at > xxx > } > The test fails for m2, because as.character(call)[1L] is "splines::ns" > instead of "ns". I'll see if I can work out a better test and submit a > patch. > Duncan Murdoch Thank you, Duncan, for the good diagnosis and the patch! I will deal with it. Two things however (both *not* needing a new patch; I'll deal with it after dinner ;-) 1) I'd like to commit that ASAP, but do want to add a *minimal* Reprex for regression test, rather than the above, notably as I see that the only place our code calls makepredictcall() is in model.frame.default(), so I thought this should not really be related to glm at all, and I was right. 2) Reading the help page ?makepredictcall --- yes, a good idea, even for experts, believe me, ((notably when it is from a base package, not produced from roxy.. ;-\)) --- reveals what I vaguely remembered: The function was introduced to fix a bug first encountered in lm(. ~ poly(.)) and indeed: -> That help page actually contains an indirect closer to minimal reprex. -> I will also patch the makepredictcall.poly() method. [ as I said: after dinner .. ;-)] -- Martin ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel