Don't use $ notation in lm() formulas. Use lm(w ~ h, data=DAT). -pd
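
A minimal sketch of that suggestion, using the data from the reprex quoted below. As far as I can tell, the reason the original fails is that the term in the formula is the expression DAT$h, which predict() evaluates by finding DAT itself (20 rows) rather than any column of newdata. With bare column names and data=, the lookup goes through newdata, so all that is needed is a column called "h". The object names are taken from the reprex; nothing else is assumed.

```r
# Data copied from the reprex quoted below
DAT <- data.frame(
  h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118,
        1.755, 2.060, 2.136, 2.126, 1.792, 1.574,
        2.117, 1.741, 2.295, 1.526, 1.666, 1.581,
        1.522, 1.995),
  w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429,
        87.176, 90.318, 76.873, 84.183, 57.890, 62.005,
        84.258, 78.317, 101.304, 64.982, 71.237, 77.124,
        65.010, 81.413)
)

# Fit with bare column names; lm() looks them up in 'data'
myFit <- lm(w ~ h, data = DAT)
coef(myFit)

# newdata must have a column named like the predictor in the formula: "h"
myNew <- data.frame(h = seq(min(DAT$h), max(DAT$h), length.out = 50))

# One row per row of myNew, with columns fit, lwr and upr
pc <- predict(myFit, newdata = myNew, interval = "confidence")
str(pc)
```

The coefficient is now labelled "h" instead of "DAT$h", and no intermediate variables are needed.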

> On 4 Nov 2020, at 10:50 , Boris Steipe <boris.ste...@utoronto.ca> wrote:
>
> Can't get data from a data frame into predict() without a detour that seems
> quite unnecessary ...
>
> Reprex:
>
> # Data frame with simulated data in columns "h" (independent) and "w" (dependent)
> DAT <- structure(list(h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118,
>                             1.755, 2.060, 2.136, 2.126, 1.792, 1.574,
>                             2.117, 1.741, 2.295, 1.526, 1.666, 1.581,
>                             1.522, 1.995),
>                       w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429,
>                             87.176, 90.318, 76.873, 84.183, 57.890, 62.005,
>                             84.258, 78.317,101.304, 64.982, 71.237, 77.124,
>                             65.010, 81.413)),
>                  row.names = c( "1", "2", "3", "4", "5", "6", "7",
>                                 "8", "9", "10", "11", "12", "13", "14",
>                                 "15", "16", "17", "18", "19", "20"),
>                  class = "data.frame")
>
>
> myFit <- lm(DAT$w ~ DAT$h)
> coef(myFit)
>
> # (Intercept)       DAT$h
> #    11.76475    35.92002
>
>
> # Create 50 x-values with seq() to plot confidence intervals
> myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))
>
> pc <- predict(myFit, newdata = myNew, interval = "confidence")
>
> # Warning message:
> # 'newdata' had 50 rows but variables found have 20 rows
>
> # Problem: predict() was not able to take the single column in myNew
> # as the independent variable.
>
> # Ugly workaround: but with that everything works as expected.
> xx <- DAT$h
> yy <- DAT$w
> myFit <- lm(yy ~ xx)
> coef(myFit)
>
> myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))
> colnames(myNew) <- "xx"  # This fixes it!
>
> pc <- predict(myFit, newdata = myNew, interval = "confidence")
> str(pc)
>
> # So: specifying the column in newdata to have same name as the coefficient
> # name should work, right?
> # Back to the original ...
>
> myFit <- lm(DAT$w ~ DAT$h)
> colnames(myNew) <- "`DAT$h`"
> # ... same error
>
> colnames(myNew) <- "h"
> # ... same error again.
>
> Bottom line: how can I properly specify newdata? The documentation is opaque.
> It seems the algorithm is trying to EXACTLY match the text of the RHS of the
> formula, which is unlikely to result in a useful column name, unless I assign
> to an intermediate variable. There must be a better way ...
>
>
>
> Thanks!
> Boris
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com