On Wed, 2010-06-23 at 16:30 -0700, cc super wrote: > What if the size of the newdata is different from the previous one > used to generate the regression model? > > Let's say > > pdat <- data.frame(x1 = rnorm(5, 2), x2 = rnorm(5)) > predict(lin, pdat) > > It comes up with warning and the result is not correct.
No it doesn't (not on my system at least) and you give no indication what the warning/error is or why you think the result is incorrect. set.seed(1234) y <- rnorm(10, mean = 5) dat <- data.frame(x1 = rnorm(10, mean = 2), x2 = rnorm(10)) lin <- lm(y~x1+x2, data = dat) pdat <- data.frame(x1 = rnorm(5, 2), x2 = rnorm(5)) predict(lin, pdat) > predict(lin, pdat) 1 2 3 4 5 4.971021 6.482085 5.742130 4.652898 5.194980 The size of newdata shouldn't matter at all: > pdat2 <- data.frame(x1 = rnorm(1, 2), x2 = rnorm(1)) > predict(lin, pdat2) 1 4.185984 > pdat3 <- data.frame(x1 = rnorm(1000, 2), x2 = rnorm(1000)) > head((p3 <- predict(lin, pdat3))) 1 2 3 4 5 6 4.076389 4.118338 4.054443 3.976399 4.055590 4.126018 > length(p3) [1] 1000 So can you show the exact code you used plus the errors/warnings you report and why you think the results are incorrect? For example, automating your by-hand computation a bit shows that predict() produces the correct result. Compare: > ## The cbind(1, ....) bit is for the intercept... > data.matrix(cbind(1, pdat)) %*% coef(lin) [,1] [1,] 4.178516 [2,] 4.101112 [3,] 4.081370 [4,] 4.146819 [5,] 4.186405 > predict(lin, pdat) 1 2 3 4 5 4.178516 4.101112 4.081370 4.146819 4.186405 HTH G > > Thanks! > > > 2010/6/23 Gavin Simpson <gavin.simp...@ucl.ac.uk> > On Tue, 2010-06-22 at 23:11 -0700, cc super wrote: > > Hi, everyone, > > > > Night. I have three questions about multiple linear > regression in R. > > > > Q1: > > > > y=rnorm(10,mean=5) > > x1=rnorm(10,mean=2) > > x2=rnorm(10) > > lin=lm(y~x1+x2) > > summary(lin) > > > > ## In the summary, 'Residual standard error: 1.017 on 7 > degrees of freedom', > > 1.017 is the estimate of the constance variance? > > > Yes, it is sigma. > > Just a note, in order for the above code to yield the same > results as > you quote, you need a call to set.seed() to fix the pseudo > random number > generator. > > > Q2: > > > > beta0=lin$coefficients[1] > > beta1=lin$coefficients[2] > > beta2=lin$coefficients[3] > > > > y_hat=beta0+beta1*x1+beta2*x2 > > > > ## Is there any built-in function in R to obtain y_hat > directly? > > > fitted(lin) > > Note that there are quite a few standard extractor functions > like fitted > available for modelling functions in R. coef() for example > should be > used to extract the coefficients, resid() will extract > residuals etc. > > > Q3: > > > > If I want to apply this regression result to another > dataset, that is, new > > x1 and x2. Is the built-in function in 2 still available? > > > It is called predict() (although if you called predict(lin) > above > instead of fitted(lin) it would have produced the same answer; > the > fitted values for the observations). > > One gotcha that catches people out is that in the new dataset, > the > variables (used in the model) must have the same names as the > data frame > used to fit it. So we could do: > > pdat <- data.frame(x1 = rnorm(10, 2), x2 = rnorm(10)) > predict(lin, pdat) > > to get predictions at the new values of x1 an x2. > > > Thank you in advance! > > HTH > > G > > -- > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~ > %~%~%~% > Dr. Gavin Simpson [t] +44 (0)20 7679 0522 > ECRC, UCL Geography, [f] +44 (0)20 7679 0565 > Pearson Building, [e] > gavin.simpsonATNOSPAMucl.ac.uk > Gower Street, London [w] > http://www.ucl.ac.uk/~ucfagls/ > UK. WC1E 6BT. [w] > http://www.freshwaters.org.uk > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~ > %~%~%~% > > -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.