carol white <wht_crl <at> yahoo.com> writes:

>
> Hi,
> I split a data set into two partitions (80 and 42), use the first as the training set in glm and the second as
> testing set in glm predict. But when I call glm.predict, I get the warning message:
>
> Warning message:
> 'newdata' had 42 rows but variable(s) found have 80 rows
> ---------------------
[snip]

The warning correctly diagnoses the problem.

The posting guide asks for a 'reproducible example', but you did not give us one. There is one below.

Note what happens when predict() tries to reconstruct the variable 'x[1:4]' as dictated by the formula. How many elements can 'x[1:4]' have when newdata has (say) nrowsNew rows? Always four: if newdata has fewer rows, the missing elements are NA, and if it has more, the extra rows are dropped.

Rather than subscripting the variables inside the formula, use the subset argument of glm() to select a subset of observations.

> y <- sample(factor(1:2),80,repl=T)
> y <- sample(factor(1:2),5,repl=T)
> x <- 1:4
> fit <- glm( y[1:4] ~ x[1:4], family = binomial)
> fit

Call:  glm(formula = y[1:4] ~ x[1:4], family = binomial)

Coefficients:
(Intercept)       x[1:4]
 -1.110e-16    0.000e+00

Degrees of Freedom: 3 Total (i.e. Null);  2 Residual
Null Deviance:      5.545
Residual Deviance: 5.545        AIC: 9.545
> predict(fit,newdata=data.frame(x=1:2))
            1             2             3             4
-1.110223e-16 -1.110223e-16            NA            NA
Warning message:
'newdata' had 2 rows but variable(s) found have 4 rows
> predict(fit,newdata=data.frame(x=1:5))
            1             2             3             4
-1.110223e-16 -1.110223e-16 -1.110223e-16 -1.110223e-16
Warning message:
'newdata' had 5 rows but variable(s) found have 4 rows
>

HTH,

Chuck

[rest deleted]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
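
For completeness, here is a minimal sketch of the subset approach recommended in the reply. The data frame 'dat', the index vectors 'train' and 'test', and the simulated values are all hypothetical; they only mirror the 80/42 split described in the question and are not the original poster's data.

```r
# Hypothetical data: 122 rows, split 80/42 as in the question.
set.seed(1)
dat <- data.frame(x = rnorm(122))
dat$y <- factor(rbinom(122, 1, plogis(0.5 * dat$x)))

train <- 1:80    # rows used for fitting
test  <- 81:122  # rows held out for prediction

# Plain variable names in the formula; the rows are chosen with 'subset'.
fit <- glm(y ~ x, family = binomial, data = dat, subset = train)

# predict() now looks up 'x' in newdata, so it returns one value per new row.
p <- predict(fit, newdata = dat[test, ], type = "response")
length(p)
```

Because the formula refers to 'x' rather than to a subscripted expression, predict() evaluates it on whatever rows newdata supplies, and the row counts no longer conflict. Passing data = dat[train, ] instead of the subset argument should behave the same way.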