Hi: I think the problem is that you're trying to append the predicted probabilities as a new variable in the (one-line) data frame, when in fact a vector of probabilities is output ( = number of ordered levels of the response) for each new observation. Here's a reproducible example hacked from the faraway package that shows a few ways to deal with the problem.
library("MASS") library('faraway') # Some messing around with the nes96 data to coarsen the factor levels # of the response and to change income from an ordered factor to # numeric by replacing factor levels with the midpoint incomes in # each class. The result is still ordered. nes96$sPID <- nes96$PID levels(nes96$sPID) <- c('Dem', 'Dem', 'Ind', 'Ind', 'Ind', 'Rep', 'Rep') incavg <- c(1.5,4,6,8,9.5,10.5,11.5,12.5,13.5,14.5,16,18.5,21,23.5,27.5, 32.5,37.5,42.5,47.5,55,67.5,82.5,97.5,115) nes96$nincome <- incavg[unclass(nes96$income)] # Fit the model: pomod <- polr(sPID ~ age + educ + nincome, data = nes96) # Create a new data frame for prediction ndat <- data.frame(age = 40, educ = 'BAdeg', nincome = mean(nincome)) # Predict: result is a vector rather than a scalar predict(pomod, newdata = ndat, type = 'probs') Dem Ind Rep 0.3787940 0.2663598 0.3548461 # Fails because you're trying to append a new column of length three # to a data frame with one row ndat$predprob <- predict(pomod, newdata = ndat, type = 'probs') Error in `$<-.data.frame`(`*tmp*`, "predprob", value = c(0.378794008534862, : replacement has 3 rows, data has 1 # Better: cbind the predictions to the prediction data frame # Method 1: for one new observation, cbind() creates a three-line data frame cbind(ndat, predprob = predict(pomod, newdata = ndat, type = 'probs')) age educ nincome predprob Dem 40 BAdeg 46.57574 0.3787940 Ind 40 BAdeg 46.57574 0.2663598 Rep 40 BAdeg 46.57574 0.3548461 # Method 2: One-line data frame output for one-line newdata input # Since predict() outputs a numeric vector, first coerce it into a list # and then to a one-line data frame. Then cbind() returns a one line # data frame. cbind(ndat, predprob = as.data.frame(as.list(predict(pomod, newdata = ndat, type = 'probs')))) # age educ nincome predprob.Dem predprob.Ind predprob.Rep #1 40 BAdeg 46.57574 0.378794 0.2663598 0.3548461 # Now try with multiple new observations: ndat2 <- data.frame(age = c(35, 40), educ = c('HS', 'BAdeg'), nincome = mean(nincome)) cbind(ndat2, predprob = predict(pomod, newdata = ndat2, type = 'probs')) age educ nincome predprob.Dem predprob.Ind predprob.Rep 1 35 HS 46.57574 0.4261833 0.2627280 0.3110888 2 40 BAdeg 46.57574 0.3787940 0.2663598 0.3548461 So method 2 is consistent with what you would get from predicting multiple new observations. HTH, Dennis On Tue, Oct 18, 2011 at 6:49 PM, Xu Jun <junx...@gmail.com> wrote: > Dear R-Help listers, > > I am trying to estimate an proportional odds logistic regression model > (or ordered logistic regression) and then make predictions by > supplying a hypothetical x vector. However, somehow this does not > work. I guess I must have missed something here. I first used the polr > function in the MASS package, and I create a data frame and supply it > to the predict function (see below): > > ############################################################### > myologit <- polr(factor(warm) ~ yr89 + male + white + age + ed + prst, > data=ordwarm2, method=c("logistic")) > yr89 <- c(1) > male <- c(1) > white <- c(1) > age <- c(mean(ordwarm2$age)) > ed <- c(mean(ordwarm2$ed)) > prst <- c(mean(ordwarm2$prst)) > prdata <- data.frame(yr89, male, white, age, ed, prst) > > prdata$rankR <-predict(myologit,newdata=prdata,type="probs") > ################################################################ > > I do not have any problem estimating the model, but when it comes to > the last time (predict), I got the following message: > > Error in `$<-.data.frame`(`*tmp*`, "rankR", value = c(0.124294963178868, : > replacement has 4 rows, data has 1 > > Can anyone help me out with this error here? Thanks a lot! > > Jun Xu, Phd > Assistant Professor > Department of Sociology > Ball State University > > P.S.: below is the detailed output from R > > ################################################################ > > Call: > polr(formula = factor(warm) ~ yr89 + male + white + age + ed + > prst, data = ordwarm2, method = c("logistic")) > > Coefficients: > Value Std. Error t value > yr89 0.523912 0.079899 6.557 > male -0.733309 0.078483 -9.344 > white -0.391140 0.118381 -3.304 > age -0.021666 0.002469 -8.777 > ed 0.067176 0.015975 4.205 > prst 0.006072 0.003293 1.844 > > Intercepts: > Value Std. Error t value > 1|2 -2.4654 0.2389 -10.3188 > 2|3 -0.6309 0.2333 -2.7042 > 3|4 1.2618 0.2340 5.3919 > > Residual Deviance: 5689.825 > AIC: 5707.825 > >> # predictions for hypothetical data points >> # white male in year 89 >> yr89 <- c(1) >> male <- c(1) >> white <- c(1) >> age <- c(mean(ordwarm2$age)) >> ed <- c(mean(ordwarm2$ed)) >> prst <- c(mean(ordwarm2$prst)) >> prdata <- data.frame(yr89, male, white, age, ed, prst) >> >> prdata$rankR <-predict(myologit,newdata=prdata,type="probs") > Error in `$<-.data.frame`(`*tmp*`, "rankR", value = c(0.124294963178868, : > replacement has 4 rows, data has 1 >> >> prdata > yr89 male white age ed prst > 1 1 1 1 44.93546 12.21805 39.58526 > ###############################################################3 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.