Hello again all,

I am responding to my own earlier post about a "non-conformable arguments" error with the predict() function of the pls package (http://cran.r-project.org/web/packages/pls/) in R 2.13.0 (running on Ubuntu 10.10).
I believe I have narrowed down the cause of the error. My new understanding is that if the test data to be predicted using a regression model (where the test data is passed in as 'newdata' to the predict() function) does not contain all possible levels of factors in the training data then the predict() function returns a "non-conformable arguments" error.

However, this seems like an odd behaviour to me. Surely not all new data for which the dependent variable(s) are to be predicted will contain all levels of a factor present in the training data.

Can someone shed some light on why the predict() function of the pls package has this behaviour? And how to avoid it if possible in a way that doesn't involve users having to insert dummy values in new data?

Thanks,

Alison

On Mon, Apr 18, 2011 at 6:18 PM, Alison Callahan <alison.calla...@gmail.com> wrote:

> Hello all,
>
> I have generated a principal components regression model using the pcr()
> function from the PLS package (R version 2.13.0). I am getting a
> "non-conformable arguments" error when I try to use the predict() function
> on new data, but only when I try to read in the new data from a separate
> file.
>
> More specifically, when my data looks like this
>
> #########training data #1#################
>
> var1 var2 var3  response train
> 1    2    type1 33       TRUE
> 2    23   type2 44       TRUE
> .....
> .......
> 18   11   type1 45       FALSE
>
>
> and I use the predict() function from the PLS package as in the example
> from http://rss.acs.unt.edu/Rdoc/library/pls/html/predict.mvr.html, e.g.
>
> ###################################
> mydata <- read.csv("mydata.csv", header=TRUE)
>
> mydata <- data.frame(mydata)
>
> pcrmodel <- pcr(response ~ var1+var2+var3, data = mydata[mydata$train,])
>
> predict(pcrmodel, type = "response", newdata = mydata[!mydata$train,])
>
> ###################################
>
> the code works, and the model predicts new values for the "response"
> variable rows where train=FALSE.
>
> However, as soon as I put the rows where train = FALSE into a separate file
> and remove the "train" column so that my training data looks like this:
>
> #########training data #2 ################
> var1 var2 var3  response
> 1    2    type1 33
> 2    23   type2 44
> .....
>
>
> and my new test data, saved in a separate file (say "newdata.csv") looks
> like this
>
> ########test data in separate file, newdata.csv ###############
> var1 var2 var3  response
> 3    5    type1 23
> 4    7    type2 30
> .....
> 18   11   type1 45
>
> if I train a PCR model using the training data #2 and try to predict with
> the resulting model and the data from "newdata.csv", e.g.,
>
> ##################################
> trainingdata <- read.csv("mydata_without_train_column.csv", header=TRUE)
>
> trainingdata <- data.frame(trainingdata)
>
> testingdata <- read.csv("newdata.csv", header=TRUE)
>
> testingdata <- data.frame(testingdata)
>
> pcrmodel2 <- pcr(response ~ var1+var2+var3, data = trainingdata)
>
> predict(pcrmodel2, type = "response", newdata = testingdata)
> ##############################
>
> I get the following error:
>
> "Error in newX %*% B : non-conformable arguments"
>
> I don't understand why I get this error only when I put the non-training
> data into a separate file from the training data and load it as a separate
> object. Any help is appreciated,
>
> Alison

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
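
P.S. A minimal sketch of the factor-level explanation given at the top of this message, using base R only. The data values and the third level "type3" are made up for illustration and are not taken from the original files. It assumes (not confirmed here for pls) that predict() builds the design matrix from 'newdata' alone, so a factor with fewer observed levels yields fewer dummy columns than the model's coefficient matrix expects.

```r
# Hypothetical stand-in for the training file: var3 has three levels.
train <- data.frame(
  var1 = c(1, 2, 5, 7, 9, 12),
  var2 = c(2, 23, 4, 8, 15, 11),
  var3 = factor(c("type1", "type2", "type3", "type1", "type2", "type3")),
  response = c(33, 44, 29, 51, 38, 45)
)

# Hypothetical stand-in for the separately loaded test file:
# only two of the three levels happen to occur.
test <- data.frame(
  var1 = c(3, 4),
  var2 = c(5, 7),
  var3 = factor(c("type1", "type2"))
)

# Design matrices built independently from each data frame.
X_train    <- model.matrix(~ var1 + var2 + var3, data = train)
X_test_bad <- model.matrix(~ var1 + var2 + var3, data = test)

ncol(X_train)     # 5: intercept, var1, var2, var3type2, var3type3
ncol(X_test_bad)  # 4: the var3type3 column is missing

# Possible workaround: re-declare the factor in the new data with the
# training levels, so the unused level still gets an (all-zero) column.
test$var3 <- factor(test$var3, levels = levels(train$var3))
X_test_ok <- model.matrix(~ var1 + var2 + var3, data = test)

identical(colnames(X_test_ok), colnames(X_train))  # TRUE
```

Applying the same re-levelling to testingdata before calling predict() would avoid inserting dummy rows; whether it resolves the pls error in every case is untested here.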