On 09.02.2012 22:39, Yang Zhang wrote:
I always bump into a few (very minor) problems when building model matrices with e.g.:

    train = model.matrix(label~., read.csv('train.csv'))
    target = model.matrix(label~., read.csv('target.csv'))

(1) The two may have different factor levels, yielding different matrices. I usually first rbind the data frames together to "meld" the factors, and then split them apart and matrixify them.
You can preprocess the data and explicitly define the levels for factor variables in your data.frames.
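A minimal sketch of that preprocessing, using small made-up data frames in place of train.csv and target.csv (the columns `colour` and `x` and the level set are illustrative assumptions, not taken from the original data):

```r
# Toy stand-ins for read.csv('train.csv') and read.csv('target.csv')
train  <- data.frame(label  = c(1, 0, 1),
                     colour = c("red", "green", "blue"),
                     x      = c(0.5, 1.2, 3.1),
                     stringsAsFactors = FALSE)
target <- data.frame(label  = c(0, 0),
                     colour = c("green", "red"),   # "blue" never occurs here
                     x      = c(2.0, 0.7),
                     stringsAsFactors = FALSE)

# Define one common set of levels and apply it to both data frames
colour_levels <- c("blue", "green", "red")
train$colour  <- factor(train$colour,  levels = colour_levels)
target$colour <- factor(target$colour, levels = colour_levels)

train_mm  <- model.matrix(label ~ ., train)
target_mm <- model.matrix(label ~ ., target)

# Both matrices now have the same columns in the same order
identical(colnames(train_mm), colnames(target_mm))
```

Because the levels are fixed up front, a level that is absent from one file still gets its column, so no rbind-and-split step is needed.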
(2) The target set that I'm predicting on typically doesn't have labels. I usually manually append dummy labels to the target data frame.
R cannot know labels if you do not provide any.
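If the aim is only to get the predictor columns for the unlabeled data, one possible alternative to appending dummy labels is to drop the response from the terms object before building the matrix. This is a sketch under assumptions: the toy columns `x1` and `x2` are made up, and the pattern mirrors what predict methods typically do rather than anything prescribed above.

```r
# Toy stand-ins: the target data has no label column
train  <- data.frame(label = c(1, 0, 1, 0),
                     x1    = c(0.5, 1.2, 3.1, 2.2),
                     x2    = c(10, 20, 30, 40))
target <- data.frame(x1 = c(2.0, 0.7),
                     x2 = c(15, 25))

# Expand the "." against the training data, then remove the response
tt <- delete.response(terms(label ~ ., data = train))

# The model frame for tt only needs the predictor columns
mf        <- model.frame(tt, data = target)
target_mm <- model.matrix(tt, mf)

colnames(target_mm)
```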
(3) I almost always remove the Intercept from the model matrices, since it seems to always be redundant (I usually use caret).
Then change your model formula to: "label ~ . - 1". But note that the interpretation of the coefficients changes, and the intercept is *not* redundant in general.
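A small sketch of what changes, using a made-up data frame (the columns `g` and `x` are illustrative assumptions):

```r
d <- data.frame(label = c(1, 0, 1, 0),
                g     = factor(c("a", "b", "c", "a")),
                x     = c(0.5, 1.2, 3.1, 2.2))

with_int    <- model.matrix(label ~ ., d)
without_int <- model.matrix(label ~ . - 1, d)

colnames(with_int)     # intercept plus contrasts against the baseline level of g
colnames(without_int)  # no intercept; one indicator column per level of g
```

With a factor in the formula, dropping the intercept makes R code the first factor with a full set of indicator columns, so the coefficients become per-level values rather than differences from a baseline. With only numeric predictors, dropping the intercept instead forces the fit through the origin, which is a different model.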
Uwe Ligges
None of these is a big deal at all, but I'm just curious if I'm missing something simple in how I'm doing things. Thanks.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.