The plyr package is designed for this kind of split-apply-combine task, but the base functions split() and unsplit() work as well. This example uses a simple lm() model in place of the SVMs:
> data(iris)
> iris <- iris[(iris$Species=="setosa" | iris$Species=="versicolor"),]
> set.seed(42)
> irisindex <- sample(1:nrow(iris), nrow(iris))
> iris <- iris[irisindex,]
> iris$Species <- factor(iris$Species) # Eliminate empty level virginica
> iris2 <- split(iris, iris$Species) # List with two data.frames
> results <- lapply(iris2, function(x) lm(Sepal.Length ~ Sepal.Width +
+     Petal.Length + Petal.Width, x))
> fit <- lapply(results, predict)
> iris3 <- lapply(names(iris2), function(x) data.frame(iris2[[x]], fitted=fit[[x]]))
> iris4 <- unsplit(iris3, iris$Species)
> head(iris4)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   fitted
92          6.1         3.0          4.6         1.4 versicolor 6.283549
93          5.8         2.6          4.0         1.2 versicolor 5.719649
29          5.2         3.4          1.4         0.2     setosa 4.961338
81          5.5         2.4          3.8         1.1 versicolor 5.528532
62          5.9         3.0          4.2         1.5 versicolor 5.852292
50          5.0         3.3          1.4         0.2     setosa 4.895855

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Brian Feeny
> Sent: Monday, December 10, 2012 4:41 PM
> To: r-help@r-project.org
> Subject: [R] splitting dataset based on variable and re-combining
>
> I have a dataset and I wish to use two different models to predict.
> Both models are SVM. The reason for two different models is based
> on the sex of the observation. I wish to be able to make predictions
> and have the results be in the same order as my original dataset.
> To illustrate I will use iris:
>
> # Take Iris and create a dataframe of just two Species, setosa and versicolor, shuffle them
> data(iris)
> iris <- iris[(iris$Species=="setosa" | iris$Species=="versicolor"),]
> irisindex <- sample(1:nrow(iris), nrow(iris))
> iris <- iris[irisindex,]
>
> # Make predictions on setosa using the mySetosaModel model, and on versicolor using the myVersicolorModel:
>
> predict(mySetosaModel, iris[iris$Species=="setosa",])
> predict(myVersicolorModel, iris[iris$Species=="versicolor",])
>
> The problem is this will give me a vector of just the setosa results,
> and then one of just the versicolor results.
>
> I wish to take the results and have them be in the same order as the
> original dataset. So if the original dataset had:
>
> Species
> setosa
> setosa
> versicolor
> setosa
> versicolor
> setosa
>
> I wish for my results to have:
> <prediction for setosa>
> <prediction for setosa>
> <prediction for versicolor>
> <prediction for setosa>
> <prediction for versicolor>
> <prediction for setosa>
>
> But instead, what I am ending up with is two result sets, and no way I
> can think of to combine them. I am sure this comes up a lot where you
> have a factor you wish to split your models on, say sex (male vs.
> female), and you need to present the results back so they match the
> order of the original dataset.
>
> I have tried to think of ways to use an index, to try to keep things in
> order, but I can't figure it out.
>
> Any help is greatly appreciated.
>
> Brian
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
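
The index idea mentioned at the end of the question can also be made to work directly. What follows is only a minimal sketch under stated assumptions: lm() stands in for the two SVM models, and the names dat, form, models and pred are made up for illustration. Each model's predictions are written into a preallocated vector at the rows belonging to its group, so the result keeps the row order of the shuffled data. The same pattern should carry over to any model object whose predict() method accepts newdata.

```r
# Sketch: keep predictions in the original row order by logical indexing.
# Assumption: lm() is a stand-in for the SVM models in the question.
data(iris)
dat <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])
set.seed(42)
dat <- dat[sample(nrow(dat)), ]            # shuffle the rows

# One model per species, stored in a list named by factor level
form   <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width
models <- lapply(split(dat, dat$Species), function(d) lm(form, data = d))

# Preallocate, then fill each group's rows with that group's predictions
pred <- numeric(nrow(dat))
for (sp in names(models)) {
  rows <- dat$Species == sp                # logical index into dat
  pred[rows] <- predict(models[[sp]], newdata = dat[rows, ])
}

dat$fitted <- pred                         # aligned with the rows of dat
head(dat)
```

Because the assignment uses a logical index built from dat$Species, no bookkeeping of the shuffle order is needed. unsplit(lapply(models, predict), dat$Species) should give the same values in the same order.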