Hello Gerrit,
Thanks for the explanation. Let me give a specific example. Assume the last column (column 6, renamed 'f' below) is the output and the rest of the columns are the input training features. Note that I only use the air quality data for illustration purposes. The input->output mapping may not make sense in the real interpretation of this data.

    library(e1071)
    data(airquality)
    mytable=airquality
    colnames(mytable)=c('a','b','c','d','e','f')

    modelSVM1=svm(mytable[,6] ~ .,data=mytable)
    modelSVM2=svm(mytable[,-6],mytable[,6])
    modelSVM3=svm(f ~ ., data=mytable)

    predSVM1=predict(modelSVM1,newdata=mytable)
    predSVM2=predict(modelSVM2,newdata=mytable[,-6])
    predSVM3=predict(modelSVM3,newdata=mytable)

The results of predSVM2 are similar to those of predSVM3 but different from those of predSVM1.

Question: Which is the correct formulation? Why doesn't R detect an error or discrepancy in the formulation?

If I use the same formulation with rpart on the same data:

    library(rpart)
    data(airquality)
    mytable=airquality
    colnames(mytable)=c('a','b','c','d','e','f')

    modelRP1=rpart(mytable[,6]~.,data=mytable,method='anova') # this works
    modelRP3=rpart(f ~ ., data=mytable,method='anova') # this works

    predRP1=predict(modelRP1,newdata=mytable)
    predRP3=predict(modelRP3,newdata=mytable)

The results of predRP1 and predRP3 are different, while the statements:

    modelRP2=rpart(mytable[,-6],mytable[,6],method='anova')
    predRP2=predict(modelRP2,newdata=mytable[,-6])

produce errors.

_____________________
From: Gerrit Eichner <gerrit.eich...@math.uni-giessen.de>
To: Paulito Palmes <ppal...@yahoo.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Wednesday, 11 September 2013, 10:48
Subject: Re: [R] Formula in a model

Hello, Paulito,

first, I think you haven't received an answer yet because you did not "provide commented, minimal, self-contained, reproducible code" as the posting guide requests. Second, see inline below.
On Wed, 11 Sep 2013, Paulito Palmes wrote:

> Hi,
>
> I have a data.frame with dimension 336x336 called *training*, and
> another one called *observation* which is 336x1. I combined them as one
> table using table=data.frame(training, observation). table now has
> 336x337 dimension with the last column as the observation to learn using
> the training data of the rest of the columns in the table. For
> prediction, I combined the testing data and observation and pass it like
> predict(model,testingWTesingObservation)
>
> I've used the formula: rpart(table[,337] ~ ., data=table) or
> svm(table[,337] ~ ., data=table).

I am not familiar with rpart() or svm(), but "table[,337] ~ ., data = table" has the consequence that table[,337] is also on the right-hand side of the formula, so that your "observations" are also in the "training" data. That doesn't seem to make sense to me, and it is different from the call to svm() below.

  Hth  --  Gerrit

> I recently discovered that this formulation produces a different model
> from the svm(training, observation) formulation. Which is correct, and
> why is one of them not correct? I thought that syntactically, both are
> the same. I hope that R should be able to detect the error in one of the
> formulations to avoid the possibility of using it.
>
> Regards,
> Paul
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
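
Gerrit's point about the right-hand side can be checked with base R alone, without fitting any model. The following is only a minimal sketch, assuming the renamed airquality columns from the example earlier in this message; the names rhs_indexed, rhs_named, response and fml are illustrative and not part of the original code.

```r
# Same renaming as in the example earlier in this message (column 6 becomes "f").
mytable <- airquality
colnames(mytable) <- c("a", "b", "c", "d", "e", "f")

# Indexed response: "." expands to every column of `data`, because the
# left-hand side is the call mytable[, 6], not the column name "f".
rhs_indexed <- attr(terms(mytable[, 6] ~ ., data = mytable), "term.labels")

# Named response: "." expands to every column except "f".
rhs_named <- attr(terms(f ~ ., data = mytable), "term.labels")

print(rhs_indexed)  # includes "f", so the response is also a predictor
print(rhs_named)    # does not include "f"

# One way to avoid hard-coding the name: build the formula from the
# column name instead of indexing the data frame inside the formula.
response <- colnames(mytable)[6]
fml <- as.formula(paste(response, "~ ."))
print(fml)
```

This would be consistent with predSVM1 and predRP1 differing from the other predictions: in the indexed form the response column also enters the model as a predictor. Since that is a syntactically valid formula, R has no error to report.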