Re: [R] Formula in a model

Paulito Palmes Wed, 11 Sep 2013 07:29:44 -0700


Hello Gerrit,

Thanks for the explanation. Let me give a specific example.

Assume the last column (column 6, which is Day in airquality) is the output and 
the remaining columns are the input training features. Note that I only use the 
air quality data for illustration purposes. The input->output mapping may not 
make sense in any real interpretation of this data.

library(e1071)

data(airquality)
mytable=airquality

colnames(mytable)=c('a','b','c','d','e','f')

modelSVM1=svm(mytable[,6] ~ .,data=mytable)
modelSVM2=svm(mytable[,-6],mytable[,6])
modelSVM3=svm(f ~ ., data=mytable)

predSVM1=predict(modelSVM1,newdata=mytable)
predSVM2=predict(modelSVM2,newdata=mytable[,-6])
predSVM3=predict(modelSVM3,newdata=mytable)

The results of predSVM2 are similar to those of predSVM3 but different from those of predSVM1.

Question: Which is the correct formulation? Why doesn't R detect the 
error/discrepancy in the formulation?

If I use the same formulation with rpart using the same data:

library(rpart)

data(airquality)
mytable=airquality

colnames(mytable)=c('a','b','c','d','e','f')

modelRP1=rpart(mytable[,6]~.,data=mytable,method='anova') # this works
modelRP3=rpart(f ~ ., data=mytable,method='anova') # this works

predRP1=predict(modelRP1,newdata=mytable)
predRP3=predict(modelRP3,newdata=mytable)

The results of predRP1 and predRP3 are different, while the statements:

modelRP2=rpart(mytable[,-6],mytable[,6],method='anova') 
predRP2=predict(modelRP2,newdata=mytable[,-6])

produce errors.

_____________________
From: Gerrit Eichner <gerrit.eich...@math.uni-giessen.de>
To: Paulito Palmes <ppal...@yahoo.com> 
Cc: "r-help@r-project.org" <r-help@r-project.org> 
Sent: Wednesday, 11 September 2013, 10:48
Subject: Re: [R] Formula in a model

Hello, Paulito,

first, I think you haven't received an answer yet because you did not 
"provide commented, minimal, self-contained, reproducible code" as the 
posting guide requests.

Second, see inline below.

On Wed, 11 Sep 2013, Paulito Palmes wrote:

> Hi,
>
> I have a data.frame with dimension 336x336 called *training*, and 
> another one called *observation* which is 336x1. I combined them into one 
> table using table=data.frame(training, observation). table now has 
> dimension 336x337, with the last column as the observation to learn using 
> the training data in the rest of the columns of the table. For 
> prediction, I combined the testing data and observation and passed it like 
> predict(model,testingWTesingObservation)
>
>
> I've used the formula: rpart(table[,337] ~ ., data=table) or 
> svm(table[,337] ~ ., data=table).

I am familiar with neither rpart() nor svm(), but "table[,337] ~ ., data = 
table" has the consequence that table[,337] is also on the right-hand side 
of the formula, so that your "observations" are also in the "training" 
data. That doesn't seem to make sense to me, and it is different from the 
call to svm() below.
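
One way to check this without fitting any model is to ask R what "." expands 
to under each formulation. The following is only a minimal sketch using base R 
and the airquality example from this thread (the names rhs_expr and rhs_name 
are made up for illustration); it assumes the same column renaming as above.

```r
# Sketch: inspect what "." expands to under each formulation (base R only).
data(airquality)
mytable <- airquality
colnames(mytable) <- c("a", "b", "c", "d", "e", "f")

# Response given as an expression: it does not match any column name,
# so "." expands to every column of the data, including f itself.
rhs_expr <- attr(terms(mytable[, 6] ~ ., data = mytable), "term.labels")

# Response given by name: "." expands to every column except f.
rhs_name <- attr(terms(f ~ ., data = mytable), "term.labels")

print(rhs_expr)
print(rhs_name)
```

If the first vector contains f and the second does not, the model fitted with 
the expression on the left-hand side is using the response as one of its own 
predictors, which would explain why it differs from the other formulations.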

  Hth  --  Gerrit

> I recently discovered that this formulation produces a different model 
> from the svm(training, observation) formulation. Which is correct, and 
> why is the other one not correct? I thought that syntactically, both are 
> the same. I hope that R should be able to detect the error in one of the 
> formulations to avoid the possibility of using it.
>
> Regards,
> Paul
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
