Re: [R] Question about rpart(sth~.,database)

Gavin Simpson Sun, 19 Apr 2009 04:16:47 -0700

Grześ wrote:

I have a standard database - HouseVotes84. For example:

       Class   V1 V2 V3   V4 V5 V6 V7 V8 V9 V10  V11 V12 V13 V14 V15  V16
1 republican    n  y  n    y  y  y  n  n  n   y <NA>   y   y   y   n    y
2 republican    n  y  n    y  y  y  n  n  n   n    n   y   y   y   n <NA>
3   democrat <NA>  y  y <NA>  y  y  n  n  n   n    y   n   y   y   n    n
     .
     .
     .

and I build a tree like this:

hv.tree1=rpart(Class~.,HouseVotes84)

everything is ok! My question is:

What exactly does "Class~.," mean?

It means include all remaining variables in HouseVotes84 on the right-hand side (rhs) of the formula, i.e. as variables that should be used to predict the Class variable.
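
To see what the dot stands for, it can help to expand the formula against a data frame. The sketch below uses a small hypothetical data frame (`votes`, built from the first three rows and four columns of the example above) rather than the full HouseVotes84 data, so it runs with base R only.

```r
# Hypothetical toy data frame: first three rows, four columns of the example
votes <- data.frame(
  Class = factor(c("republican", "republican", "democrat")),
  V1    = factor(c("n", "n", NA), levels = c("n", "y")),
  V2    = factor(c("y", "y", "y"), levels = c("n", "y")),
  V3    = factor(c("n", "n", "y"), levels = c("n", "y"))
)

# terms() expands the dot against the columns of `data`
expanded   <- terms(Class ~ ., data = votes)
predictors <- attr(expanded, "term.labels")
print(predictors)

# The dot stands for every column except the response
stopifnot(identical(predictors, setdiff(names(votes), "Class")))
```

The same expansion happens inside rpart() when it is given Class ~ . together with a data frame.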


Why is it that when I use "Class~.," I get the best solution, but when I use a
parameter like this:

hv.tree2=rpart(V2~.,HouseVotes84)

Why does this surprise you? You are now trying to predict the variable V2 (y/n) from Class and all remaining variables.

I also get a solution, but not as good as before.


They are solutions to two different problems.
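
A minimal sketch of the difference, assuming HouseVotes84 is loaded from the mlbench package (the thread does not say where the data come from) and that rpart is installed. The object names `tree_class` and `tree_v2` are made up for illustration.

```r
library(rpart)
library(mlbench)  # assumption: source of the HouseVotes84 data
data(HouseVotes84)

# Same data, different response on the left-hand side of the formula
tree_class <- rpart(Class ~ ., data = HouseVotes84)  # predicts party
tree_v2    <- rpart(V2 ~ ., data = HouseVotes84)     # predicts the vote on V2

# The response levels show that the two trees answer different questions
print(attr(tree_class, "ylevels"))
print(attr(tree_v2, "ylevels"))

# In the second model, Class is just another predictor
print("Class" %in% attr(tree_v2$terms, "term.labels"))
```

Because the two trees predict different variables, their error rates are not directly comparable.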

If you want to predict Class, then you need

Class ~ ., data = HouseVotes84

or, to specify exactly which variables to use as predictors of Class, state them explicitly:


Class ~ V1 + V3 + V4, data = HouseVotes84
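
Written out as a full call, the explicit form might look like the sketch below. It again assumes the data come from the mlbench package, and `hv_small` is a hypothetical object name.

```r
library(rpart)
library(mlbench)  # assumption: source of the HouseVotes84 data
data(HouseVotes84)

# Only V1, V3 and V4 are offered to the tree as predictors of Class
hv_small <- rpart(Class ~ V1 + V3 + V4, data = HouseVotes84)

# The model terms list exactly the predictors named in the formula
print(attr(hv_small$terms, "term.labels"))
print(hv_small)
```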

I think you should look at the documentation that comes with R (An Introduction to R) or some of the contributed help documents on the R Website to read up on model formulae and how to represent models using this notation.


HTH

G

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
