Re: [R] Trees (and Forests) with packages 'party' vs. 'partykit': Different results

Achim Zeileis Mon, 14 Sep 2015 07:56:30 -0700

Christopher,

thanks for your interest.

I'm currently exploring a dataset with the help of conditional inference trees (still very much a beginner with this technique & log. reg. methods as a whole t.b.h.), since they explained more variation in my dataset than a binary logistic regression with /glm/. I started out with the /party/ package, but after a while I ran into the 'updated' /partykit/ package and tried this out, too.

If you want to use individual trees (as opposed to forests), then the "partykit" package is recommended because it contains much improved re-implementations of ctree() and mob() as well as the mob() convenience interfaces lmtree() and glmtree(). For forests see below.

Now, the strange thing is that both trees look quite different - actually even the very first split is different.

This might be due to several partitioning variables being associated with tiny p-values in the root node. The re-implementation in partykit internally computes with log-p-values and hence should be numerically more stable. In the old implementation it could happen that, out of several highly significant variables, the first one was always chosen because the p-values were essentially indistinguishable for the computer.
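A minimal sketch of this effect in base R (with made-up test statistics, not code taken from either package): two very different chi-squared statistics both give a p-value that underflows to zero, so picking the smallest p-value falls back on the first variable, whereas the log-p-values remain clearly ordered.

```r
# Two hypothetical chi-squared test statistics (1 degree of freedom) for
# two partitioning variables; the values are made up for illustration.
stat <- c(x1 = 2000, x2 = 3000)

# On the ordinary scale both p-values underflow to zero ...
p <- pchisq(stat, df = 1, lower.tail = FALSE)
print(p)
# ... so the tie is broken in favor of the first variable.
sel_p <- names(p)[which.min(p)]
print(sel_p)

# On the log scale the two p-values stay distinguishable ...
logp <- pchisq(stat, df = 1, lower.tail = FALSE, log.p = TRUE)
print(logp)
# ... and the variable with the larger test statistic is selected.
sel_logp <- names(logp)[which.min(logp)]
print(sel_logp)
```

This only mimics the variable selection step; the actual tests in ctree() are permutation tests based on different statistics.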

If you think that this is not the problem, then please contact the package maintainer with a reproducible example.

Except for bug fixes like the one above, the trees grown by partykit::ctree and party::ctree should be the same.

So I did some research and came across the 'forest' concept. However, it seems that the /varImp/ function does not yet work in the /partykit/ implementation,

Correct. While the ctree() implementation in partykit is better than that in party, the same is _not_ true for cforest(). The new partykit::cforest is currently still a basic implementation which doesn't offer as many features as the party::cforest implementation. More work is needed especially for variable importance measures and different kinds of predictions.
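For variable importance in the meantime, a rough sketch with the party implementation could look as follows. It assumes that the 'party' package is installed; the data, the variable names, and the ntree/mtry settings are made up for illustration. Note that the function in party is spelled varimp().

```r
# Sketch only: runs if the 'party' package is installed; data are simulated.
if (requireNamespace("party", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
  # Binary response that depends on x1 only.
  d$y <- factor(rbinom(n, size = 1, prob = plogis(2 * d$x1)))

  # Forest of conditional inference trees with the "unbiased" defaults.
  cf <- party::cforest(y ~ ., data = d,
    controls = party::cforest_unbiased(ntree = 100, mtry = 2))

  # Permutation-based variable importance, one value per predictor.
  vi <- party::varimp(cf)
  print(sort(vi, decreasing = TRUE))
}
```

Calling the functions with the party:: prefix avoids name clashes when partykit is attached in the same session.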

which raises the question for me how I should evaluate the /partykit/ forest - how can I find out whether the variables are important in the forest as in my /partykit/ tree? Is there some way to do this or some other solution for this problem? I'd prefer to continue the /partykit/ implementation of ctree, since it allows more settings for the final plot, which I'd need to get the final (large) plot into a readable form.

Related to this project, I'd also like to give statistics for the overall model, e.g. overall significance, Nagelkerke's R², a C-value. After a 'regular' binary log. reg., I would use the lrm function to get these values, but I am unsure whether it would be correct to also apply this method to my tree data.

Overall significance is difficult because you have done model selection when growing the tree. As for pseudo R-squared or information criteria etc., it is relatively easy to compute these "by hand" based on the observed and fitted responses. An example for this is provided at:

http://stackoverflow.com/questions/29524670/how-to-find-the-the-deviance-of-an-as-party-object-converted-from-rpart-tree-in/29693223#29693223
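Along the same lines, here is a rough sketch (not the code from the linked answer) that computes Nagelkerke's R-squared and a C-value from observed 0/1 responses and fitted class probabilities. The helper fit_stats() and the simulated data are hypothetical; the tree part only runs if 'partykit' is installed. Being in-sample values after model selection, the results will tend to be optimistic.

```r
# Nagelkerke's R-squared and the C-value (AUC) computed "by hand" from the
# observed 0/1 responses y and the fitted probabilities p for class 1.
fit_stats <- function(y, p) {
  n <- length(y)
  # Guard against log(0) in pure terminal nodes.
  eps <- 1e-12
  p <- pmin(pmax(p, eps), 1 - eps)
  ll1 <- sum(y * log(p) + (1 - y) * log(1 - p))    # fitted model
  p0 <- mean(y)
  ll0 <- sum(y * log(p0) + (1 - y) * log(1 - p0))  # intercept-only model
  cox_snell <- 1 - exp(2 * (ll0 - ll1) / n)
  nagelkerke <- cox_snell / (1 - exp(2 * ll0 / n))
  # C-value via the rank-sum formula; tied probabilities get average ranks.
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- n - n1
  c_value <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(logLik = ll1, nagelkerke = nagelkerke, C = c_value)
}

# Simulated example data (hypothetical, for illustration only).
set.seed(1)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(rbinom(n, size = 1, prob = plogis(1.5 * d$x1)))

if (requireNamespace("partykit", quietly = TRUE)) {
  ct <- partykit::ctree(y ~ x1 + x2, data = d)
  # Column "1" holds the fitted probability of the second factor level.
  p <- predict(ct, type = "prob")[, "1"]
  print(fit_stats(as.numeric(d$y == "1"), p))
}
```

Since a tree yields one fitted probability per terminal node, there are many ties, which is why the C-value uses average ranks.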

Any help would be greatly appreciated!

-- Christopher



--
View this message in context: 
http://r.789695.n4.nabble.com/Trees-and-Forests-with-packages-party-vs-partykit-Different-results-tp4712214.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
