Christopher,
thanks for your interest.
I'm currently exploring a dataset with the help of conditional inference
trees (still very much a beginner with this technique & log. reg.
methods as a whole t.b.h.), since they explained more variation in my
dataset than a binary logistic regression with /glm/. I started out with
the /party/ package, but after a while I ran into the 'updated'
/partykit/ package and tried this out, too.
If you want to use individual trees (as opposed to forests), then the
"partykit" package is recommended because it contains much improved
re-implementations of ctree() and mob() as well as the mob() convenience
interfaces lmtree() and glmtree(). For forests see below.
Now, the strange thing is that both trees look quite different -
actually even the very first split is different.
This might be due to several partitioning variables being associated with
tiny p-values in the root node. The re-implementation in partykit
internally computes with log-p-values and hence should be numerically more
stable. In the old implementation it could happen that from several highly
significant variables, always the first is chosen because the p-values
were essentially indistinguishable for the computer.
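As a small illustration of this numerical issue (a sketch in base R, not the code that party or partykit use internally): two very different test statistics can both give a p-value of exactly zero in double precision, while their log-p-values stay clearly ordered. The two z statistics and the variable names below are made-up values.

```r
# Two hypothetical, very extreme test statistics for two candidate variables
z <- c(var1 = -40, var2 = -50)

# Ordinary p-values underflow to exactly 0 in double precision
p <- pnorm(z)
print(p)

# which.min() breaks the tie by position, so the first variable is picked
names(z)[which.min(p)]

# Log-p-values remain finite and distinguishable
logp <- pnorm(z, log.p = TRUE)
print(logp)

# On the log scale the variable with the more extreme statistic is picked
names(z)[which.min(logp)]
```

With ties at zero the selection depends only on the order of the variables, which matches the "always the first is chosen" behaviour described above.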
If you think that this is not the problem, then please contact the package
maintainer with a reproducible example.
Except for bug fixes like the one above, the trees grown by
partykit::ctree and party::ctree should be the same.
So I did some research and came across the 'forest' concept. However, it
seems that the /varImp/ function does not yet work in the /partykit/
implementation,
Correct. While the ctree() implementation in partykit is better than that
in party, the same is _not_ true for cforest(). The new partykit::cforest
is currently still a basic implementation which doesn't offer as many
features as the party::cforest implementation. More work is needed
especially for variable importance measures and different kinds of
predictions.
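For the time being, one possible route is to fit the forest with party and use its varimp() function. The following is only a minimal sketch: it assumes the party package is installed, and the iris data, the seed and the ntree/mtry values are arbitrary stand-ins rather than recommendations.

```r
# Sketch: permutation variable importance with the party implementation.
# Assumes 'party' is installed; iris is only a placeholder dataset.
vi <- NULL
if (requireNamespace("party", quietly = TRUE)) {
  set.seed(1)  # forests are random, fix the seed for reproducibility
  cf <- party::cforest(Species ~ ., data = iris,
                       controls = party::cforest_unbiased(ntree = 50, mtry = 2))
  vi <- party::varimp(cf)  # one importance value per predictor
  print(sort(vi, decreasing = TRUE))
}
```

The single tree for plotting can still be grown with partykit::ctree; the forest is then used only to judge variable importance.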
which raises the question for me how I should evaluate the /partykit/
forest - how can I find out whether the variables are as important in the
forest as in my /partykit/ tree? Is there some way to do this or some
other solution for this problem? I'd prefer to continue with the /partykit/
implementation of ctree, since it allows more settings for the final
plot, which I'd need to get the final (large) plot into a readable form.
Related to this project, I'd also like to give statistics for the overall
model, e.g. overall significance, Nagelkerke's R², a C-value. After a
'regular' binary log. reg., I would use the lrm function to get these
values, but I am unsure whether it would be correct to also apply this
method to my tree data.
Overall significance is difficult because you have done model selection
when growing the tree. As for pseudo R-squared or information criteria
etc., it is relatively easy to compute these "by hand" based on the
observed and fitted responses. An example for this is provided at:
http://stackoverflow.com/questions/29524670/how-to-find-the-the-deviance-of-an-as-party-object-converted-from-rpart-tree-in/29693223#29693223
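A minimal base-R sketch of such a "by hand" computation for a binary response is given below. It is not taken from the linked answer; the function name fit_stats, its arguments and the clipping constant eps are hypothetical choices for illustration. It takes the observed 0/1 responses and the fitted probabilities (for a tree these would typically be the predicted class probabilities) and returns the log-likelihood, the deviance, Nagelkerke's R-squared and the C-value (area under the ROC curve).

```r
# Hand-computed fit measures from observed 0/1 responses y and fitted
# probabilities p. All names here are illustrative, not from any package.
fit_stats <- function(y, p, eps = 1e-12) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n  <- length(y)
  n1 <- sum(y == 1)
  n0 <- n - n1
  stopifnot(n1 > 0, n0 > 0)              # both classes must be present

  p <- pmin(pmax(p, eps), 1 - eps)       # avoid log(0) for pure leaves

  # Binomial log-likelihood of the fitted and the intercept-only model
  ll1 <- sum(y * log(p) + (1 - y) * log(1 - p))
  p0  <- n1 / n
  ll0 <- n1 * log(p0) + n0 * log(1 - p0)

  # Cox-Snell R-squared, rescaled by its maximum (Nagelkerke)
  cox_snell  <- 1 - exp(2 * (ll0 - ll1) / n)
  nagelkerke <- cox_snell / (1 - exp(2 * ll0 / n))

  # C-value via the rank (Mann-Whitney) formula, ties get average ranks
  c_value <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  c(logLik = ll1, deviance = -2 * ll1, nagelkerke = nagelkerke, C = c_value)
}

# Toy usage with made-up numbers
res <- fit_stats(y = c(0, 0, 1, 1), p = c(0.1, 0.4, 0.35, 0.8))
print(res)
```

These are in-sample values, so after growing a tree they will tend to be optimistic for the same reason that overall significance is difficult.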
Any help would be greatly appreciated!
-- Christopher
--
View this message in context:
http://r.789695.n4.nabble.com/Trees-and-Forests-with-packages-party-vs-partykit-Different-results-tp4712214.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.