On Thu, 19 Aug 2010, Gavin Simpson wrote:
On Thu, 2010-08-19 at 13:42 -0700, Kay Cichini wrote:
hello everyone,
i sampled 100 stands at 20 restoration sites and recorded the presence of 3
different invasive plant species.
i came across logistic regression trees and wonder if this is suited for my
purpose - predicting presence of these problematic invasive plant species
(one by one) by a set of recorded ecological / geographical parameters.
i'd be glad if someone would comment on applying this method to such data -
maybe someone could point me to useful references.
also, i was not able to find out if there is a package implementing logistic
regression trees?
Not sure what a logistic regression tree is, but a classification tree
would be useful here: Treat each species as present (== 1) or absent (==
0) and try to fit a tree consisting of a set of splits in X covariates
that minimise a suitable deviance criterion.
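A minimal sketch of such a classification tree, using rpart() from the
recommended package "rpart". The data are simulated and the column names
(site, elevation, moisture, present) are hypothetical stand-ins for the
recorded parameters, so this only illustrates the mechanics:

```r
library(rpart)

set.seed(1)
# Simulated stand-in for the real data: 20 sites with 5 stands each.
# All column names here are assumptions, not taken from the original data.
stands <- data.frame(
  site      = factor(rep(1:20, each = 5)),
  elevation = runif(100, 500, 2000),
  moisture  = runif(100, 0, 1)
)
# Presence is made more likely at low elevation, purely for illustration
p <- plogis(3 - 0.003 * stands$elevation)
stands$present <- factor(rbinom(100, 1, p), levels = c(0, 1),
                         labels = c("absent", "present"))

# Classification tree for one species; split = "information" selects the
# entropy (deviance-type) criterion instead of rpart's default Gini index
fit <- rpart(present ~ elevation + moisture, data = stands,
             method = "class",
             parms = list(split = "information"),
             control = rpart.control(minsplit = 10, cp = 0.01))

print(fit)
printcp(fit)  # cross-validated error table, useful for choosing a pruning cp

# Predicted class and predicted presence probability for each stand
pred_class <- predict(fit, type = "class")
pred_prob  <- predict(fit, type = "prob")[, "present"]
```

One such tree would be fitted per species, with the response recoded as
present/absent each time.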
If you want to fit all three species at once, try multivariate trees,
but IIRC, they (in package mvpart at least) expect a count-based data
set, i.e. the deviance criterion they use (sum of squares) is probably
not suited to binary data.
To add to Gavin's comments about the modeling techniques:
ctree() in package "party" supports recursive partitioning of multivariate
responses of arbitrary types (numeric, categorical, censored, etc.).
Function mob() in the same package can also be used for partitioning based
on logistic regressions. See the manual pages for further references.
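A rough sketch of both calls, on simulated data with hypothetical variable
names (elevation, moisture, cover, sp1, sp2). The choice of which covariate
enters the logistic regression and which ones are used for partitioning in
mob() is an assumption made only for illustration:

```r
library(party)

set.seed(1)
# Simulated stand-in data; every column name is hypothetical
stands <- data.frame(
  elevation = runif(100, 500, 2000),
  moisture  = runif(100, 0, 1),
  cover     = runif(100, 0, 100)
)
stands$sp1 <- factor(rbinom(100, 1, plogis(3 - 0.003 * stands$elevation)))
stands$sp2 <- factor(rbinom(100, 1, plogis(-1 + 2 * stands$moisture)))

# Conditional inference tree for a single species (binary factor response)
ct1 <- ctree(sp1 ~ elevation + moisture + cover, data = stands)

# Multivariate response: two species partitioned jointly
ct2 <- ctree(sp1 + sp2 ~ elevation + moisture + cover, data = stands)

# Model-based partitioning: a logistic regression of sp1 on elevation is
# fitted in each node, and moisture and cover are the partitioning variables
mb <- mob(sp1 ~ elevation | moisture + cover, data = stands,
          model = glinearModel, family = binomial())

print(ct1)
print(mb)
```

With only 100 observations either function may return a tree with few or
no splits, which is itself informative about how much structure the data
support.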
Also the Machine Learning and Environmetrics task views at
http://CRAN.R-project.org/view=MachineLearning
http://CRAN.R-project.org/view=Environmetrics
have some more pointers.
Z
The one problem I foresee is that you only have 100 data points and even
that number is pseudo-replicated as you have multiple samples from just
20 "sites". Trees are unstable at the best of times and work best when
given a lot of data. Boosting, bagging and random forests can help but
they again work best with large data sets. I suppose "large" will be
relative to the signal-to-noise ratio in your data.
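As a hedged illustration of bagging under this kind of pseudo-replication,
the sketch below bags rpart() trees by hand and resamples whole sites
rather than individual stands, so the out-of-bag stands always come from
sites the tree has not seen. The site-level bootstrap is one possible
choice, not something prescribed above, and the data and column names are
simulated and hypothetical:

```r
library(rpart)

set.seed(42)
n_sites <- 20
# Simulated stand-in data: 5 stands per site; column names are assumptions
stands <- data.frame(
  site      = rep(seq_len(n_sites), each = 5),
  elevation = runif(100, 500, 2000),
  moisture  = runif(100, 0, 1)
)
stands$present <- factor(
  rbinom(100, 1, plogis(3 - 0.003 * stands$elevation)),
  levels = c(0, 1), labels = c("absent", "present")
)

n_trees <- 50
# oob_prob[i, b] is the predicted presence probability for stand i from
# tree b, or NA if stand i's site was part of bootstrap sample b
oob_prob <- matrix(NA_real_, nrow = nrow(stands), ncol = n_trees)

for (b in seq_len(n_trees)) {
  # Resample whole sites with replacement, not individual stands
  boot_sites <- sample(seq_len(n_sites), n_sites, replace = TRUE)
  boot_rows  <- unlist(lapply(boot_sites,
                              function(s) which(stands$site == s)))
  fit <- rpart(present ~ elevation + moisture,
               data = stands[boot_rows, ], method = "class")
  # Stands whose site was never drawn are out-of-bag for this tree
  oob <- !(stands$site %in% boot_sites)
  if (any(oob)) {
    oob_prob[oob, b] <- predict(fit, newdata = stands[oob, ],
                                type = "prob")[, "present"]
  }
}

# Average the out-of-bag probabilities and classify at 0.5
bagged_prob  <- rowMeans(oob_prob, na.rm = TRUE)
bagged_class <- ifelse(bagged_prob > 0.5, "present", "absent")
oob_error    <- mean(bagged_class != stands$present, na.rm = TRUE)
oob_error
```

Comparing this error with one obtained by resampling individual stands
gives a rough sense of how optimistic the stand-level estimate is.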
Ecologically, one needs to consider what a 0 value means (an absence):
was the invasive not present due to the environment being bad or just
because it hasn't got there yet despite the environment being good? How
you deal with that is anybody's guess.
Try the R-SIG-Ecology list for further help.
G
thanks in advance,
kay
-----
------------------------
Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck
------------------------
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.