On Sun, 2008-09-28 at 19:26 -0600, Darin Brooks wrote:
> I certainly appreciate your comments, Bert. It is abundantly clear that I
> won't be invited to any of the cocktail parties hosted by the "polite
> circles". I am not a statistician. I am merely a geographer (in the field
> of ecology) trying to develop a predictor to assist in a forestry-based
> decision making process. My work in the natural world has taught me that
> NOTHING is predictable ... and the very idea of a bullet-proof ecological
> predictive model is doomed to fail.
> That said, there ARE some basic predictors that assist foresters in their
> salvage decisions. They use these on a daily basis. The problem is that
> most of the evidence and modeling is anecdotal. There really are no models
> in the field that I am working in. And for good reason ... The natural
> world isn't interested in being modeled. I think we can all agree on this -
> guru or not.

Hi Darin,

As an ecologist myself, I think you overstate things a bit here.
Clearly there are features of the "ecological" world out there that
follow "rules" --- otherwise we might as well consign the whole branch
of theoretical ecology to the bin. These things can be modelled, but we
are often looking for a relatively small signal in a whole load of
noise.

You really do need to "model" your system in order to make predictions
about it. How you go about the "modelling" is another matter.

I think you may be better off with some of the more algorithm-centric
data mining methods that are currently all the rage in some quarters of
ecology (predicting climate change effects on species +/-, change in
range etc); things like regression/classification trees, randomForest,
boosting etc. Names to look out for in this literature are JR Leathwick,
Antoine Guisan, Miguel B Araujo and J Elith. These authors, and others,
have published a lot of work on these modern methods.

These methods have less of a theoretical statistical underpinning, but
they can be evaluated on how well they make predictions, which is often
the whole point of doing the analysis.

> But even the most basic predictive model (using only the GIS/mappable data
> that is readily available to most users) is a starting point. The resultant
> dataset(s) of this potential model will be followed-up and field verified.
> Providing this simple starting point (or catalyst if you will) could
> potentially save A LOT of time and money.
> What I need to do is to isolate the best available variables into a model
> and assign a confidence to it. It doesn't have to change everyone's world
> ... it just has to change the way of thinking in my small little world.
> These past few days have been an education for me in the subject of stepwise
> regression. I approach it with much more apprehension now. So if nothing
> else good comes of this discussion/exercise/experience ... I've learned
> something.

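On "isolate the best available variables" and judging a method by how
well it predicts, a rough illustration may help. The sketch below is not
code from anyone in this thread. It is a minimal simulation in plain
Python (standard library only), and the function names (pearson,
one_replicate, simulate), the sample sizes and the seed are all
assumptions made for illustration. It generates a response and 44
candidate predictors that are pure noise (44 to echo the "about 44"
mentioned further down the thread), keeps whichever candidate correlates
best with the response on a training half, and then checks that same
variable on held-out data. This is a one-step caricature of stepwise
selection, not the full procedure.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def one_replicate(rng, n_train=50, n_test=50, n_candidates=44):
    """Pick the 'best' of many pure-noise predictors, then re-check it."""
    n = n_train + n_test
    # Response and every candidate predictor are independent noise,
    # so there is no real signal to find (an assumption of this sketch).
    y = [rng.gauss(0, 1) for _ in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]

    # "Selection": keep the candidate with the largest |r| on the training half.
    train_r = [abs(pearson(col[:n_train], y[:n_train])) for col in X]
    best = max(range(n_candidates), key=train_r.__getitem__)

    # Honest check: the same variable against data not used to choose it.
    test_r = abs(pearson(X[best][n_train:], y[n_train:]))
    return best, train_r, test_r


def simulate(n_reps=100, seed=1):
    """Average the in-sample and held-out |r| of the selected predictor."""
    rng = random.Random(seed)
    in_sample, held_out = [], []
    for _ in range(n_reps):
        best, train_r, test_r = one_replicate(rng)
        in_sample.append(train_r[best])
        held_out.append(test_r)
    return statistics.fmean(in_sample), statistics.fmean(held_out)


if __name__ == "__main__":
    mean_in, mean_out = simulate()
    print(f"mean |r| of selected predictor, training data: {mean_in:.2f}")
    print(f"mean |r| of same predictor, held-out data:     {mean_out:.2f}")
```

Because the winner is chosen for looking good on the training half, its
apparent strength there should typically be well above what it shows on
the held-out half, even though nothing real is being measured. That gap
is why any confidence attached to a selected model is better taken from
data that played no part in the selection.
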
I too would like to thank the contributors to this thread --- very
informative!

All the best,

G

>
> Darin Brooks
>
> -----Original Message-----
> From: Bert Gunter [mailto:[EMAIL PROTECTED]
> Sent: Sunday, September 28, 2008 6:26 PM
> To: 'David Winsemius'; 'Darin Brooks'
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: [R] FW: logistic regression
>
>
> The Inferno awaits me -- but I cannot resist a comment (but DO look at
> Frank's website).
>
> There is a deep and disconcerting dissonance here. Scientists are
> (naturally) interested in getting at mechanisms, and so want to know which
> of the variables "count" and which do not. But statistical analysis --
> **any** statistical analysis -- cannot tell you that. All statistical
> analysis can do is build models that give good predictions (and only over
> the range of the data). The models you get depend **both** on the way Nature
> works **and** the peculiarities of your data (which is what Frank referred
> to in his comment on data reduction). In fact, it is highly likely that with
> your data there are many alternative prediction equations built from
> different collections of covariates that perform essentially equally well.
> Sometimes it is otherwise, typically when prospective, carefully designed
> studies are performed -- there is a reason that the FDA insists on clinical
> trials, after all (and reasons why such studies are difficult and expensive
> to do!).
>
> The belief that "data mining" (as it is known in the polite circles that
> Frank obviously eschews) is an effective (and even automated!) tool for
> discovering how Nature works is a misconception, but one that for many
> reasons is enthusiastically promoted. If you are looking only to predict,
> it may do; but you are deceived if you hope for Truth. Can you get hints? --
> well maybe, maybe not. Chaos beckons.
>
> I think many -- maybe even most -- statisticians rue the day that stepwise
> regression was invented and certainly that it has been marketed as a tool
> for winnowing out the "important" few variables from the blizzard of
> "irrelevant" background noise. Pogo was right: "We have seen the enemy --
> and it is us."
>
> (As I said, the Inferno awaits...)
>
> Cheers to all,
> Bert Gunter
>
> DEFINITELY MY OWN OPINIONS HERE!
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of David Winsemius
> Sent: Saturday, September 27, 2008 5:34 PM
> To: Darin Brooks
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [R] FW: logistic regression
>
> It's more a statement that it expresses a statistical perspective very
> succinctly, somewhat like a Zen koan. Frank's book, "Regression Modeling
> Strategies", has entire chapters on reasoned approaches to your question.
> His website also has quite a bit of material free for the taking.
>
> --
> David Winsemius
> Heritage Laboratories
>
> On Sep 27, 2008, at 7:24 PM, Darin Brooks wrote:
>
> > Glad you were amused.
> >
> > I assume that "booking this as a fortune" means that this was an
> > idiotic way to model the data?
> >
> > MARS? Boosted Regression Trees? Any of these a better choice to
> > extract significant predictors (from a list of about 44) for a
> > measured dependent variable?
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> > Behalf Of Ted Harding
> > Sent: Saturday, September 27, 2008 4:30 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: [R] FW: logistic regression
> >
> >
> > On 27-Sep-08 21:45:23, Dieter Menne wrote:
> >> Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
> >>
> >>> Estimates from this model (and especially standard errors and
> >>> P-values) will be invalid because they do not take into account the
> >>> stepwise procedure above that was used to torture the data until
> >>> they confessed.
> >>>
> >>> Frank
> >>
> >> Please book this as a fortune.
> >>
> >> Dieter
> >
> > Seconded!
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 27-Sep-08 Time: 23:30:19
> > ------------------------------ XFMail ------------------------------
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%