Thank you for all your suggestions. I will start with the chapter.

Annie
On Thu, Sep 3, 2009 at 1:50 PM, Don McKenzie <d...@u.washington.edu> wrote:

> Frank may be too modest to suggest it, but a great place to start that
> reading is in his book "Regression Modeling Strategies", chapter 4.
>
> On Sep 3, 2009, at 1:45 PM, Frank E Harrell Jr wrote:
>
> You'll need to do a huge amount of background reading first. These
> stepwise options do not incorporate penalization.
>
> Frank
>
> annie Zhang wrote:
>
> Hi, Frank,
>
> If I want to do prediction as well as select important predictors,
> which may be the best function to use when I have 35 samples and 35
> predictors (penalized logistic with variable selection)? I saw there is a
> 'fastbw' function in the Design package. And there is a 'step.plr'
> function in the 'stepPlr' package.
>
> Thank you,
> Annie
>
> On Thu, Sep 3, 2009 at 10:11 AM, Frank E Harrell Jr
> <f.harr...@vanderbilt.edu> wrote:
>
> annie Zhang wrote:
>
> Thank you for all your replies.
>
> Actually, as Bert said, besides prediction, I also need variable
> selection (I need to know which variables are important). As for the
> sample size and number of variables, both are small, around 35. How can
> I get accurate prediction as well as good predictors?
>
> Annie
>
> It is next to impossible to find a unique list of 'important' variables
> without having 50 times as many subjects as potential predictors, unless
> your signal:noise ratio is stunning.
>
> Frank
>
> On Thu, Sep 3, 2009 at 8:28 AM, Bert Gunter <gunter.ber...@gene.com>
> wrote:
>
> But let's be clear here folks:
>
> Ben's comment is apropos: "'As many variables as samples' is particularly
> scary."
> (Aside -- how much scarier then are -omics analyses in which the number
> of variables is thousands of times the number of samples?)
>
> Sensible penalization (it's usually not too sensitive to the details) is
> only another way of obtaining a parsimonious model with good (in the
> sense of minimizing overall prediction error: bias + variance) prediction
> properties. Alas, this is often not what scientists want: they use
> variable selection to find the "right" covariates, the "most important"
> variables affecting the response. But this is beyond the power of
> empirical modeling here: "as many variables as samples" almost guarantees
> that there will be many different and even nonoverlapping subsets of
> variables that are, within statistical noise, equally "optimal"
> predictors. That is, variable selection in such circumstances is just a
> pretty sophisticated random number generator -- ergo Frank's Draconian
> warnings. Penalization produces better prediction engines with better
> properties, but it cannot overcome the "as many variables as samples"
> problem either. Entropy rules. If what is sought is a way to determine
> the "truly important" variables, then the study must be designed to
> provide the information to do so. You don't get something for nothing.
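Bert's point that selection with as many variables as samples behaves like a random number generator can be illustrated with a small simulation. The sketch below is not something posted in the thread and rests on several assumptions: it is written in Python using only the standard library, the data are pure noise, and a simple univariate screen (the top `k` predictors by absolute correlation with the outcome) stands in for stepwise selection. The names `select_top_k` and `selection_stability` are hypothetical.

```python
import random
from itertools import combinations


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(X, y, k):
    """Indices of the k columns of X most correlated (in absolute value) with y."""
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def mean_pairwise_jaccard(sets):
    """Average Jaccard overlap |A & B| / |A | B| over all pairs of sets."""
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def selection_stability(n=35, p=35, k=5, n_boot=50, seed=1):
    """Repeat the screen on bootstrap resamples of one noise-only data set."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Assumption: the binary outcome is unrelated to every predictor.
    y = [rng.randint(0, 1) for _ in range(n)]
    selections = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        selections.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return mean_pairwise_jaccard(selections), selections


score, selections = selection_stability()
distinct = len(frozenset().union(*selections))
print(f"mean pairwise Jaccard overlap: {score:.2f}")
print(f"distinct predictors ever selected: {distinct} of 35")
```

A mean overlap near 1 would mean the same predictors are chosen in every resample; values well below 1 mean the "selected" set changes from one resample to the next, even though the underlying data set is the same.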
> Cheers,
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org On Behalf Of Frank E Harrell Jr
> Sent: Wednesday, September 02, 2009 9:07 PM
> To: annie Zhang
> Cc: r-help@r-project.org
> Subject: Re: [R] variable selection in logistic
>
> annie Zhang wrote:
> > Hi, Frank,
> >
> > You mean the backward and forward stepwise selection is bad? You also
> > suggest the penalized logistic regression is the best choice? Is there
> > any function to do it as well as selecting the best penalty?
> >
> > Annie
>
> All variable selection is bad unless it's in the context of penalization.
> You'll need penalized logistic regression, not necessarily with variable
> selection, for example a quadratic penalty as in a case study in my book,
> or an L1 penalty (lasso) using other packages.
> Frank
>
> > On Wed, Sep 2, 2009 at 7:41 PM, Frank E Harrell Jr
> > <f.harr...@vanderbilt.edu> wrote:
> >
> > David Winsemius wrote:
> > >
> > > On Sep 2, 2009, at 9:36 PM, annie Zhang wrote:
> > >
> > > > Hi, R users,
> > > >
> > > > What may be the best function in R to do variable selection in
> > > > logistic regression?
> > >
> > > PhD theses, and books by famous statisticians have been pursuing the
> > > answer to that question for decades.
> > >
> > > > I have the same number of variables as the number of samples, and I
> > > > want to select the best variables for prediction. Is there any
> > > > function doing forward selection followed by backward elimination
> > > > in stepwise logistic regression?
> > >
> > > You should probably be reading up on penalized regression methods.
> > > The stepwise procedures reporting unadjusted "significance" made
> > > available by SAS and SPSS to the unwary neophyte user have very poor
> > > statistical properties.
> > >
> > > --
> > > David Winsemius, MD
> >
> > Amen to that.
> >
> > Annie, resist the temptation. These methods bite.
> > Frank
> >
> > > Heritage Laboratories
> > > West Hartford, CT
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Frank E Harrell Jr   Professor and Chair           School of Medicine
> >                      Department of Biostatistics   Vanderbilt University
> Don McKenzie
> Research Ecologist
> Pacific Wildland Fire Sciences Lab
> US Forest Service
>
> Affiliate Professor
> College of Forest Resources and CSES Climate Impacts Group
> University of Washington
>
> phone: 206-732-7824
> cell: 206-321-5966
> d...@u.washington.edu
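For readers following the thread, the quadratic (ridge) penalty Frank mentions can be sketched in a few lines. This is only an illustration under stated assumptions: it is not the case study from his book and not the implementation in any R package. It is plain gradient descent in standard-library Python, the data are simulated with a made-up "truth" in which only the first two predictors matter, and the two penalty values are arbitrary. `fit_ridge_logistic` is a hypothetical name. Choosing the penalty, which Annie asks about, is not shown here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam, step=0.05, n_iter=1000):
    """Minimize mean negative log-likelihood + (lam / 2) * sum(beta_j ** 2).

    Fixed-step gradient descent; the intercept is left unpenalized.
    Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed 0/1 outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n + lam * beta[j]
                for j in range(p)]
        b0 -= step * g0
        beta = [b - step * g for b, g in zip(beta, grad)]
    return b0, beta


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


rng = random.Random(0)
n, p = 35, 35  # as many predictors as samples, as in the thread
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Hypothetical truth: only the first two predictors affect the outcome.
y = [1 if rng.random() < sigmoid(1.5 * row[0] - 1.5 * row[1]) else 0 for row in X]

for lam in (0.01, 1.0):
    b0, beta = fit_ridge_logistic(X, y, lam)
    print(f"lam={lam}: ||beta|| = {l2_norm(beta):.3f}")
```

The larger penalty pulls all coefficients toward zero rather than dropping any of them, which is the sense in which Frank says penalized regression need not involve variable selection.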