Thanks for clarifying, Frank. (Yes, there was no univariate screening before feeding the model to validate. And the "bias correct", I guess, comes from my Spanglish factory of new terminology ;-)
Best,

R.

On Fri, Feb 12, 2010 at 6:26 PM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:
> Ramon Diaz-Uriarte wrote:
>>
>> Frank, let me make sure I understand:
>>
>> On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
>> <f.harr...@vanderbilt.edu> wrote:
>>>
>>> Ramon Diaz-Uriarte wrote:
>>>>
>>>> Dear Frank,
>>>>
>>>> Thanks a lot for your response. And apologies for the question,
>>>> because the answer was obviously in the help.
>>>>
>>>> As for the caveats on selection: yes, thanks. I think I am actually
>>>> closely following your book (eg., pp. 249 to 253), and one of the
>>>> points I am trying to make to my colleagues is that by doing variable
>>>> selection, we are actually getting a worse model (as evidenced by the
>>>> bias-corrected AUC, which is smaller if attempting variable
>>>> selection).
>>>>
>>>> Best,
>>>>
>>>> R.
>>>
>>> Thanks Ramon.
>>>
>>> Bias-corrected measures need to be penalized for all variable selection
>>> steps and for univariable screening. When the penalization is complete,
>>> you usually see worse model performance as compared with full model fits,
>>> as you wrote.
>>>
>>
>> I thought that by using validate, and starting from the original
>> (non-screened) model and using "bw = TRUE" in the call to validate,
>> the bias-corrected statistics already include that penalization. After
>> all, for each one of the bootstrap iterations, the selection process
>> is carried out only with the in-bag bootstrap sample, but the "test"
>> is conducted with the out-of-bag sample. So my understanding was that
>> using the Dxy under the "corrected index" column I had accounted for
>> the screening involved in the variable selection.
>>
>> Thanks,
>>
>> R.
>
> Ramon,
>
> Yes you have it right, assuming there was no univariable or other screening
> done that bw=TRUE would not know about. [Note that test and training
> samples overlap with the ordinary bootstrap procedure though.]
> I wasn't familiar with "bias correct AIC" and assumed that came from another
> function. validate() produces the proper corrected indexes for the indexes
> it prints.
>
> Frank
>
>>
>>> Cheers
>>> Frank
>>>
>>>>
>>>> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
>>>> <f.harr...@vanderbilt.edu> wrote:
>>>>>
>>>>> Ramon Diaz-Uriarte wrote:
>>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> For logistic regression models: is it possible to use validate (rms
>>>>>> package) to compute bias-corrected AUC, but have variable selection
>>>>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>>>>
>>>>>> More details:
>>>>>>
>>>>>> I've been using the validate function (in the rms package, by Frank
>>>>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>>>>> estimates of the AUC, when variable selection is carried out (using
>>>>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>>>>> fastbw (from the Design package, by Harrell). fastbw " Performs a
>>>>>> slightly inefficient but numerically stable version of fast backward
>>>>>> elimination on factors, using a method based on Lawless and Singhal
>>>>>> (1978). This method uses the fitted complete model (...)". However, I
>>>>>> am finding that the models returned by fastbw are much smaller than
>>>>>> those returned by stepAIC or step (a simple example is shown below),
>>>>>> probably because of the approximation and using the complete model.
>>>>>>
>>>>>> I'd like to use step instead of fastbw. I think this can be done by
>>>>>> hacking predab.resample in a couple of places but I am wondering if
>>>>>> this is a bad idea (why?) or if I am reinventing the wheel.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> R.
>>>>>>
>>>>>> P.S.
>>>>>> Simple example of fastbw compared to step:
>>>>>>
>>>>>> library(MASS) ## for stepAIC and bwt data
>>>>>> example(birthwt)
>>>>>> library(rms)
>>>>>>
>>>>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>>>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>>>>
>>>>>> step(bwt.glm)
>>>>>> ## same as stepAIC(bwt.glm)
>>>>>>
>>>>>> fastbw(bwt.lrm)
>>>>>
>>>>> Hi Ramon,
>>>>>
>>>>> By default fastbw uses type='residual' to compute test statistics on all
>>>>> deleted variables combined. Use type='individual' to get the behavior in
>>>>> step. In your example fastbw(..., type='ind') gives the same model as
>>>>> step() and comes surprisingly close to estimating the MLEs without
>>>>> refitting. Of course you refit the reduced model to get MLEs. Both true
>>>>> and approximate MLEs are biased by the variable selection so beware.
>>>>> type= can be passed from calibrate or validate to fastbw.
>>>>>
>>>>> Note that none of the statistics computed by step or fastbw were designed
>>>>> to be used with more than two completely pre-specified models. Variable
>>>>> selection is hazardous both to inference and to prediction. There is no
>>>>> free lunch; we are torturing data to confess its own sins.
>>>>>
>>>>> Frank
>>>>>
>
--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
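
For anyone reading the thread later, here is a rough sketch of what Frank's suggestion might look like in code: fastbw with type='individual', and the same type= passed through validate with bw=TRUE so that the selection is repeated in every bootstrap resample. It reuses the bwt example from the original post. The number of resamples (B = 200), the seed, and the variable names v, auc.apparent and auc.corrected are my own assumptions, not something stated in the thread. The fit is created with x = TRUE, y = TRUE because validate needs the design matrix and response stored in the fit object.

```r
library(MASS)   # birthwt data and stepAIC
library(rms)    # lrm, fastbw, validate

example(birthwt)   # creates the 'bwt' data frame, as in the original post

# Full (unscreened) model; x = TRUE, y = TRUE store the data validate() needs
bwt.lrm <- lrm(low ~ ., data = bwt, x = TRUE, y = TRUE)

# Backward elimination judging one variable at a time, which Frank says
# mimics the behavior of step()
fastbw(bwt.lrm, rule = "aic", type = "individual")

# Bootstrap validation that repeats the backward selection in each resample.
# B and the seed are arbitrary choices for this sketch.
set.seed(1)
v <- validate(bwt.lrm, method = "boot", B = 200,
              bw = TRUE, rule = "aic", type = "individual")
print(v)

# For a binary outcome, AUC = Dxy / 2 + 0.5
auc.apparent  <- v["Dxy", "index.orig"] / 2 + 0.5
auc.corrected <- v["Dxy", "index.corrected"] / 2 + 0.5
c(apparent = auc.apparent, corrected = auc.corrected)
```

As Frank points out above, the corrected value only accounts for selection steps that validate knows about, so any screening done before fitting bwt.lrm would not be penalized here.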