Take a look at the validate.lrm function in the rms package.
Note that the use of threshold probabilities results in an improper scoring rule which will mislead you. Also note that you need to repeat 10-fold CV 50-100 times for precision, and that at each repeat you have to start from zero in analyzing associations.
Frank zhu yao wrote:
Hi, everyone: I ask for help about translating a stata program into R. The program perform cross validation as it stated. #1. Randomly divide the data set into 10 sets of equal size, ensuring equal numbers of events in each set #2. Fit the model leaving out the 1st set #3. Apply the fitted model in (2) to the 1st set to obtain the predicted probability of a prostate cancer diagnosis. #4. Repeat steps (2) to (3) leaving out and then applying the fitted model to the ith group, i = 2, 3... 10. Every subject now has a predicted probability of a prostate cancer diagnosis. #5. Using the predicted probabilities, compute the net benefit at various threshold probabilities. #6. Repeat steps (1) to (5) 200 times. The corrected net benefit for each threshold probability is the mean across the 200 replications. ================================================================= First is stata code. forvalues i=1(1)200 { local event="cancer" local predictors1 = "total_psa" local predictors2 = "total_psa free_psa" local prediction1 = "base" local prediction2 = "full" g `prediction1'=. g `prediction2'=. quietly g u = uniform() sort `event' u g set = mod(_n, 10) + 1 forvalues j=1(1)10{ quietly logit `event' `predictors1' if set~=`j' quietly predict ptemp if set==`j' quietly replace `prediction1' = ptemp if set==`j' drop ptemp quietly logit `event' `predictors2' if set~=`j' quietly predict ptemp if set==`j' quietly replace `prediction2' = ptemp if set==`j' drop ptemp } tempfile dca`i' quietly dca `event' `prediction1' `prediction2', graphoff saving("`dca`i''") drop u set `prediction1' `prediction2' } use "`dca1'", clear forvalues i=2(1)200 { append using "`dca`i''" } collapse all none modelp1 modelp2, by(threshold) save "cross validation dca output.dta", replace twoway(line none all modelp1 modelp2 threshold, sort) ================================================================= Here is my draft of R code. cMain is my dataset. predca<-rep(0,40000) dim(predca)<-c(200,200) for (i in 1:200) { cvgroup<-rep(1:10,length=110) cvgroup<-sample(cvgroup) cvpre<-rep(0,length=110) cvMain<-cbind(cMain,cvgroup,cvpre) for (j in 1:10) { cvdev<-cvMain[cvMain$cvgroup!=j,] cvval<-cvMain[cvMain$cvgroup==j,] cvfit<-lrm(Y~X,data=cvdev,x=T,y=T) cvprej<-predict(cvfit,cvval,type="fitted") #put the fitted value in dataset cvMain[cvgroup==j,]$cvpre<-prej } cvdcaop<-dca(cvMain$Y,cvMain$cvpre,prob=("Y")) cvnb<-100*(cvdcaop[,1]-cvdcaop[,2]) cvtpthres<-cvdcaop[,4]/(100-cvdcaop[,4]) cvnr<-cvnb/cvtpthres predca[cvn,1:99]<-cvnb predca[cvn,101:199]<-cvnr } ================================================================= My questions are 1. How to ensure equal numbers of events in each set in R? 2. A part of stata code is forvalues j=1(1)10{ quietly logit `event' `predictors1' if set~=`j' quietly predict ptemp if set==`j' quietly replace `prediction1' = ptemp if set==`j' drop ptemp quietly logit `event' `predictors2' if set~=`j' quietly predict ptemp if set==`j' quietly replace `prediction2' = ptemp if set==`j' drop ptemp } I don't understand what's the difference between prediction1 and prediction2 3. Is my code right? Thanks ! Yao Zhu Department of Urology Fudan University Shanghai Cancer Center Shanghai, China [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.