Re: [R] cross validation function translated from stata

Frank E Harrell Jr Thu, 21 Jan 2010 07:20:28 -0800

Take a look at the validate.lrm function in the rms package.

Note that the use of threshold probabilities results in an improperscoring rule which will mislead you. Also note that you need to repeat10-fold CV 50-100 times for precision, and that at each repeat you haveto start from zero in analyzing associations.


Frank

zhu yao wrote:

Hi, everyone:

I ask for help about translating a stata program into R.

The program perform cross validation as it stated.

#1. Randomly divide the data set into 10 sets of equal size, ensuring equal
numbers of events in each set
#2. Fit the model leaving out the 1st set
#3. Apply the fitted model in (2) to the 1st set to obtain the predicted
probability of a prostate cancer diagnosis.
#4. Repeat steps (2) to (3) leaving out and then applying the fitted model
to the ith group, i = 2, 3... 10. Every subject now has a predicted
probability of a prostate cancer diagnosis.
#5. Using the predicted probabilities, compute the net benefit at various
threshold probabilities.
#6. Repeat steps (1) to (5) 200 times. The corrected net benefit for each
threshold probability is the mean across the 200 replications.

=================================================================
First is stata code.

forvalues i=1(1)200 {
local event="cancer"
local predictors1 = "total_psa"
local predictors2 = "total_psa free_psa"
local prediction1 = "base"
local prediction2 = "full"

g `prediction1'=.
g `prediction2'=.

quietly g u = uniform()
sort `event' u
g set = mod(_n, 10) + 1

forvalues j=1(1)10{
quietly logit `event' `predictors1' if set~=`j'
quietly predict ptemp if set==`j'
quietly replace `prediction1' = ptemp if set==`j'
drop ptemp


quietly logit `event' `predictors2' if set~=`j'
quietly predict ptemp if set==`j'
quietly replace `prediction2' = ptemp if set==`j'
drop ptemp
}
tempfile dca`i'
quietly dca `event' `prediction1' `prediction2', graphoff saving("`dca`i''")

drop u set `prediction1' `prediction2'
}


use "`dca1'", clear
forvalues i=2(1)200 {
append using "`dca`i''"
}

collapse all none modelp1 modelp2, by(threshold)

save "cross validation dca output.dta", replace

twoway(line none all modelp1 modelp2 threshold, sort)

=================================================================
Here is my draft of R code. cMain is my dataset.

predca<-rep(0,40000)
dim(predca)<-c(200,200)

for (i in 1:200) {
 cvgroup<-rep(1:10,length=110)
cvgroup<-sample(cvgroup)
cvpre<-rep(0,length=110)
cvMain<-cbind(cMain,cvgroup,cvpre)

for (j in 1:10) {
cvdev<-cvMain[cvMain$cvgroup!=j,]
cvval<-cvMain[cvMain$cvgroup==j,]
 cvfit<-lrm(Y~X,data=cvdev,x=T,y=T)
cvprej<-predict(cvfit,cvval,type="fitted")

#put the fitted value in dataset
cvMain[cvgroup==j,]$cvpre<-prej
}

cvdcaop<-dca(cvMain$Y,cvMain$cvpre,prob=("Y"))
cvnb<-100*(cvdcaop[,1]-cvdcaop[,2])
cvtpthres<-cvdcaop[,4]/(100-cvdcaop[,4])
cvnr<-cvnb/cvtpthres
predca[cvn,1:99]<-cvnb
predca[cvn,101:199]<-cvnr
}

=================================================================

My questions are
1. How to ensure equal numbers of events in each set in R?
2. A part of stata code is
                        forvalues j=1(1)10{
quietly logit `event' `predictors1' if set~=`j'
quietly predict ptemp if set==`j'
quietly replace `prediction1' = ptemp if set==`j'
drop ptemp

quietly logit `event' `predictors2' if set~=`j'
quietly predict ptemp if set==`j'
quietly replace `prediction2' = ptemp if set==`j'
drop ptemp
}
  I don't understand what's the difference between prediction1 and
prediction2
3. Is my code right?

Thanks !

Yao Zhu
Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cross validation function translated from stata

Reply via email to