[R] loops & sampling

Garth.Warren Wed, 31 Oct 2007 22:37:09 -0800

Hi,


I'm new to R (and statistics) and my boss has thrown me in at the deep end with 
the following task: 

 

We want to evaluate the impact that sample size has on our ability to build a 
robust model, i.e. how robust the model is to sample size, for the purpose of 
cross-validation. In our current project we have collected a series of 
independent data at 250 locations, from which we have built a predictive model. 
We want to know whether we could get away with collecting fewer samples and 
still build a decent model, for the obvious operational reasons of cost, time 
spent in the field, etc. 

 

Our thinking was that we could apply a bootstrap type procedure:

 

We would remove 10 records (samples) from the total n=250 and then replace 
those 10 with copies drawn from the remaining 240. With this new data frame we 
would fit our model and calculate an r². We would repeat this 1000 times in a 
loop and then take the mean of the 1000 r² values. After that we would start 
the process again, removing 20 samples and replacing them with copies drawn 
from the remaining 230 records, and so on... 
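
A minimal sketch of the procedure described above, not a tested solution. The 
data are simulated stand-ins for the 250 field records, and the function names 
`replace_k_records` and `mean_r2`, the plain linear model and the reduced number 
of repeats are all assumptions made for illustration only.

```r
set.seed(1)  # only so that this illustration is reproducible

# Simulated stand-in for the 250 field records (hypothetical data)
n   <- 250
x1  <- runif(n, 0, 10)
x2  <- runif(n, 0, 10)
dat <- data.frame(y = 2 + 0.5 * x1 + 0.3 * x2 + rnorm(n), x1 = x1, x2 = x2)

# Drop k records at random, then fill the gap with k copies drawn
# (with replacement) from the records that were kept.
replace_k_records <- function(data, k) {
  n       <- nrow(data)
  dropped <- sample(n, k)                     # row numbers to remove
  kept    <- setdiff(seq_len(n), dropped)     # the remaining n - k rows
  copies  <- sample(kept, k, replace = TRUE)  # copies of kept rows
  data[c(kept, copies), ]
}

# Repeat the resampling nrep times for one k and return the mean r-squared.
mean_r2 <- function(data, k, nrep = 1000) {
  r2 <- numeric(nrep)
  for (i in seq_len(nrep)) {
    resampled <- replace_k_records(data, k)
    fit       <- lm(y ~ x1 + x2, data = resampled)
    r2[i]     <- summary(fit)$r.squared
  }
  mean(r2)
}

# One mean r-squared per number of removed records (10, 20, 30, ...)
ks      <- seq(10, 50, by = 10)
results <- sapply(ks, function(k) mean_r2(dat, k, nrep = 200))
names(results) <- ks
results
```

Each resampled data frame still has 250 rows, but at most 250 - k distinct 
records, so the number of removed records is controlled by `k` and not by the 
loop counter.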

 

Below is a simplified version of the real code, which contains most of the basic 
elements. My main problem is that I'm not sure what the 'for(i in 1:nboot)' line 
is doing. Originally I thought it meant that 1 sample or record was removed from 
the data and replaced by a copy of one of the remaining records, so that 
'for(i in 10:nboot)', used in the context of the code below, would remove 10 
samples with replacement as described above. I'm almost positive that this 
isn't happening. If not, how can I make the code below, for example, do what we 
want? 
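
A small check of what the loop line does on its own, offered as a sketch. In 
R's `for`, `i` is only a counter that takes each value of the sequence in turn; 
any resampling has to happen in the loop body. The variable `visited` is a 
hypothetical name used for this check only.

```r
nboot <- 10

# 1:nboot is the sequence 1, 2, ..., nboot; the body runs once per value
visited <- integer(0)
for (i in 1:nboot) {
  visited <- c(visited, i)
}
visited          # the values i took: 1 to 10
length(visited)  # number of repetitions: 10

# 10:nboot starts the counter at 10, so with nboot = 10 the body runs only once
length(10:nboot)
```

So changing `1:nboot` to `10:nboot` changes how many times the body is 
repeated, not how many records are drawn on each pass.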

 

library(utils)

# data
a <- c(5.5,  2.3, 8.5, 9.1, 8.6, 5.1)
b <- c(5.2,  2.2, 8.6, 9.1, 8.8, 5.7)
c <- c(5.0, 14.6, 8.9, 9.0, 9.1, 5.5)

# join
abc <- data.frame(a, b, c)

# set column names
names(abc) <- c("y", "x1", "x2")
abc2 <- abc

# sample
# transpose so that each record becomes a column
abc3 <- as.data.frame(t(as.matrix(data.frame(abc2))))
n <- length(abc2)

npboot.function <- function(nboot)
{
  boot.cor <- vector(length = nboot)
  for (i in 1:nboot) {
    # draw n columns (records) of abc3 with replacement, then transpose back
    rdata <- sample(abc3, n, replace = TRUE)
    abc4 <- as.data.frame(t(as.matrix(data.frame(rdata))))
    model <- lm(asin(sqrt(abc4$y/100)) ~ I(abc4$x1^2) + abc4$x2)
    boot.cor[i] <- cor(abc4$y, fitted(model))
  }
  boot.cor
}

bt.cor <- npboot.function(nboot = 10)
bootmean <- mean(bt.cor)
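
A sketch of how the resampling step could work on row numbers directly, which 
avoids the transposing. It assumes the aim is to resample records (rows); note 
that `nrow()` gives the number of records, whereas `length()` of a data frame 
gives its number of columns. The function name `boot_cor` is hypothetical.

```r
# Same toy data as in the code above
abc <- data.frame(
  y  = c(5.5,  2.3, 8.5, 9.1, 8.6, 5.1),
  x1 = c(5.2,  2.2, 8.6, 9.1, 8.8, 5.7),
  x2 = c(5.0, 14.6, 8.9, 9.0, 9.1, 5.5)
)

length(abc)  # 3: number of columns
nrow(abc)    # 6: number of records

boot_cor <- function(data, nboot) {
  n   <- nrow(data)
  out <- numeric(nboot)
  for (i in seq_len(nboot)) {
    rows   <- sample(n, n, replace = TRUE)  # resample row numbers
    d      <- data[rows, ]
    model  <- lm(asin(sqrt(y / 100)) ~ I(x1^2) + x2, data = d)
    out[i] <- cor(d$y, fitted(model))
  }
  out
}

set.seed(1)
bt.cor <- boot_cor(abc, nboot = 10)
mean(bt.cor, na.rm = TRUE)
```

With only 6 records a resample can occasionally contain a single distinct 
record, in which case `cor()` returns `NA`; hence the `na.rm = TRUE`.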

 

 

Any assistance would be greatly appreciated, also the sooner the better as we 
are under pressure to reach a conclusion.

 

Cheers,

 

Garth



