Re: [R] bootstrap sample for clustered data

Liu, Lei Mon, 17 Sep 2018 10:12:44 -0700

Thanks for the help. My friend helped me and here is the solution:

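boot.cluster <- function(x, id){
  # draw cluster ids with replacement
  boot.id <- sample(unique(id), replace=TRUE)
  # give each draw its own label (newid), so that a cluster drawn twice
  # counts as two distinct clusters in the bootstrapped data
  out <- lapply(seq_along(boot.id),
    function(newid){cbind(x[id%in%boot.id[newid],],newid)})
  return( do.call("rbind",out) )
}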
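
A minimal usage sketch, assuming the example data from the original post quoted below. The function is repeated so the block runs on its own, and the seed is an arbitrary choice made only for reproducibility. It shows that newid labels each draw separately while id keeps the original cluster label:

```r
# Example data from the original post: 5 clusters of unequal size.
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Same logic as the solution above, repeated so this sketch is self-contained.
boot.cluster <- function(x, id) {
  boot.id <- sample(unique(id), replace = TRUE)
  out <- lapply(seq_along(boot.id), function(newid) {
    cbind(x[id %in% boot.id[newid], ], newid)
  })
  do.call("rbind", out)
}

set.seed(1)  # arbitrary seed (an assumption, not from the thread)
boot.xx <- boot.cluster(xx, xx$id)

# Number of rows carried by each draw.
table(boot.xx$newid)

# Original id behind each new label; repeated ids are expected.
unique(boot.xx[, c("newid", "id")])
```

For the model fit, newid would then be used as the grouping variable in place of id.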


Lei

-----Original Message-----
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us] 
Sent: Monday, September 17, 2018 2:32 AM
To: r-help@r-project.org; Liu, Lei <lei....@wustl.edu>; r-help@R-project.org
Subject: Re: [R] bootstrap sample for clustered data

You are telling us that the ID values in your data set indicate clusters. 
However you went about making that determination in the first place might be an 
obvious(?) way to do it again with your bootstrapped sample, ignoring the 
cluster assignments you have in place. This is the wrong place to have a 
discussion about which theoretical method for cluster identification you should 
use. If you already know which method you want, then searching the web or using 
the sos package would be the appropriate way to find implementations of that 
specific clustering algorithm.

I am not an ME expert, but AFAIK "complicated" analyses such as mixed effects 
models tend to have rather hefty appetites for data completeness, so you may 
have to design a special sampling plan in order to avoid generating data sets 
on which those analyses break, and you will probably need a very large 
data set to start with in order to have sufficient data in each cluster. That 
is, you may be better off keeping the original cluster identification and just 
restructuring your bootstrap sampling to sample within clusters.
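
One possible reading of "sample within clusters" is sketched below, assuming the example data from the original post. The helper name boot.within is hypothetical and not from any package. It keeps every original cluster and its size, and resamples rows with replacement inside each cluster. Whether this scheme suits a given mixed effects analysis is a separate question that the sketch does not settle.

```r
# Example data from the original post: 5 clusters of unequal size.
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Hypothetical helper: resample rows with replacement inside each cluster.
boot.within <- function(x, id) {
  # Row indices grouped by cluster.
  rows <- split(seq_len(nrow(x)), id)
  # Index via sample.int() so a one-row cluster is handled correctly
  # (sample() on a single number would sample from 1:that number).
  picked <- lapply(rows, function(r) r[sample.int(length(r), replace = TRUE)])
  out <- x[unlist(picked, use.names = FALSE), ]
  rownames(out) <- NULL
  out
}

set.seed(1)  # arbitrary seed (an assumption, not from the thread)
boot.w <- boot.within(xx, xx$id)
boot.w
```

Because the cluster labels are unchanged here, no relabeling step is needed before fitting a model.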

The R-sig-me mailing list is probably a better venue for your questions. 

On September 16, 2018 8:22:44 PM PDT, "Liu, Lei" <lei....@wustl.edu> wrote:
>Hi there,
>
>I posted this message before but there may be some confusion in my 
>previous post. So here is a clearer version:
>
>I'd like to do a bootstrap sampling for clustered data. Then I will run 
>some complicated models (say mixed effects models) on the bootstrapped 
>sample. Here id is the cluster. Note different clusters have different 
>number of subjects, e.g., id 2 has 2 observations, id 3 has 3 
>observations.
>
>id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
>y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
>x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
>
>xx=data.frame(id, x, y)
>
>boot.cluster <- function(x, id){
>
>  boot.id <- sample(unique(id), replace=T)
>
>  out <- lapply(boot.id, function(i) x[id%in%i,])
>
>  return( do.call("rbind",out) )
>
>}
>
>boot.xx=boot.cluster(xx, xx$id)
>
>Here is the generated boot.xx dataset:
>
>   id x y
>   3 0 0.4
>   3 0 1.0
>   3 0 0.9
>   1 0 0.5
>   1 0 0.6
>   5 1 2.2
>   5 1 3.0
>   2 1 0.4
>   2 1 0.3
>   1 0 0.5
>   1 0 0.6
>
>You can see that some clusters (ids) appear multiple times (e.g., id 1 
>appears in two places, 4 rows in total). Since the bootstrap samples with 
>replacement, the same cluster can be drawn more than once. Thus, we 
>cannot fit a mixed effects model to this data as is, because each draw 
>should be treated as a distinct cluster in the new data. Instead, I want 
>to relabel the data as below (ids renumbered in order of appearance in 
>the above boot.xx data). This is the step I need help with:
>
>  id x  y
>   1 0 0.4
>   1 0 1.0
>   1 0 0.9
>   2 0 0.5
>   2 0 0.6
>   3 1 2.2
>   3 1 3.0
>   4 1 0.4
>   4 1 0.3
>   5 0 0.5
>   5 0 0.6
>
>Can someone help me with it? Thanks!
>
>Lei Liu
>Professor of Biostatistics
>Washington University in St. Louis
>
>
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
