Dear R-Help-List,

A few days ago I asked for help simulating case-control data. I got a great answer that helped me with my code, but I am having trouble modifying it for 1:M matched case-control data. Does anyone have any guidance or pointers for simulating 1:M matched data?
Thank you,
-R

> Dear R-Help-List,
>
> I was wondering if anyone had experience simulating
> case-control data in R?

I think the only simple method that allows you to specify any arbitrary population distribution of predictors, and does not rely on the logistic regression model being true, is to simulate cohorts and then take a case-control sample from each one. E.g., for a case-control sample of 500 cases and 1000 controls where there is about a 1% cumulative incidence:

1. Generate all your predictor variables for a cohort of 50,000 people, from any distributions you want.
2. Specify the disease model. This could be logistic,
     logit(P(Y=1)) = eta = b0 + b1*x1 + b2*x2 + ...
     p = exp(eta)/(1 + exp(eta))
   or it could be anything else.
3. Now sum(p) gives the expected number of cases. Adjust b0 so that this is a bit bigger than your desired number, e.g. 550.
4. Generate Y for the population by rbinom(50000, 1, p).
5. Choose 500 cases and 1000 controls using sample().

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
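The five-step cohort-then-sample recipe in the quoted reply can be sketched in R roughly as below. This is a minimal illustration under stated assumptions, not the replier's own code: the predictor distributions (x1 normal, x2 binary), the coefficient values b1 and b2, the helper name prob_for, and the use of uniroot() to tune b0 so that sum(p) is about 550 are all hypothetical choices. The redraw loop in step 4 is also an addition, used only to guard against the rare cohort with fewer than 500 cases.

```r
set.seed(1)                      # make the example reproducible

N       <- 50000                 # cohort size (step 1)
n_cases <- 500                   # desired number of cases
n_ctrls <- 1000                  # desired number of controls

# Step 1: predictors for the whole cohort.
# Assumption: these distributions are illustrative; any can be used.
x1 <- rnorm(N)
x2 <- rbinom(N, 1, 0.3)

# Step 2: a logistic disease model. b1 and b2 are hypothetical values.
b1 <- log(2)
b2 <- log(1.5)
# plogis(eta) computes exp(eta) / (1 + exp(eta))
prob_for <- function(b0) plogis(b0 + b1 * x1 + b2 * x2)

# Step 3: pick b0 so the expected number of cases, sum(p), is about 550,
# a bit more than the 500 cases wanted. uniroot() solves sum(p) - 550 = 0.
b0 <- uniroot(function(b) sum(prob_for(b)) - 550, c(-20, 0))$root
p  <- prob_for(b0)

# Step 4: disease status for everyone in the cohort.
# Assumption: redraw in the rare event that fewer than n_cases occur.
repeat {
  y <- rbinom(N, 1, p)
  if (sum(y) >= n_cases) break
}

# Step 5: draw the case-control sample with sample()
case_idx <- sample(which(y == 1), n_cases)
ctrl_idx <- sample(which(y == 0), n_ctrls)
idx <- c(case_idx, ctrl_idx)
cc  <- data.frame(y = y[idx], x1 = x1[idx], x2 = x2[idx])
```

The resulting data frame cc has 500 cases and 1000 controls. Because the disease model is only used to generate Y in the full cohort, step 2 can be swapped for any non-logistic model without changing the sampling in step 5.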