Thank you in advance for your consideration.

I have a data frame of 2000+ observations with repeated measures across approximately 300 unique individuals. An event either does or does not happen (1, 0), and there is a suite of independent variables associated with the event. A simplified representation follows:

my.df<-data.frame("id"=c("A","A","A","B","B","C","C","C", "C", "C"), event=c(0,0,1,0,1,0,0,1,1, 0))

_id_  _event_
A     0
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1
C     0

I need to sample my.df to select the same number of observations with event = 0 as with event = 1 for each unique id. I can reshape or tapply my.df to group by id and determine what sample size I need:

library(reshape)
my.df.melt<-melt(my.df, id="id")
my.df.cast<-cast(my.df.melt, id~value, length, fill=0)
my.df.cast

      Event
_id_      _0_   _1_
A     2     *1*
B     1     *1*
C     3     *2*
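The same per-id counts can also be sketched in base R without the reshape package. This is only an illustration on the toy data above; the names counts and n.needed are made up for the example.

```r
# Toy data from the top of this post
my.df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                    event = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 0))

# Cross-tabulate id against event: one row per id, one column per event value
counts <- table(my.df$id, my.df$event)

# Number of event == 1 rows per id, i.e. how many event == 0 rows to draw
n.needed <- counts[, "1"]
n.needed
```

The column labelled "1" holds the sample size needed for each id, which is the same information as the last column of my.df.cast.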

Given the above data frame, I need to randomly select (sample) from my.df *one* observation from my.df[my.df$id=="A" & my.df$event==0, ], *one* from my.df[my.df$id=="B" & my.df$event==0, ], and *two* from my.df[my.df$id=="C" & my.df$event==0, ], and then rbind them to my.df[my.df$event == 1, ]. However, it is impractical to code each case individually.

Alternatively, if A in my.df matches A in my.df.cast, then something like sample(which(my.df$id == "A" & my.df$event == 0), size=my.df.cast[1,3], replace=FALSE) should pick the row numbers to keep for that id. I think I am close to a solution, but I'm not sure how to code it to run through the entire data frame.

This is how my.new.df would look:

_id_  _event_
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1
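One way this might be coded for every id at once is a split/lapply/rbind pass in base R. This is a rough sketch rather than a tested solution for the full data: balance.one.id is a hypothetical helper name, set.seed is there only to make the draw repeatable, and I am assuming each id has at least as many event = 0 rows as event = 1 rows.

```r
# Toy data from the top of this post
my.df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                    event = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 0))

set.seed(1)  # only so the random draw can be repeated

# Hypothetical helper: balance the rows belonging to a single id
balance.one.id <- function(d) {
  ones  <- which(d$event == 1)
  zeros <- which(d$event == 0)
  # Draw as many zeros as there are ones (never more than are available)
  n <- min(length(ones), length(zeros))
  # Sample positions within 'zeros' instead of calling sample(zeros, n),
  # because sample() treats a single number x as the range 1:x
  keep <- zeros[sample.int(length(zeros), n)]
  # Sampled zeros first, then every event == 1 row
  d[c(keep, ones), ]
}

# Split by id, balance each piece, and stack the pieces back together
my.new.df <- do.call(rbind, lapply(split(my.df, my.df$id), balance.one.id))
rownames(my.new.df) <- NULL
my.new.df
```

Any other columns in the data frame are carried along with the selected rows, so the independent variables stay attached to their observations.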

Thank you kindly for your help,

Eric

--
Eric Vander Wal
Ph.D. Candidate
University of Saskatchewan, Department of Biology, 112 Science Place, Saskatoon, SK., S7N 5E2

"Pluralitas non est ponenda sine necessitate"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
