Thanks a lot for all the comments and suggestions. They have helped me solve the problem. I find the "wide" to "long" transformation of the data especially helpful. I have used this in Stata but was not aware that I could do the same in R.
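For the archive, here is a minimal sketch of one way the long format can yield the boy_1, ..., boy_6 variables described in the original post quoted further down. The data frame `birth` is re-entered from the thread; `boys`, `k` and `out` are names chosen for illustration only, and this is one possible approach under those assumptions, not necessarily the code that was finally used.

```r
# Data from the thread in "wide" format: 1 = male, 2 = female, NA = no birth.
birth <- data.frame(
  b1 = c(1L, 2L, 1L, 2L, 1L, 2L),
  b2 = c(2L, 2L, 2L, 1L, NA, 1L),
  b3 = c(1L, NA, 1L, NA, NA, 2L),
  b4 = c(2L, NA, 1L, NA, NA, 1L),
  b5 = c(NA, NA, 1L, NA, NA, NA),
  b6 = rep(NA_integer_, 6)
)

# Wide to long, using the same reshape() call as in Douglas Bates's reply.
bl <- reshape(birth, varying = list(1:6), v.names = "sex",
              timevar = "ord", idvar = "family", direction = "long")

# Keep male births only and sort them by birth order within each family.
boys <- bl[!is.na(bl$sex) & bl$sex == 1, ]
boys <- boys[order(boys$family, boys$ord), ]

# k = 1 for a family's first boy, 2 for its second, and so on
# (k is an illustrative name, not from the thread).
boys$k <- ave(boys$ord, boys$family, FUN = seq_along)

# One row per family, one column per boy; 0 means "no i-th boy".
out <- matrix(0L, nrow = nrow(birth), ncol = ncol(birth),
              dimnames = list(NULL, paste0("boy_", seq_len(ncol(birth)))))
out[cbind(boys$family, boys$k)] <- boys$ord
out
```

Each row of `out` should hold the same positions as the corresponding element of Jim Holtman's apply() output quoted below, padded with zeros (for example, family 3 has boys at positions 1, 3, 4 and 5).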
Deepankar

On Fri, 2007-10-26 at 08:44 -0500, Douglas Bates wrote:
> Another approach is to convert the data frame that you have in what is
> sometimes called the "wide" format to the "long" format.  See ?reshape
> for details on this transformation.
>
> In the process of doing the conversion I would also convert the sex of
> the child to a factor with meaningful levels and the family number to
> a factor.
>
> > birth    # data in the original, "wide" format
>   b1 b2 b3 b4 b5 b6
> 1  1  2  1  2 NA NA
> 2  2  2 NA NA NA NA
> 3  1  2  1  1  1 NA
> 4  2  1 NA NA NA NA
> 5  1 NA NA NA NA NA
> 6  2  1  2  1 NA NA
> > bl <- reshape(birth, varying = list(1:6),
>                 v.names = "sex", timevar = "ord",
>                 idvar = "family", direction = "long")
> > head(bl, n = 8)  # a data frame with 3 columns
>     ord sex family
> 1.1   1   1      1
> 2.1   1   2      2
> 3.1   1   1      3
> 4.1   1   2      4
> 5.1   1   1      5
> 6.1   1   2      6
> 1.2   2   2      1
> 2.2   2   2      2
> > bl$sex <- factor(bl$sex, labels = c("M", "F"))  # use a factor with meaningful labels
> > bl <- subset(bl, !is.na(sex))  # remove records of births that did not occur
> > bl$family <- factor(bl$family)  # convert family to a factor
> > str(bl)  # resulting structure has only 18 rows
> 'data.frame':   18 obs. of  3 variables:
>  $ ord   : int  1 1 1 1 1 1 2 2 2 2 ...
>  $ sex   : Factor w/ 2 levels "M","F": 1 2 1 2 1 2 2 2 2 1 ...
>  $ family: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
> > bl
>     ord sex family
> 1.1   1   M      1
> 2.1   1   F      2
> 3.1   1   M      3
> 4.1   1   F      4
> 5.1   1   M      5
> 6.1   1   F      6
> 1.2   2   F      1
> 2.2   2   F      2
> 3.2   2   F      3
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 6.3   3   F      6
> 1.4   4   F      1
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > subset(bl, sex == "M")  # these are the births of males only
>     ord sex family
> 1.1   1   M      1
> 3.1   1   M      3
> 5.1   1   M      5
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > with(subset(bl, sex == "M"), tapply(ord, family, min))  # first male birth in family
>  1  2  3  4  5  6
>  1 NA  1  2  1  2
>
> The wide format may seem a natural representation for such data but
> frequently it is inefficient and awkward.  The long format is much
> easier to manipulate in R.
>
> On 10/25/07, jim holtman <[EMAIL PROTECTED]> wrote:
> > You might want to consider another representation, but it would depend
> > on how you want to use it.  Here is a 'list' that records for each row
> > the position of the boys; does this start to give you the type of data
> > that you want?  These are the numeric values of where the boys occur.
> >
> > > x.m
> >      b1 b2 b3 b4 b5 b6
> > [1,]  1  2  1  2 NA NA
> > [2,]  2  2 NA NA NA NA
> > [3,]  1  2  1  1  1 NA
> > [4,]  2  1 NA NA NA NA
> > [5,]  1 NA NA NA NA NA
> > [6,]  2  1  2  1 NA NA
> > > apply(x.m, 1, function(a)which(a == 1))
> > [[1]]
> > b1 b3
> >  1  3
> >
> > [[2]]
> > named integer(0)
> >
> > [[3]]
> > b1 b3 b4 b5
> >  1  3  4  5
> >
> > [[4]]
> > b2
> >  2
> >
> > [[5]]
> > b1
> >  1
> >
> > [[6]]
> > b2 b4
> >  2  4
> >
> >
> > On 10/25/07, Deepankar Basu <[EMAIL PROTECTED]> wrote:
> > > Hi All,
> > >
> > > I have data on the sequence of births for families with completed
> > > fertility cycle (in a data frame); the relevant variables are called b1,
> > > b2, b3, b4, b5, b6 and record the birth of the first, second, ..., sixth
> > > child. So,
> > > b1=1 if the first birth is male,
> > > b1=2 if the first birth is female,
> > > and b1=NA if the family did not record any first birth.
> > >
> > > Similarly for b2, b3, b4, b5 and b6.
> > >
> > > I want to record the positions of the male children within their
> > > family's birth history. So, I was thinking of creating six variables
> > > boy_1, boy_2, ..., boy_6. boy_1 would record the position of the first
> > > boy, boy_2 would record the position of the second boy and so on till
> > > boy_6. I want to assign a value of zero to boy_i if the family in
> > > question did not have the i_th boy.
> > >
> > > I am not sure how best to do this (i.e., whether to create variables as
> > > I have suggested or do something else) and would appreciate any
> > > suggestions. Later, I want to use the information on the position of the
> > > male births to compute a likelihood function and do an MLE.
> > >
> > > Here is how my data frame would look:
> > >
> > > b1 b2 b3 b4 b5 b6
> > >  1  2  1  2 NA NA
> > >  2  2 NA NA NA NA
> > >  1  2  1  1  1 NA
> > >  2  1 NA NA NA NA
> > >  1 NA NA NA NA NA
> > >  2  1  2  1 NA NA
> > >
> > > Thanks in advance.
> > >
> > > Deepankar
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?