On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:
Dear R users,
I have got the following data frame, called my_df:
   gender day_birth month_birth year_birth labour
1       F        22          10       2001      1
2       M        29          10       2001      2
3       M         1          11       2001      1
4       F         3          11       2001      1
5       M         3          11       2001      2
6       F         4          11       2001      1
7       F         4          11       2001      2
8       F         5          12       2001      2
9       M        22          14       2001      2
10      F        29          13       2001      2
...
I need to count data in different ways:
1. count the births for each day (having 0 when necessary)
independently from the value of the "labour" column
xtabs sometimes gives better results. If you want all 31 days, then make
day_birth a factor with levels=1:31.
> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001
month_birth
day_birth 10 11 12 13 14
1 0 1 0 0 0
3 0 2 0 0 0
4 0 2 0 0 0
5 0 0 1 0 0
22 1 0 0 0 1
29 1 0 0 1 0
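A minimal sketch of the factor suggestion above, assuming `dat` holds only the ten rows shown in the question (the rows elided by "..." are not reconstructed here). With all 31 levels declared, xtabs keeps a row for every day and fills the missing ones with 0:

```r
# The ten rows shown in the question; elided rows are not included.
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Declare every possible day as a level so that unobserved days
# appear in the table with a count of 0.
dat$day_birth <- factor(dat$day_birth, levels = 1:31)

tab <- xtabs(~ day_birth + month_birth + year_birth, data = dat)
dim(tab)   # 31 days x 5 months x 1 year
```

The same idea should apply to month_birth if every month is wanted as a column.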
2. count the births for each day (having 0 when necessary), divided
by the value of "labour" (which can have two values, 1 or 2)
Cannot figure out what is being asked here. What to do with the two
values? Just count them? This would give a partitioned count:
> xtabs( labour==1 ~ day_birth + month_birth , data=dat)
month_birth
day_birth 10 11 12 13 14
1 0 1 0 0 0
3 0 1 0 0 0
4 0 1 0 0 0
5 0 0 0 0 0
22 1 0 0 0 0
29 0 0 0 0 0
> xtabs( labour==2 ~ day_birth + month_birth , data=dat)
month_birth
day_birth 10 11 12 13 14
1 0 0 0 0 0
3 0 1 0 0 0
4 0 1 0 0 0
5 0 0 1 0 0
22 0 0 0 0 1
29 1 0 0 1 0
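If separate counts per labour value are indeed what is wanted, one possible alternative to the two calls above is to put labour on the right-hand side as a third dimension. This is only a sketch, again assuming `dat` contains just the ten rows shown in the question:

```r
# The ten rows shown in the question; elided rows are not included.
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# One 3-way table: day x month x labour.
tab <- xtabs(~ day_birth + month_birth + labour, data = dat)

tab[, , "1"]   # counts for labour == 1
tab[, , "2"]   # counts for labour == 2
```

Each slice should match the corresponding `labour==1` or `labour==2` table, and the single object is easier to pass on to ftable() or further subsetting.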
3. count the births for each day of all the years (i.e. the 22nd of
October of all the years present in the data frame) independently
from the value of "labour"
If I understand correctly:
> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001
month_birth
day_birth 10 11 12 13 14
1 0 1 0 0 0
3 0 2 0 0 0
4 0 2 0 0 0
5 0 0 1 0 0
22 1 0 0 0 1
29 1 0 0 1 0
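If "all the years" means pooling the years together, a possible variant is to leave year_birth out of the formula, or to sum the 3-way table over its year dimension. A sketch, assuming `dat` holds only the ten rows shown (so only 2001 is present and the pooled counts equal the 2001 counts):

```r
# The ten rows shown in the question; elided rows are not included.
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Option A: omit year_birth, so counts are pooled over all years.
pooled <- xtabs(~ day_birth + month_birth, data = dat)

# Option B: build the 3-way table, then sum over the year dimension.
by_year <- xtabs(~ day_birth + month_birth + year_birth, data = dat)
pooled2 <- margin.table(by_year, c(1, 2))
```

Both routes should give the same day-by-month table; option B is handy when the per-year table is needed anyway.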
4. count the births for each day of all the years (i.e. the 22nd of
October of all the years present in the data frame), divided by the
value of "labour"
Again confusing. Do you mean to use separate tables for labour==1 and
labour==2? Perhaps provide some context to explain what these values represent.
Some of us are "concrete". The results of xtabs are tables and can be
divided like matrices.
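As an illustration of dividing one table by another, here is a sketch that computes the share of labour==1 births in each day/month cell. Whether a proportion like this is what the question intends is an assumption; `dat` again holds only the ten rows shown.

```r
# The ten rows shown in the question; elided rows are not included.
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

n_all  <- xtabs(~ day_birth + month_birth, data = dat)                # all births
n_lab1 <- xtabs(labour == 1 ~ day_birth + month_birth, data = dat)    # labour == 1 only

# Element-wise division, as with matrices of equal dimensions.
# Cells with no births at all are 0/0 and come out as NaN.
prop_lab1 <- n_lab1 / n_all
```

The NaN cells can be replaced afterwards if a 0 is preferred for days without births.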
I tried with the command
table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
which satisfies (partially) question number 1 (I am not able to have
0 in the not available days).
Is there a smart way to do that without invoking too many loops?
thank you for your help
David Winsemius, MD
Heritage Laboratories
West Hartford, CT