I am fairly new to R and have a (for me) slightly complicated set of data to
analyse. It contains several continuous and categorical variables for a
group of individuals, e.g.:

ID      Sex     Age     Familysize      Phone   Education
1       M       23      3               Yes     Primary
2       F       25      4               Yes     Secondary
3       M       33      5               No      Tertiary
4       F       45      1               Yes     Secondary
5       F       67      10              Yes     Secondary


I want to summarise it in a table as follows (I want to put p-values in the
last column):

              All individuals  Male             Female           Comparison between sexes
Age           Median (range)   Median (range)   Median (range)   Wilcoxon rank sum test
Family size   Median (range)   Median (range)   Median (range)   Wilcoxon rank sum test
Phone         Number Yes (%)   Number Yes (%)   Number Yes (%)   Chi-squared test
Education                                                        Chi-squared test
  Primary     Number (%)       Number (%)       Number (%)
  Secondary   Number (%)       Number (%)       Number (%)
  Tertiary    Number (%)       Number (%)       Number (%)


How can I use R to do this?
For the continuous variables I know I can write code like:
summary(Age)
by(Age,data["Sex"],summary)
wilcox.test(Age~Sex)
summary(Familysize)
by(Familysize,data["Sex"],summary)
wilcox.test(Familysize~Sex)

but is there any way of automating/looping the analysis so that I get
summaries and comparative statistical analysis of all of the continuous
variables in a single command? I’m sure this could be done by some kind of
‘looping’ given that the analysis is always the same. Presumably I then
still have to copy the output of interest (medians, ranges, p-values) into
the summary table manually?
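
One possible way to loop over the continuous variables is sketched below. It is only a sketch under assumptions: the data frame is rebuilt from the five example rows above and called dat, and med_range() and summarise_continuous() are hypothetical helper names, not functions from any package.

```r
# Example data copied from the table above; the real data set has more columns.
dat <- data.frame(
  ID         = 1:5,
  Sex        = factor(c("M", "F", "M", "F", "F")),
  Age        = c(23, 25, 33, 45, 67),
  Familysize = c(3, 4, 5, 1, 10),
  Phone      = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education  = factor(c("Primary", "Secondary", "Tertiary",
                        "Secondary", "Secondary"))
)

# Hypothetical helper: format a numeric vector as "median (min-max)".
med_range <- function(x) {
  x <- x[!is.na(x)]
  sprintf("%g (%g-%g)", median(x), min(x), max(x))
}

# Hypothetical helper: one table row for one continuous variable.
summarise_continuous <- function(var, group, data) {
  x <- data[[var]]
  g <- factor(data[[group]])
  # Median (range) within each level of the grouping variable.
  by_group <- vapply(split(x, g), med_range, character(1))
  # Wilcoxon rank sum test comparing the two groups.
  p <- wilcox.test(x ~ g)$p.value
  c(Variable = var, All = med_range(x), by_group,
    p.value = format(signif(p, 3)))
}

# Loop over every continuous variable and stack the rows into one table.
continuous_vars <- c("Age", "Familysize")
cont_rows <- lapply(continuous_vars, summarise_continuous,
                    group = "Sex", data = dat)
cont_table <- as.data.frame(do.call(rbind, cont_rows))
print(cont_table)
```

Because the result is an ordinary data frame, it could be exported with write.csv(cont_table, "continuous.csv") (the file name is arbitrary) instead of copying values by hand.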

For each categorical variable I have really cumbersome code from which I can
extract the information I need for the summary table, e.g.:

tphone<-xtabs(~Phone+Sex,data=data)   # Phone by Sex counts
N<-margin.table(tphone,2)             # column totals (number of each sex)
tphone1<-rbind(tphone,N)
Total<-margin.table(tphone1,1)        # row totals (all individuals)
tphone1<-cbind(tphone1,Total)
tphone1<-t(tphone1)
tphone1<-as.data.frame(tphone1)
tphone2<-within(tphone1,{
per.No<-100*(No/N)
per.Yes<-100*(Yes/N)
})
tphone2<-tphone2[,c(3,2,4,1,5)]       # reorder: N, Yes, per.Yes, No, per.No
tphone2
chisq.test(tphone)

but there must be better ways of generating the counts, percentages, and
simple statistical analysis  which I need. Again, can I loop it to do all of
my categorical variables at once?
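
A shorter route to the counts, percentages and chi-squared p-values might look like the sketch below. It again assumes a data frame called dat built from the five example rows; count_pct() and summarise_categorical() are hypothetical helper names introduced only for illustration.

```r
# Example data copied from the table above; the real data set has more columns.
dat <- data.frame(
  ID         = 1:5,
  Sex        = factor(c("M", "F", "M", "F", "F")),
  Age        = c(23, 25, 33, 45, 67),
  Familysize = c(3, 4, 5, 1, 10),
  Phone      = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education  = factor(c("Primary", "Secondary", "Tertiary",
                        "Secondary", "Secondary"))
)

# Hypothetical helper: format counts as "n (xx.x%)" relative to a total.
count_pct <- function(n, total) {
  sprintf("%d (%.1f%%)", as.integer(n), 100 * n / total)
}

# Hypothetical helper: counts (%) overall and per group, plus a
# chi-squared p-value, for one categorical variable.
summarise_categorical <- function(var, group, data) {
  tab   <- table(data[[var]], data[[group]])   # levels of var by group
  all_n <- rowSums(tab)                        # counts over all individuals
  # Percentages are taken within each group (column of tab).
  per_group <- vapply(colnames(tab),
                      function(g) count_pct(tab[, g], sum(tab[, g])),
                      character(nrow(tab)))
  out <- cbind(All = count_pct(all_n, sum(all_n)), per_group)
  rownames(out) <- rownames(tab)
  list(counts = out, p.value = chisq.test(tab)$p.value)
}

# Loop over every categorical variable; results are stored by variable name.
categorical_vars <- c("Phone", "Education")
cat_results <- lapply(categorical_vars, summarise_categorical,
                      group = "Sex", data = dat)
names(cat_results) <- categorical_vars
print(cat_results$Phone$counts)
print(cat_results$Education$counts)
```

On the five-row toy data chisq.test() warns that the chi-squared approximation may be incorrect, because the expected counts are very small; with tables that sparse, fisher.test() may be a more suitable choice.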

Obviously my dataset has more continuous and categorical variables than
those shown above but I’ve abbreviated it for simplicity of explanation – I
need to write simpler/looped code so that the whole thing is not crazily
long-winded. 

Sorry that my approach so far is so bad and long-winded! R is a long uphill
curve to start with, so I’d be very grateful for any help I can get from
anyone who won’t laugh at me.

Derek


--
View this message in context: 
http://r.789695.n4.nabble.com/Generating-summary-statistics-and-simple-statistical-analysis-from-my-data-set-how-can-I-automate-th-tp3492537p3492537.html
Sent from the R help mailing list archive at Nabble.com.

