Re: [R] counts and percentage of multiple categorical columns in R

arun Thu, 16 Jan 2014 14:08:46 -0800

Also, you can get the same result as the previous solution with base R only:
result1 <- result[,-6]


# All values seen in any column; factor() drops NA from the levels by default
vec1 <- unique(unlist(dat1))
result2 <- as.data.frame(t(sapply(dat1, function(x) {
  counts <- table(factor(x, levels=vec1))                  # counts per brand, NA not counted
  percentage <- sprintf("%.1f", (counts/sum(counts))*100)  # percent of non-missing answers
  c(paste0(counts, paste0("(", percentage, ")")), sum(!is.na(x)))
})), stringsAsFactors=FALSE)
result2[,6] <- as.numeric(result2[,6])
colnames(result2) <- colnames(result1)
identical(result1,result2)
#[1] TRUE

A.K.
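
A minimal sketch of the same base R computation, unpacked into a helper function so that each step is easier to follow. The name count_pct and its digits argument are illustrative assumptions and do not come from the thread; the data frame is typed in directly so the block runs on its own.

```r
# Same data as in the thread, with NA for the blank cells.
dat1 <- data.frame(
  fatfreemilk = c("A", "A", "B", "C", "D", "A", "C", "A", "A", "A"),
  fatmilk     = c("A", "B", "A", "C", NA,  "E", "A", NA,  "B", "A"),
  halfmilk    = c("A", "B", "A", "C", "A", "A", "B", "A", NA,  "B"),
  `2fatmilk`  = c("A", "A", "A", "C", "A", "E", "A", "A", "A", "E"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): per-column counts, percentages
# of the non-missing answers, and the number of non-missing answers.
count_pct <- function(dat, digits = 1) {
  vals <- unlist(dat, use.names = FALSE)
  lev  <- sort(unique(vals[!is.na(vals)]))       # every category seen anywhere
  fmt  <- paste0("%.", digits, "f")
  out <- t(sapply(dat, function(x) {
    n   <- as.vector(table(factor(x, levels = lev)))  # NA is not counted
    pct <- sprintf(fmt, 100 * n / sum(n))
    c(paste0(n, "(", pct, ")"), sum(!is.na(x)))
  }))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- c(lev, "nonNAcount")
  out$nonNAcount <- as.numeric(out$nonNAcount)
  out
}

res_sketch <- count_pct(dat1)
print(res_sketch)
```

Unlike the one-liner above, this sketch sorts the levels and removes NA explicitly instead of relying on factor() to drop it. A column made up entirely of NA would give NaN percentages; that case is not handled here.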

On Thursday, January 16, 2014 11:24 AM, arun <smartpink...@yahoo.com> wrote:
Hi Jingxia,
Maybe this helps:

dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D . A A
A E A E
C A B A
A . A A
A B . A
A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE,na.strings=".")
 dat2 <- dat1
 dat2$id <- 1:nrow(dat2)

library(reshape2)
res <- acast(melt(dat2,id.var="id")[,-1],variable~value,length)
res[,-6] <- paste0(res[,-6],paste0("(",sprintf("%.1f",(res[,-6]/rowSums(res[,-6]))*100)),")")
 result <- as.data.frame(res,stringsAsFactors=FALSE)
#Either 
result$nonNAcount <- dim(dat1)[1]-as.numeric(result$`NA`)
#or
result$nonNAcount <- sapply(dat1,function(x) sum(!is.na(x)))
result[,-6]
#                  A       B       C       D       E nonNAcount
#fatfreemilk 6(60.0) 1(10.0) 2(20.0) 1(10.0)  0(0.0)         10
#fatmilk     4(50.0) 2(25.0) 1(12.5)  0(0.0) 1(12.5)          8
#halfmilk    5(55.6) 3(33.3) 1(11.1)  0(0.0)  0(0.0)          9
#2fatmilk    7(70.0)  0(0.0) 1(10.0)  0(0.0) 2(20.0)         10

A.K.
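
The two ways of computing nonNAcount shown above can be checked against each other on a small case. This is only a sketch on toy data; the data frame toy and the variable names are hypothetical and not part of the thread.

```r
# Toy data (hypothetical, not the milk data): NA marks a blank cell.
toy <- data.frame(x = c("A", "B", NA, "A"),
                  y = c(NA, NA, "C", "A"),
                  stringsAsFactors = FALSE)

# Way 1: total rows minus the number of missing cells per column.
n_answered_1 <- nrow(toy) - colSums(is.na(toy))

# Way 2: count the non-missing cells directly.
n_answered_2 <- sapply(toy, function(col) sum(!is.na(col)))

# Both ways give the same numbers.
stopifnot(all(n_answered_1 == n_answered_2))

# Percentages are taken over answered cells only, because table() skips NA
# by default.
pct_x <- round(100 * prop.table(table(toy$x)), 1)
print(pct_x)
```

The values agree, but way 1 yields a double vector and way 2 an integer vector. That difference only matters for strict comparisons such as identical().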





On Thursday, January 16, 2014 9:49 AM, Jingxia Lin <jingxi...@gmail.com> wrote:

Dear Arun,

Sorry to bother you again, but may I ask you one more question regarding 
the data set? 
I am using the following method you offered for the data set. In our original 
data, there are some blank cells (i.e. missing data) in some columns. So in the 
output data frame, can we add an additional column to show the number of 
responses (i.e. the number of non-blank cells)? I tried a couple of ways but 
failed (sorry, I'm really not good at R...). I would be very grateful if you can 
help us with this problem at your convenience. Thank you! 

Best,
Jingxia

 dat2 <- dat1
 dat2$id <- 1:nrow(dat2)
library(reshape2)
 res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
 res2[]<- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
 as.data.frame(res2)


results 
#                A     B     C     D     E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)




On Mon, Dec 30, 2013 at 3:50 PM, arun <smartpink...@yahoo.com> wrote:

Dear Jingxia,
>No problem.  Happy New Year to you too!
>Arun
>
>
>
>
>
>
>
>
>On Monday, December 30, 2013 2:49 AM, Jingxia Lin <jingxi...@gmail.com> wrote:
>
>Dear Arun,
>
>Thank YOU for your kind help :)  Happy new year!
>
>Best,
>Jingxia
>
>
>
>On Mon, Dec 30, 2013 at 3:43 PM, arun <smartpink...@yahoo.com> wrote:
>
>Dear Jingxia,
>>
>>Glad that you were able to figure it out.  I was away from my computer.  My 
>>name is 'Arun Kirshna Sasikala-Appukuttan'.  I am a postdoctoral research 
>>fellow at Wayne State University, Detroit, MI, USA.  Thank you for the kind 
>>acknowledgment.
>>Regards,
>>Arun
>>
>>
>>
>>
>>
>>
>>
>>On Sunday, December 29, 2013 9:25 PM, Jingxia Lin <jingxi...@gmail.com> wrote:
>>
>>Dear A.K.
>>
>>I also solved the character problem by using library(xlsx). So everything is 
>>fine now. Thank you again!
>>
>>Best,
>>Jingxia
>>
>>
>>
>>On Mon, Dec 30, 2013 at 10:17 AM, Jingxia Lin <jingxi...@gmail.com> wrote:
>>
>>Dear A.K.,
>>>
>>>
>>>Thank you a lot! I tried your way and it works perfectly. The only thing I 
>>>haven't figured out is that when I exported the final data frame to an 
>>>Excel file, the Chinese characters were not shown correctly (my original 
>>>data has Chinese in row/column names). Other than that, everything is great! 
>>>Would you mind letting me know your name so that we can acknowledge your 
>>>help in our paper? Thank you again! 
>>>
>>>
>>>Best
>>>Jingxia
>>>
>>>
>>>
>>>
>>>
>>>On Mon, Dec 30, 2013 at 3:48 AM, arun <smartpink...@yahoo.com> wrote:
>>>
>>>Hi,
>>>>Try:
>>>>dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
>>>>
>>>>A A A A
>>>>A B B A
>>>>B A A A
>>>>C C C C
>>>>D A A A
>>>>A E A E
>>>>C A B A
>>>>A A A A
>>>>A B B A
>>>>A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)
>>>> dat2 <- dat1
>>>> dat2$id <- 1:nrow(dat2)
>>>>library(reshape2)
>>>> res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
>>>>row.names(res) <- res[,1]
>>>>res1 <- res[,-1]
>>>>res2 <- as.matrix(res1)
>>>> res2[]<- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
>>>> as.data.frame(res2)
>>>>#                A     B     C     D     E
>>>>#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
>>>>#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
>>>>#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
>>>>#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Sunday, December 29, 2013 1:07 PM, Jingxia Lin <jingxi...@gmail.com> 
>>>>wrote:
>>>>Dear R helpers,
>>>>
>>>>I have a data sheet (“milk”) with four types of milk from five brands (A,
>>>>B, C, D, E). Each column shows the brand that each customer chose for that
>>>>type of milk. The data sheet looks like the one below. You can see that
>>>>for some types of milk, no brand is chosen.
>>>>
>>>>fatfreemilk fatmilk halfmilk 2fatmilk
>>>>A A A A
>>>>A B B A
>>>>B A A A
>>>>C C C C
>>>>D A A A
>>>>A E A E
>>>>C A B A
>>>>A A A A
>>>>A B B A
>>>>A A B E
>>>>
>>>>I want to summarize each column so that, for each type of milk, I know the
>>>>counts and percentages of the brands chosen. I tried "summary" in R, but
>>>>the result is not displayed nicely. How can I display the result in a way
>>>>like the one below?
>>>>            A     B     C     D     E
>>>>fatfreemilk 6(60) 1(10) 2(20) 1(10) 0(0)
>>>>fatmilk     6(60) 2(20) 1(10) 0(0)  1(10)
>>>>halfmilk    5(50) 4(40) 1(10) 0(0)  0(0)
>>>>2fatmilk    7(70) 0(0)  1(10) 0(0)  2(20)
>>>>
>>>>Thank you!
>>>>
>>>>
>>>>
>>>>______________________________________________
>>>>R-help@r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>
