Boya - table() is the function that does what you want:
> cdat = data.frame(membership=rep(1:3,rep(3,3)),
+                   label=as.character(c(0,0,1,0,1,1,1,1,1)))
> table(cdat)
          label
membership 0 1
         1 2 1
         2 1 2
         3 0 3
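
Because the result is an ordinary two-way contingency table, single counts and margins can be read off by name. The following is only a minimal sketch of that idea: the object name tab is an illustrative assumption, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so this snippet is self-contained.
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# 'tab' is a hypothetical name; rows are clusters, columns are labels.
tab <- table(cdat)

tab["1", "0"]                # count of 0-labeled points in cluster 1
rowSums(tab)                 # total number of points in each cluster
prop.table(tab, margin = 1)  # within-cluster proportion of each label
```

Indexing by the character names "1" and "0" rather than by position avoids surprises if a cluster id or label is missing from the data.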
From there, you can rearrange it in a variety of ways:
> as.data.frame(table(cdat))
  membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3

Or, to conform with your request:
> reshape(as.data.frame(table(cdat)),idvar='membership',
+         v.names='Freq',timevar='label',direction='wide')
  membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spec...@stat.berkeley.edu

On Wed, 21 Jul 2010, Boya Sun wrote:
Dear R experts,

I have a labeled data set. Each data is assigned a binary label 0 or 1. Assume that I use some clustering algorithm to group the data by clusters (using some features of the data). Now I want to know how many data are labeled as 0/1 in each cluster.

For example, assume that I have 9 labeled data grouped into three clusters. The ids of the clusters are 1, 2, and 3. The dataset is represented by the following matrix:

      membership  Label
d1    1           0
d2    1           0
d3    1           1
d4    2           0
d5    2           1
d6    2           1
d7    3           1
d8    3           1
d9    3           1

Now I want to get the following output, telling me how many data are labeled as 0 and 1 in each cluster:

cluster_id   0-data   1-data
1            2        1
2            1        2
3            0        3

The output does not have to be a matrix, it could be a summary of the statistics. How should I approach this problem? What R functions should I use to get such information?

Thanks so much!

Boya

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.