Boya-
   table() is the function that does what you want:

cdat = data.frame(membership=rep(1:3,rep(3,3)),
                  label=as.character(c(0,0,1,0,1,1,1,1,1)))
table(cdat)
          label
membership 0 1
         1 2 1
         2 1 2
         3 0 3
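table() will also take the two columns as separate vectors, and the result can be indexed by level name to pull out a single count. Here is a rough sketch using the example data from your message. The object name tab is just for illustration.

```r
# Rebuild the example data from the question
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# Cross-tabulate the two columns explicitly; dnn names the two dimensions
tab <- table(cdat$membership, cdat$label, dnn = c("membership", "label"))
print(tab)

# Single cells can be looked up by level name (given as character strings)
tab["3", "0"]   # points in cluster 3 labeled 0 -> 0
tab["2", "1"]   # points in cluster 2 labeled 1 -> 2

# Row sums give the size of each cluster
rowSums(tab)
```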

From there, you can rearrange it in a variety of ways:

as.data.frame(table(cdat))
  membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3
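Because the long form is an ordinary data frame, the usual subsetting tools apply to it, and xtabs() will turn the Freq column back into a contingency table. Here is a small sketch on the same data. The names long, ones and back are illustrative only.

```r
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# Long form: one row per (membership, label) cell, with the count in Freq
long <- as.data.frame(table(cdat))

# Ordinary subsetting: the counts of points labeled 1, one row per cluster
ones <- long[long$label == "1", c("membership", "Freq")]
print(ones)

# xtabs() sums Freq over the formula's factors, rebuilding the 2-way table
back <- xtabs(Freq ~ membership + label, data = long)
print(back)
```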

Or, to match the layout you asked for:

reshape(as.data.frame(table(cdat)),idvar='membership',
        v.names='Freq',timevar='label',direction='wide')
  membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3
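If reshape() feels roundabout, as.data.frame.matrix() keeps the two-way layout of the table directly. The sketch below also renames the columns to the headings in your message. The name wide is illustrative, and the column names are simply copied from your example output.

```r
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

tab <- table(cdat)

# The matrix method keeps rows = clusters, columns = labels, instead of
# flattening to the long form that as.data.frame() gives for a table
wide <- as.data.frame.matrix(tab)

# Move the cluster ids out of the row names into their own column
wide <- cbind(cluster_id = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

# Use the column headings from the question
names(wide) <- c("cluster_id", "0-data", "1-data")
print(wide)
```

Names like 0-data are not syntactic in R, so they have to be accessed as wide[["0-data"]] or with backticks.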


                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu


On Wed, 21 Jul 2010, Boya Sun wrote:

Dear R experts,

I have a labeled data set. Each data point is assigned a binary label, 0 or 1.
Assume that I use some clustering algorithm to group the data into clusters
(using some features of the data). Now I want to know how many data points are
labeled 0 and how many are labeled 1 in each cluster.

For example, assume that I have 9 labeled data points grouped into three
clusters. The ids of the clusters are 1, 2, and 3. The dataset is represented
by the following matrix:

    membership  Label
d1  1           0
d2  1           0
d3  1           1
d4  2           0
d5  2           1
d6  2           1
d7  3           1
d8  3           1
d9  3           1

Now I want to get the following output, telling me how many data points are
labeled 0 and how many are labeled 1 in each cluster:

cluster_id  0-data  1-data
1           2       1
2           1       2
3           0       3

The output does not have to be a matrix; it could be a summary of the
statistics.

How should I approach this problem? What R functions should I use to get
such information?

Thanks so much!

Boya

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

