On 10/03/2009 04:15 AM, Andrew Spence wrote:
Dear R-help,

First of all, thank you VERY much for any help you have time to offer. I
greatly appreciate it.

I would like to write a function that, given an arbitrary number of factors
from a data frame, tabulates the number of occurrences of each unique
combination of the factors. Clearly, this works:

table(horse,date,surface)
<SNIP>

, , surface = TURF

                    date
horse               20080404 20080514 20081015 20081025 20081120 20081203 20090319
   Bedevil                  0        0        0        0        0        0        0
   Cut To The Point       227        0        0        0        0        0        0

<SNIP>

But I would prefer output that skips all the zeros, flattens any dimensions
greater than 2, and gives the level names rather than the codes. I can write
code for a specific number of factors, like this (here 2 factors):

ft <- function(x, y) {
  u <- unique(cbind(x, y))  # observed (x, y) combinations, as integer codes
  cbind(levels(x)[u[, 1]], levels(y)[u[, 2]],  # map codes back to level names
        table(x, y)[u])                        # matrix indexing picks each count
}

which gives the lovely output I'm looking for:

#      [,1]                [,2]       [,3]
# [1,] "Cut To The Point"  "20080404" "227"
# [2,] "Prairie Wolf"      "20080404" "364"
# [3,] "Bedevil"           "20080514" "319"
# [4,] "Prairie Wolf"      "20080514" "330"

But my attempts to make this into a function that handles an arbitrary number
of factors as separate input arguments have failed. The closest I can get is:

ft2 <- function(...) {
  cbind(unique(cbind(...)), table(...)[unique(cbind(...))])
}

giving:

ft2(horse,date)
      horse date
 [1,]     2    1 227
 [2,]     9    1 364
 [3,]     1    2 319
 [4,]     9    2 330
 [5,]     9    3 291
 [6,]    12    3 249
 [7,]    10    3 286
 [8,]     5    4 217
 [9,]     3    4 426
[10,]     8    4 468
[11,]     9    5 319
[12,]    13    5 328
[13,]    12    5 138
[14,]     7    6 375
[15,]    11    6 366
[16,]     4    7 255
[17,]     6    7 517

I would be greatly indebted to anyone willing to show me how to make the
above function take arbitrary inputs and still produce output displaying
factor level names instead of the underlying numeric codes.
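
A minimal sketch of one way to generalize ft2, assuming every argument is a factor with no NAs. Since cbind() turns factors into their integer codes, each column of the unique code matrix can be mapped back through the corresponding factor's levels(), and the same code matrix still indexes the table. The name ft3 and the toy horse/date data are hypothetical, made up for illustration.

```r
# Hypothetical ft3: ft2 generalised to return level names (assumes factors, no NAs)
ft3 <- function(...) {
  fs <- list(...)                       # the factors, in argument order
  u  <- unique(cbind(...))              # observed combinations, as integer codes
  # Map column j of codes back through the levels of the j-th factor
  labs <- sapply(seq_along(fs), function(j) levels(fs[[j]])[u[, j]])
  labs <- matrix(labs, nrow = nrow(u))  # stay a matrix even with one row
  cbind(labs, table(...)[u])            # matrix indexing picks each count
}

# Hypothetical toy data, standing in for the real horse/date columns
horse <- factor(c("Bedevil", "Prairie Wolf", "Bedevil", "Bedevil", "Cut To The Point"))
date  <- factor(c("20080404", "20080404", "20080404", "20080514", "20080514"))
res <- ft3(horse, date)
print(res)
```

Rows come out in order of first appearance in the data, as in the ft2 output, and the result is a character matrix like the one ft returns.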

Hi Andrew,
The sizetree function in plotrix does what you want graphically, I think. Perhaps if each invocation returned the vector of counts, the deepest level of counts would be returned at the final exit with the factor levels.

Jim
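
For a non-graphical listing, a minimal base-R sketch (not using plotrix), assuming all arguments are factors: as.data.frame() applied to a table gives one row per combination of levels, keeping the level names and putting the counts in a Freq column, so dropping the rows with Freq == 0 yields the flattened, zero-free output for any number of factors. The name count_combos and the toy data are hypothetical.

```r
# Hypothetical count_combos: one row per observed combination of any number of factors
count_combos <- function(...) {
  tab <- as.data.frame(table(...))          # every level combination plus a Freq column
  tab <- tab[tab$Freq > 0, , drop = FALSE]  # drop combinations that never occur
  rownames(tab) <- NULL                     # renumber the remaining rows
  tab
}

# Hypothetical toy data
horse   <- factor(c("Bedevil", "Prairie Wolf", "Bedevil", "Bedevil", "Cut To The Point"))
date    <- factor(c("20080404", "20080404", "20080404", "20080514", "20080514"))
surface <- factor(c("TURF", "TURF", "TURF", "TURF", "DIRT"))
res <- count_combos(horse, date, surface)
print(res)
```

The intermediate data frame has one row per combination of levels, empty ones included, so this trades memory for simplicity when there are many factors with many levels.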

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
