On Sun, May 11, 2008 at 9:49 AM, amarkos <[EMAIL PROTECTED]> wrote:
> On May 11, 4:47 pm, "Douglas Bates" <[EMAIL PROTECTED]> wrote:
>
>> Do you mean that you want to collapse similar rows into a single row
>> and perhaps a count of the number of times that this row occurs?
>
> Let me rephrase the problem by providing an example.
>
> Input:
>
> A =
>       [,1] [,2]
>  [1,]    1    1
>  [2,]    1    3
>  [3,]    2    1
>  [4,]    1    2
>  [5,]    2    1
>  [6,]    1    2
>  [7,]    1    1
>  [8,]    1    2
>  [9,]    1    3
> [10,]    2    1

An important question here is do you start with two or more variables
like the columns of your matrix A? If so, there is a more direct
method of getting the answers that you want. The natural way to store
such variables in R is as factors. I prefer to use letters instead of
numbers to represent the levels of a factor (that way I don't confuse
a factor with a numeric variable when I look at rows) so I would
create a data frame with two factors instead of a matrix.

> V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
> V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
> df <- data.frame(f1 = V1, f2 = V2)
> df
   f1 f2
1   A  a
2   A  c
3   B  a
4   A  b
5   B  a
6   A  b
7   A  a
8   A  b
9   A  c
10  B  a

You could produce the indicator matrix and check for unique rows, etc.
- I will show that below - but all you need is the interaction of the
two factors

> df$f12 <- with(df, f1:f2)[drop = TRUE]
> df
   f1 f2 f12
1   A  a A:a
2   A  c A:c
3   B  a B:a
4   A  b A:b
5   B  a B:a
6   A  b A:b
7   A  a A:a
8   A  b A:b
9   A  c A:c
10  B  a B:a
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
 $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
 $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
> table(df$f12)

A:a A:b A:c B:a
  2   3   2   3
> as.numeric(df$f12)
 [1] 1 3 4 2 4 2 1 2 3 4

Notice that this shows you that there are four distinct combinations
that occur 2, 3, 2 and 3 times respectively; the first combination
occurs in rows 1 and 7, it consists of the first level of f1 and the
first level of f2, etc.
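
If the row ids for each combination are what is wanted, the interaction
factor can be used to group them directly. What follows is only a
minimal sketch for this toy example, using base R's split(), unique()
and order(); the names row_ids and collapsed are made up for
illustration and do not come from the code above.

```r
# Rebuild the example data frame so that this block runs on its own.
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
df$f12 <- with(df, f1:f2)[drop = TRUE]

# Row ids of every distinct combination, as a list named by the
# levels of f12 (row_ids is a hypothetical name).
row_ids <- split(seq_len(nrow(df)), df$f12)

# One row per distinct combination. unique() keeps rows in order of
# first appearance, so sort by f12 to line the rows up with table().
collapsed <- unique(df)
collapsed <- collapsed[order(collapsed$f12), ]
collapsed$count <- as.vector(table(df$f12))
```

With this, row_ids[["A:c"]] holds rows 2 and 9, and collapsed has one
row for each of the four combinations together with its count.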

If you really do want the indicator matrix you could generate it as

> (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
   f1A f1B f2a f2b f2c
1    1   0   1   0   0
2    1   0   0   0   1
3    0   1   1   0   0
4    1   0   0   1   0
5    0   1   1   0   0
6    1   0   0   1   0
7    1   0   1   0   0
8    1   0   0   1   0
9    1   0   0   0   1
10   0   1   1   0   0
> unique(ind)
  f1A f1B f2a f2b f2c
1   1   0   1   0   0
2   1   0   0   0   1
3   0   1   1   0   0
4   1   0   0   1   0

but working with the factors is generally much simpler than working
with the indicators.

> # Indicator matrix
> obj <- data.frame(lapply(data.frame(A), as.factor))
>
> nocases <- dim(obj)[1]
> novars <- dim(obj)[2]
>
> # variable levels
> levels.n <- sapply(obj, nlevels)
> n <- cumsum(levels.n)
>
> # Indicator matrix calculations
> Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
> newdat <- lapply(obj, as.numeric)
> offset <- (c(0, n[-length(n)]))
> for (i in 1:novars)
>   Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
>
> #######
>
> Output:
>
> Z =
>
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    1    0    1    0    0
>  [2,]    1    0    0    0    1
>  [3,]    0    1    1    0    0
>  [4,]    1    0    0    1    0
>  [5,]    0    1    1    0    0
>  [6,]    1    0    0    1    0
>  [7,]    1    0    1    0    0
>  [8,]    1    0    0    1    0
>  [9,]    1    0    0    0    1
> [10,]    0    1    1    0    0
>
>
> Z is an indicator matrix in the Multiple Correspondence Analysis
> framework.
> My problem is to collapse identical rows (e.g. 2 and 9) into a single
> row and store the row ids.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.