On Mon, May 12, 2008 at 11:27 AM, amarkos <[EMAIL PROTECTED]> wrote:
> Thanks, it works!
> Could you please provide the direct method you mentioned for the
> multivariate case?

I'm not sure what you mean. I looked at what I wrote and I don't see
anything that would fit that description.

May I suggest that you continue to cc: the R-help list on the
discussion. I can't always respond rapidly to requests and there are
many who read the list that can.

> On May 12, 4:30 pm, "Douglas Bates" <[EMAIL PROTECTED]> wrote:
>> On Sun, May 11, 2008 at 9:49 AM, amarkos <[EMAIL PROTECTED]> wrote:
>> > On May 11, 4:47 pm, "Douglas Bates" <[EMAIL PROTECTED]> wrote:
>>
>> >> Do you mean that you want to collapse similar rows into a single row
>> >> and perhaps a count of the number of times that this row occurs?
>>
>> > Let me rephrase the problem by providing an example.
>>
>> > Input:
>>
>> > A =
>> >       [,1] [,2]
>> >  [1,]    1    1
>> >  [2,]    1    3
>> >  [3,]    2    1
>> >  [4,]    1    2
>> >  [5,]    2    1
>> >  [6,]    1    2
>> >  [7,]    1    1
>> >  [8,]    1    2
>> >  [9,]    1    3
>> > [10,]    2    1
>>
>> An important question here is do you start with two or more variables
>> like the columns of your matrix A? If so, there is a more direct
>> method of getting the answers that you want. The natural way to store
>> such variables in R is as factors. I prefer to use letters instead of
>> numbers to represent the levels of a factor (that way I don't confuse
>> a factor with a numeric variable when I look at rows) so I would
>> create a data frame with two factors instead of a matrix.
>>
>> > V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
>> > V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
>> > df <- data.frame(f1 = V1, f2 = V2)
>> > df
>>
>>    f1 f2
>> 1   A  a
>> 2   A  c
>> 3   B  a
>> 4   A  b
>> 5   B  a
>> 6   A  b
>> 7   A  a
>> 8   A  b
>> 9   A  c
>> 10  B  a
>>
>> You could produce the indicator matrix and check for unique rows, etc.
>> - I will show that below - but all you need is the interaction of the
>> two factors
>>
>> > df$f12 <- with(df, f1:f2)[drop = TRUE]
>> > df
>>
>>    f1 f2 f12
>> 1   A  a A:a
>> 2   A  c A:c
>> 3   B  a B:a
>> 4   A  b A:b
>> 5   B  a B:a
>> 6   A  b A:b
>> 7   A  a A:a
>> 8   A  b A:b
>> 9   A  c A:c
>> 10  B  a B:a
>> > str(df)
>>
>> 'data.frame':   10 obs. of  3 variables:
>>  $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
>>  $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
>>  $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
>>
>> > table(df$f12)
>>
>> A:a A:b A:c B:a
>>   2   3   2   3
>> > as.numeric(df$f12)
>>
>>  [1] 1 3 4 2 4 2 1 2 3 4
>>
>> Notice that this shows you that there are four distinct combinations
>> that occur 2, 3, 2 and 3 times respectively; the first combination
>> occurs in rows 1 and 7, it consists of the first level of f1 and the
>> first level of f2, etc.
>>
>> If you really do want the indicator matrix you could generate it as
>>
>> > (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
>>
>>    f1A f1B f2a f2b f2c
>> 1    1   0   1   0   0
>> 2    1   0   0   0   1
>> 3    0   1   1   0   0
>> 4    1   0   0   1   0
>> 5    0   1   1   0   0
>> 6    1   0   0   1   0
>> 7    1   0   1   0   0
>> 8    1   0   0   1   0
>> 9    1   0   0   0   1
>> 10   0   1   1   0   0
>> > unique(ind)
>>
>>   f1A f1B f2a f2b f2c
>> 1   1   0   1   0   0
>> 2   1   0   0   0   1
>> 3   0   1   1   0   0
>> 4   1   0   0   1   0
>>
>> but working with the factors is generally much simpler than working
>> with the indicators.
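
The quoted steps stop at the counts and the integer codes. A minimal
sketch of how the row ids for each distinct combination might be
collected from the interaction factor follows. It is not part of the
quoted message: the names combo, row_ids and collapsed are made up for
illustration, and passing the whole data frame to interaction() is an
assumption about how one could extend the idea to more than two
factors.

```r
# Rebuild the example data frame from the quoted message.
df <- data.frame(
  f1 = factor(c(1, 1, 2, 1, 2, 1, 1, 1, 1, 2), labels = LETTERS[1:2]),
  f2 = factor(c(1, 3, 1, 2, 1, 2, 1, 2, 3, 1), labels = letters[1:3])
)

# One factor whose levels are the combinations that actually occur.
# interaction() accepts a list of factors, so a data frame with any
# number of factor columns can be passed in one go. sep and lex.order
# are set so the levels match those produced by f1:f2.
combo <- interaction(df, drop = TRUE, sep = ":", lex.order = TRUE)

# Row ids for each distinct combination (a named list of integer vectors).
row_ids <- split(seq_len(nrow(df)), combo)

# One representative row per combination, plus how often it occurs.
first <- !duplicated(combo)
collapsed <- df[first, ]
collapsed$count <- unname(lengths(row_ids)[as.character(combo[first])])

row_ids
collapsed
```

Here row_ids[["A:c"]] holds rows 2 and 9, the pair mentioned in the
original question, and collapsed keeps the first row of each
combination in order of appearance.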
>>
>> > # Indicator matrix
>> > A <- data.frame(lapply(data.frame(obj), as.factor))
>>
>> > nocases <- dim(obj)[1]
>> > novars <- dim(obj)[2]
>>
>> > # variable levels
>> > levels.n <- sapply(obj, nlevels)
>> > n <- cumsum(levels.n)
>>
>> > # Indicator matrix calculations
>> > Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
>> > newdat <- lapply(obj, as.numeric)
>> > offset <- (c(0, n[-length(n)]))
>> > for (i in 1:novars)
>> >   Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
>>
>> > #######
>>
>> > Output:
>>
>> > Z =
>>
>> >       [,1] [,2] [,3] [,4] [,5]
>> >  [1,]    1    0    1    0    0
>> >  [2,]    1    0    0    0    1
>> >  [3,]    0    1    1    0    0
>> >  [4,]    1    0    0    1    0
>> >  [5,]    0    1    1    0    0
>> >  [6,]    1    0    0    1    0
>> >  [7,]    1    0    1    0    0
>> >  [8,]    1    0    0    1    0
>> >  [9,]    1    0    0    0    1
>> > [10,]    0    1    1    0    0
>>
>> > Z is an indicator matrix in the Multiple Correspondence Analysis
>> > framework.
>> > My problem is to collapse identical rows (e.g. 2 and 9) into a single
>> > row and store the row ids.
>
> Angelos Markos
> Dr. of Applied Informatics,
> University of Macedonia, Greece

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.