Using the generalized inner product defined in this post:
https://www.stat.math.ethz.ch/pipermail/r-help/2006-July/109311.html
try this:

    cbind(S, d = rowSums(inner(S, obs, identical)))

On Fri, Oct 16, 2009 at 4:29 AM, Robin Hankin <rk...@cam.ac.uk> wrote:
> Hi
>
> I want a generalization of tabulate() which works on rows of a matrix.
> Suppose I have an integer matrix 'observation':
>
> > observation
>
>      y1 y2 y3
>       1  4  0
>       1  4  0
>       2  0  3
>       4  1  0
>       0  5  0
>       0  1  4
>       2  0  3
>
> Each row corresponds to a (multivariate) observation. Note that the
> first two rows are identical: this means that data "c(1,4,0)" was
> observed twice.
>
> Now suppose I can list the sample space:
>
> > S
>  [1,] 5 0 0
>  [2,] 4 1 0
>  [3,] 3 2 0
>  [4,] 2 3 0
>  [5,] 1 4 0
>  [6,] 0 5 0
>  [7,] 4 0 1
>  [8,] 3 1 1
>  [9,] 2 2 1
> [10,] 1 3 1
> [11,] 0 4 1
> [12,] 3 0 2
> [13,] 2 1 2
> [14,] 1 2 2
> [15,] 0 3 2
> [16,] 2 0 3
> [17,] 1 1 3
> [18,] 0 2 3
> [19,] 1 0 4
> [20,] 0 1 4
> [21,] 0 0 5
>
> (thus each row corresponds to a point in my sample space).
>
> Now what I need to do is to construct a new matrix, which uses the
> 'observation' matrix above, which is a sort of table:
>
> > desired
>
>       y1 y2 y3 d
>  [1,]  5  0  0 0
>  [2,]  4  1  0 1
>  [3,]  3  2  0 0
>  [4,]  2  3  0 0
>  [5,]  1  4  0 2
>  [6,]  0  5  0 1
>  [7,]  4  0  1 0
>  [8,]  3  1  1 0
>  [9,]  2  2  1 0
> [10,]  1  3  1 0
> [11,]  0  4  1 0
> [12,]  3  0  2 0
> [13,]  2  1  2 0
> [14,]  1  2  2 0
> [15,]  0  3  2 0
> [16,]  2  0  3 2
> [17,]  1  1  3 0
> [18,]  0  2  3 0
> [19,]  1  0  4 0
> [20,]  0  1  4 1
> [21,]  0  0  5 0
>
>
> Thus the 'd' column counts the number of times that each row occurs in
> variable 'observation'. So desired[5,4]=2 because the observation
> corresponding to desired[5,1:3] (viz c(1,4,0)) occurred twice. And
> desired[1,4]=0 because the observation corresponding to desired[1,1:3]
> (viz c(5,0,0)) did not occur once (it was not observed).
>
> In my application I have dim(S) ~= c(5,4e6).
>
> I've tried merge(), stack(), reshape(), but the best I can do
> is the (derisory):
>
> require(partitions)
>
>
> obs <- matrix(as.integer(c(
>    1, 4, 0,
>    1, 4, 0,
>    2, 0, 3,
>    4, 1, 0,
>    0, 5, 0,
>    0, 1, 4,
>    2, 0, 3
>    )),ncol=3,byrow=TRUE)
>
> S <- t(compositions(5,3))
> d <- rep(0,nrow(S))
>
>
> for(i in seq_len(nrow(obs))){
>   for(j in seq_len(nrow(S))){
>     if(all(obs[i,,drop=TRUE] == S[j,,drop=TRUE])){
>       d[j] <- d[j]+1
>     }
>   }
> }
>
> S <- cbind(S,d)
>
>
> Anyone got anything better before I try C?
>
>
> --
> Robin K. S. Hankin
> Uncertainty Analyst
> University of Cambridge
> 19 Silver Street
> Cambridge CB3 9EP
> 01223-764877
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
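
If comparing every row of S against every row of obs turns out to be too slow for a sample space with millions of points, here is a rough base-R sketch of a different technique from the inner() one-liner at the top of this reply: collapse each row to a character key, look the keys up with match(), and count with tabulate(). The helper name row_key is hypothetical, and S is built with expand.grid() only so that the example does not depend on the partitions package; that is my assumption, and its row order differs from t(compositions(5,3)).

```r
# Observations from the original post (one row per observation).
obs <- matrix(as.integer(c(
  1, 4, 0,
  1, 4, 0,
  2, 0, 3,
  4, 1, 0,
  0, 5, 0,
  0, 1, 4,
  2, 0, 3
)), ncol = 3, byrow = TRUE)

# Sample space: all non-negative integer triples summing to 5.
# Built with expand.grid() (an assumption, to stay in base R); the row
# order differs from t(compositions(5, 3)) in the partitions package.
g <- expand.grid(y1 = 0:5, y2 = 0:5, y3 = 0:5)
S <- as.matrix(g[rowSums(g) == 5, ])
rownames(S) <- NULL

# Hypothetical helper: collapse each row of a matrix into one string key.
row_key <- function(m) do.call(paste, c(as.data.frame(m), sep = ","))

# match() maps each observation to its row index in S;
# tabulate() then counts how often each index occurs.
d <- tabulate(match(row_key(obs), row_key(S)), nbins = nrow(S))

result <- cbind(S, d)
```

Something like result[result[, "d"] > 0, ] then shows only the points that were actually observed. Observations that do not appear in S give NA from match() and are simply not counted, so check for those first if that could happen in your data.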