On Nov 16, 2012, at 8:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Hi Peter,
>
> On Fri, Nov 16, 2012 at 9:04 AM, Peter Kupfer <peter.kup...@me.com> wrote:
>> Dear all,
>> maybe a simple problem but I found no solution for my problem.
>> I have a matrix Y with 23 000 rows and 220 columns. The entries are "A", "B"
>> or "C".
>
> A reproducible example with sample data is helpful.
>
>> I want to extract all rows (as a matrix) of the matrix Y where all entries
>> of a row are (for example) "A".
>
> Really? Why not just make a new matrix with the right number of "A" values?
>
>> Is there any solution? I tried the stringr package but it doesn't work out.
>
> Of course there is. Here's one option. But I'm not sure you've really
> stated your actual problem. This extracts the rows where all values
> are "A", and might at least get you started toward your real problem.
>
> testdata <- matrix(c(
> "A", "B", "C",
> "B", "B", "B",
> "C", "A", "A",
> "A", "A", "A"),
> ncol=3, byrow=TRUE)
>
> testdata.A <- testdata[apply(testdata, 1, function(x)all(x == "A")), ,
> drop=FALSE]

Using something like rowSums() might be faster in this case, based upon brief testing.

The comparison testdata == "A" returns a logical matrix of TRUE/FALSE values, which have numeric equivalents of 1/0, respectively. You can therefore subset the matrix on the rowSums() values being equal to the number of columns in the matrix, which indicates that all values in the row match your desired value.

# Create a 23000 * 220 matrix with random values.
set.seed(1)
testdata <- matrix(sample(c("A", "B", "C"), 23000*220, replace = TRUE), ncol = 220)

# Set 100 random rows to all "A"s
set.seed(2)
testdata[sample(23000, 100), ] <- rep("A", 220)

> system.time(Sub1 <- testdata[apply(testdata, 1, function(x) all(x == "A")), , drop = FALSE])
   user  system elapsed
  0.454   0.047   0.503

> system.time(Sub2 <- testdata[rowSums(testdata == "A") == ncol(testdata), , drop = FALSE])
   user  system elapsed
  0.089   0.001   0.090

> str(Sub1)
 chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...

> str(Sub2)
 chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...

> identical(Sub1, Sub2)
[1] TRUE

See ?rowSums, which uses a .Internal, so is fast code.

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
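
The TRUE/FALSE-to-1/0 coercion that the rowSums() approach relies on is easier to see on a small matrix. The following is a minimal sketch and not part of the original message; the data and the names m, is_a, a_per_row and all_a are made up for illustration.

```r
# Small illustrative matrix (hypothetical data, not from the thread).
m <- matrix(c("A", "B", "C",
              "A", "A", "A",
              "C", "A", "A",
              "A", "A", "A"),
            ncol = 3, byrow = TRUE)

# Element-wise comparison gives a logical matrix with the same shape as m.
is_a <- m == "A"

# rowSums() coerces TRUE/FALSE to 1/0, so this counts the "A"s in each row.
a_per_row <- rowSums(is_a)

# A row consists entirely of "A" when its count equals the number of columns.
all_a <- a_per_row == ncol(m)

# drop = FALSE keeps the result a matrix even if only one row matches.
m[all_a, , drop = FALSE]

# which() gives the matching row indices instead of the rows themselves.
which(all_a)
```

If the matrix might contain NA, rowSums() returns NA for the affected rows and the logical subscript then produces rows of NA. Passing na.rm = TRUE to rowSums() should avoid that, since a row with a missing entry can then no longer reach the full column count.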