On Nov 16, 2012, at 8:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:

> Hi Peter,
> 
> On Fri, Nov 16, 2012 at 9:04 AM, Peter Kupfer <peter.kup...@me.com> wrote:
>> Dear all,
>> maybe a simple problem but I found no solution for my problem.
>> I have a matrix Y with 23 000 rows and 220 columns. The entries are "A", "B" 
>> or "C".
> 
> A reproducible example with sample data is helpful.
> 
>> I want to extract all rows (as a matrix ) of the matrix Y where all entries 
>> of a row are (for example) "A".
> 
> Really? Why not just make a new matrix with the right number of "A" values?
> 
>> Is there any solution? I tried the stringr package but it doesn't work out.
> 
> Of course there is. Here's one option. But I'm not sure you've really
> stated your actual problem. This extracts the rows where all values
> are "A", and might at least get you started toward your real problem.
> 
> testdata <- matrix(c(
> "A", "B", "C",
> "B", "B", "B",
> "C", "A", "A",
> "A", "A", "A"),
> ncol=3, byrow=TRUE)
> 
> testdata.A <- testdata[apply(testdata, 1, function(x)all(x == "A")), ,
> drop=FALSE]


Using something like rowSums() might be faster in this case, based upon brief 
testing. 

A logical comparison such as testdata == "A" returns TRUE/FALSE values, which 
are treated as 1/0 in arithmetic. rowSums() of that comparison therefore counts 
the matches in each row, and a row consists entirely of the desired value 
exactly when its count equals the number of columns in the matrix.
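
As a small illustration of that coercion, here is a sketch on a made-up 4 x 3 
matrix. The names small, is_A, counts and keep are only for this example and 
are not part of the timing test further down.

```r
# Hypothetical 4 x 3 example matrix; only the last row is all "A".
small <- matrix(c("A", "B", "C",
                  "B", "B", "B",
                  "C", "A", "A",
                  "A", "A", "A"),
                ncol = 3, byrow = TRUE)

# Elementwise comparison gives a logical matrix of the same shape.
is_A <- small == "A"

# TRUE counts as 1 and FALSE as 0, so rowSums() counts the "A"s per row.
counts <- rowSums(is_A)

# A row is all "A" exactly when its count equals the number of columns.
keep <- counts == ncol(small)

# drop = FALSE keeps the result a matrix even if only one row is selected.
small[keep, , drop = FALSE]
```

Here counts comes out as 1, 0, 2, 3, so only the fourth row is kept.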


# Create a 23000 x 220 matrix with random values.
set.seed(1)
testdata <- matrix(sample(c("A", "B", "C"), 23000 * 220, replace = TRUE),
                   ncol = 220)

# Set 100 random rows to all "A"s
set.seed(2)
testdata[sample(23000, 100), ] <- rep("A", 220)


> system.time(Sub1 <- testdata[apply(testdata, 1, function(x) all(x == "A")), ,
+                              drop = FALSE])
   user  system elapsed 
  0.454   0.047   0.503 


> system.time(Sub2 <- testdata[rowSums(testdata == "A") == ncol(testdata), ,
+                              drop = FALSE])
   user  system elapsed 
  0.089   0.001   0.090 


> str(Sub1)
 chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...

> str(Sub2)
 chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...


> identical(Sub1, Sub2)
[1] TRUE


See ?rowSums. It is implemented in compiled code (called via .Internal), so it 
avoids the per-row R function calls that apply() makes and is fast.
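
If the same extraction is needed for several values, the idiom could be wrapped 
in a small helper. This is only a sketch: rows_all_equal is a hypothetical name, 
not a base R function, and the use of na.rm = TRUE (so that an NA entry never 
counts as a match) is my assumption about the desired behaviour, since the 
original data were said to contain only "A", "B" or "C".

```r
# Hypothetical helper: return the rows of matrix m whose entries all equal value.
rows_all_equal <- function(m, value) {
  # Count matches per row; NA comparisons are dropped, so they never count.
  hits <- rowSums(m == value, na.rm = TRUE)
  # Keep rows where every column matched; drop = FALSE preserves matrix shape.
  m[hits == ncol(m), , drop = FALSE]
}

# Usage on a tiny made-up matrix with one NA entry.
m <- matrix(c("A", "A",
              "A", NA,
              "B", "B"),
            ncol = 2, byrow = TRUE)
only_A <- rows_all_equal(m, "A")
only_C <- rows_all_equal(m, "C")
```

With this input, only_A holds the single first row and only_C is a matrix with 
zero rows and two columns.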

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
