On Wed, Feb 11, 2009 at 6:32 AM, cruz <crua...@gmail.com> wrote:
> Hi,
>
> I have a big matrix X with M rows and N columns that I want to filter
> it into smaller ones with m (<M) rows and N columns.
> The filter rule is based on the values of each columns, i.e.
>
> X looks like this:
> column name: a, b, c, d, ... etc
>
> a b c d ...
> 1 2 3 4 ...
> 5 6 7 8 ...
> 9 8 7 6 ...
> ... ... ... ...
>
> The filter rule with the result that I want is:
>
> X[X$a<5 & X$b<5 & X$c<5 & X$d<5 ...etc ,]
> X[X$a<5 & X$b<5 & X$c<5 & X$d>=5 ...etc ,]
> X[X$a<5 & X$b<5 & X$c>=5 & X$d<5 ...etc ,]
> ... ... ...
> ...
>
> with all the possible combinations which is 2^M
>
> I try to use multiple for loops to separate it:
>
> for (i in 1:2)
> for (j in 1:2)
> for (k in 1:2)
> ... ...
> assign(paste(i,j,k,...,sep="")), X[if (i==1) paste("X$a<5")
> else paste("X$a>=5") & if (i==1) paste("X$b<5") else paste("X$b>=5") &
> ..., ])
>
> # there might be syntax errors, I just want to clearly describe my problem
>
> Since paste("X$a>=5") gives type of character; whereas the type of
> X$a>=5 should be logical.
>
> How can I do this?

Wow. It's hard to know where to begin to comment on this.

Generally we recommend taking the "whole object" approach when possible. For example

    Xl <- X < 5

performs all the comparisons in one go, returning a logical matrix of the same dimension as X.

If you want only those rows of X in which every element is less than 5, you could take the matrix X < 5 and "apply" the "&" operator across the rows. There is a complication here in that "&" is a binary operator, not a summary function, but for logical values the "prod" summary function has the same effect as reduction by "&" as long as you convert the result back to a logical value. That is

    X[as.logical(apply(Xl, 1, prod)),]

However, even before considering that aspect of the calculation, it would be best to back up and consider how you would store the result and what you would do with it once you got it. I really would recommend that you think about how you are approaching the larger problem of which, I assume, this represents one step.

You are trying to do something difficult, and the code you have outlined indicates that you have not yet achieved fluency in R. If indeed this approach is the best approach to the problem, then you should spend some time reading up on R programming (Robert Gentleman's book "R Programming for Bioinformatics" would be a good starting point, I think) to save yourself a lot of grief.

For example, paste("foo") is simply "foo". The "$" operator extracts a component by name, but the name must be a symbol, not the value of a variable. If you want a named component where the name is the value of a variable, you must use x[[nm]].

When you find yourself trying to describe an algorithm as a set of nested loops where the number of loops is variable, you need to rethink the algorithm.

> All thoughts are greatly appreciated.
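
To make the points above concrete, here is a minimal sketch. It assumes X is a data frame (the X$a syntax in the question suggests it is, even though it is described as a matrix) and uses made-up toy values. The final split() step is only one possible way to collect every combination without nested loops; it is an illustration rather than a recommendation, and the names all_small, all_small2, col_a_small, pattern and pieces are invented for this example.

```r
# Toy data standing in for X; the values are made up for illustration.
X <- data.frame(a = c(1, 5, 9, 2),
                b = c(2, 6, 8, 1),
                c = c(3, 7, 7, 9),
                d = c(4, 8, 6, 3))

# One comparison over the whole object: a logical matrix shaped like X.
Xl <- X < 5

# Rows in which every element is below 5, reducing each row with prod().
all_small <- X[as.logical(apply(Xl, 1, prod)), ]

# The same selection using all(), which is TRUE only if every value is TRUE.
all_small2 <- X[apply(Xl, 1, all), ]

# "$" needs a literal name; "[[" accepts a name held in a variable.
nm <- "a"
col_a_small <- X[[nm]] < 5

# Label each row by its pattern of comparisons (1 means < 5, 0 means >= 5)
# and split the rows on that label, so no loop per column is needed.
pattern <- apply(Xl, 1, function(r) paste(as.integer(r), collapse = ""))
pieces <- split(X, pattern)
```

Here pieces is a list of data frames named by pattern, so pieces[["1111"]] holds the rows in which every column is below 5. Only the patterns that actually occur in the data appear in the list, which also avoids creating a large number of separately named objects with assign().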
>
> Many Thanks,
> cruz
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.