One approach to such a problem would be to pass a logical comparison to the function colSums, which then counts the TRUE values in each column.

?colSums
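
This works because R treats TRUE as 1 and FALSE as 0 when summing, so summing a comparison counts the elements that satisfy it. A minimal sketch of a single-column analog to Excel's countif follows; the vector x and the result names are made up for illustration and are not from the original data.

```r
# Hypothetical example vector (not from the original post)
x <- c(0, 3, 0, 7, 2)

# TRUE counts as 1 and FALSE as 0, so sum() counts the matching elements
n_positive <- sum(x > 0)        # roughly what =countif(range,">0") does

# mean() of the same logical vector gives the proportion of matches
frac_positive <- mean(x > 0)
```

colSums and colMeans apply the same idea to every column at once.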

> DF <- data.frame(XX= runif(20), YY=runif(20))

> colSums(DF > 0.5)
XX YY
11  9

> colSums(DF > -Inf)
XX YY
20 20
> colSums(DF > 0.5)/colSums(DF > -Inf)  # could have used DF >= min(DF), or simply nrow(DF), in the denominator
  XX   YY
0.55 0.45
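
To carry this through to the subsetting step in the original question, the per-column proportion can be used to index the columns. This is only a sketch under stated assumptions: "non-zero" is taken to mean != 0, the data contain no NA values, and the data frame DF2 and the names freq_nonzero, keep and DF2_subset are hypothetical.

```r
# Hypothetical data with 200 rows:
#   A is non-zero in half of its rows, B is all zero,
#   C has a single non-zero value (frequency 1/200 = 0.005)
DF2 <- data.frame(A = rep(c(0, 1), 100),
                  B = rep(0, 200),
                  C = c(5, rep(0, 199)))

# DF2 != 0 is a logical matrix; colMeans() gives the proportion of TRUE per column
freq_nonzero <- colMeans(DF2 != 0)

# Keep only the columns that are non-zero in at least 1% of their rows
keep <- freq_nonzero >= 0.01
DF2_subset <- DF2[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even if one column remains
```

If the data may contain missing values, colMeans(DF2 != 0, na.rm = TRUE) is one way to ignore them, though whether NA should count as zero depends on the data.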



--
David Winsemius

On Jan 28, 2009, at 11:11 AM, Mark Na wrote:

Hi R-helpers,

I've been struggling with a problem for most of the day (!) so am finally
resorting to R-help.

I would like to subset the columns of my dataframe based on the frequency with which the columns contain non-zero values. For example, let's say that I want to retain only those columns which contain non-zero values in at
least 1% of their rows.

In Excel I would calculate a row at the bottom of my data sheet and use the
following function

=countif(range,">0")

to identify the number of non-zero cells in each column. Then, I would
divide that by the number of rows to obtain the frequency of non-zero
values in each column. Then, I would delete those columns with
frequencies < 0.01.

I don't think that would do what you describe unless you were only working with single-column ranges. Functions on ranges in Excel are not calculated by column.



But, I'd like to do this in R. I think the missing link is an analog to
Excel's countif function. Any ideas?

Thanks! Mark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
