One approach to such a problem would be to use a logical vector inside
the function colSums.
?colSums
> DF <- data.frame(XX= runif(20), YY=runif(20))
> colSums(DF > 0.5)
XX YY
11 9
> colSums(DF > -Inf)
XX YY
20 20
>
> colSums(DF > 0.5)/colSums(DF > -Inf)  # could have used DF >= min(DF) in the denominator
XX YY
0.55 0.45
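The same idea carries over directly to your 1% rule. The mean of a logical vector is the proportion of TRUE values, so colMeans() gives the per-column frequency in one step. The resulting logical vector can then index the columns. This is a minimal sketch on a small made-up data frame; `DF2` and its values are hypothetical. If your data contain NAs you would need `na.rm = TRUE`.

```r
# Hypothetical example data: 'a' is mostly zero, 'b' is never zero,
# 'c' is entirely zero.
DF2 <- data.frame(a = c(0, 0, 0, 5),
                  b = c(1, 2, 3, 4),
                  c = c(0, 0, 0, 0))

# Proportion of non-zero entries per column (mean of a logical matrix column)
freq <- colMeans(DF2 != 0)   # a = 0.25, b = 1, c = 0

# Keep only columns whose non-zero frequency is at least 1%;
# drop = FALSE keeps a data frame even if only one column survives
kept <- DF2[, freq >= 0.01, drop = FALSE]
names(kept)                  # "a" "b"
```

colMeans(DF2 != 0) is equivalent to colSums(DF2 != 0)/nrow(DF2). The division by the row count is just folded into the mean.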
--
David Winsemius
On Jan 28, 2009, at 11:11 AM, Mark Na wrote:
Hi R-helpers,
I've been struggling with a problem for most of the day (!) so am finally
resorting to R-help.
I would like to subset the columns of my dataframe based on the frequency
with which the columns contain non-zero values. For example, let's say that
I want to retain only those columns which contain non-zero values in at
least 1% of their rows.
In Excel, I would add a calculated row at the bottom of my data sheet using
the following function
=countif(range,">0")
to identify the number of non-zero cells in each column. Then, I would
divide that by the number of rows to obtain the frequency of non-zero values
in each column. Then, I would delete those columns with frequencies < 0.01.
I don't think that would do what you describe unless you were only
working with single-column ranges. Functions on ranges in Excel are
not calculated by column.
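The same distinction exists in R. Applied to a whole data frame, sum() counts over every cell at once, like COUNTIF over a multi-column range. colSums() counts column by column, which is the per-column analog you are after. Here is a small sketch; the data frame `DF3` is hypothetical.

```r
# Hypothetical data with a known number of positive entries per column
DF3 <- data.frame(x = c(0, 2, 0, 7),
                  y = c(1, 0, 0, 0))

# Whole-range count: a single number for all cells together
total <- sum(DF3 > 0)        # 3

# Column-wise count: one COUNTIF-style result per column
per_col <- colSums(DF3 > 0)  # x = 2, y = 1

# The same per-column count written as an explicit loop over columns
per_col2 <- sapply(DF3, function(col) sum(col > 0))
```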
But I'd like to do this in R. I think the missing link is an analog to
Excel's countif function. Any ideas?
Thanks! Mark
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.