Re: [R] Logical subset of the columns in a dataframe

David Winsemius Wed, 28 Jan 2009 08:56:59 -0800

One approach to such a problem would be to use a logical vector inside the function colSums.


?colSums


> DF <- data.frame(XX= runif(20), YY=runif(20))

> colSums(DF > 0.5)
XX YY
11  9

> colSums(DF > -Inf)
XX YY
20 20
>

> colSums(DF > 0.5)/colSums(DF > -Inf)  # could have used DF >= min(DF) in the denominator

  XX   YY
0.55 0.45
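
To get from these proportions to the column subset Mark asked about, something like the following might work. It is only a sketch: the example data frame and the names prop_nonzero, keep and DF_sub are made up for illustration, and it uses colMeans() on the logical matrix as a shortcut for dividing colSums() by the number of rows.

```r
# Hypothetical data: column "a" is half non-zero, "b" has a single
# non-zero value in 200 rows (0.5%), and "c" is all zeros.
DF <- data.frame(a = rep(c(0, 3), 100),
                 b = c(1, rep(0, 199)),
                 c = rep(0, 200))

# DF != 0 is a logical matrix; the column means of TRUE/FALSE values
# are the proportions of non-zero entries in each column.
prop_nonzero <- colMeans(DF != 0)

# Logical vector marking the columns with at least 1% non-zero values.
keep <- prop_nonzero >= 0.01

# drop = FALSE keeps the result a data frame even if one column survives.
DF_sub <- DF[, keep, drop = FALSE]
names(DF_sub)
```

If the data contain NAs, the comparison DF != 0 returns NA for those cells, so one would probably want colMeans(DF != 0, na.rm = TRUE), which computes each proportion over the non-missing rows only.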



--
David Winsemius

On Jan 28, 2009, at 11:11 AM, Mark Na wrote:

Hi R-helpers,
I've been struggling with a problem for most of the day (!) so am finally
resorting to R-help.
I would like to subset the columns of my dataframe based on the frequency
with which the columns contain non-zero values. For example, let's say that
I want to retain only those columns which contain non-zero values in at
least 1% of their rows.
In Excel I would calculate a row at the bottom of my data sheet and use the
following function

=countif(range,">0")

to identify the number of non-zero cells in each column. Then, I would
divide that by the number of rows to obtain the frequency of non-zero values
in each column. Then, I would delete those columns with frequencies < 0.01.

I don't think that would do what you describe unless you were only working with single column ranges. Functions on ranges in Excel are not calculated by column.

But, I'd like to do this in R. I think the missing link is an analog to
Excel's countif function. Any ideas?

Thanks! Mark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


