Re: [R] subset a data frame by largest frequencies of factors

S Ellison Fri, 06 Mar 2015 02:18:38 -0800


> -----Original Message-----
> A consulting client has a large data set with a binary response
> (negative) and two factors (ctry and member) which have many levels, but
> many occur with very small frequencies.  It is far too sparse with a model
> like glm(negative ~ ctry+member, family=binomial).
>
> For analysis, we'd like to subset the data to include only those that occur
> with frequency greater than a given value


ave() helps with this kind of thing. 

Something like

freq <- with(dfr, ave(1:length(ctry), factor(ctry:member), FUN=length))

gives, for every row, the count of its ctry:member combination. Then you can
subset a data frame using, for example

dfr.subset <- dfr[freq>10, ]
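
For a concrete illustration, here is a minimal sketch on a made-up data frame. The data values and the threshold of 2 are assumptions chosen only so the result is easy to check by eye; the column names follow the original question. It passes the two factors to ave() as separate grouping arguments, which ave() combines internally, rather than building ctry:member by hand.

```r
# Toy data: values are invented for illustration only.
dfr <- data.frame(
  ctry     = factor(c("A", "A", "A", "B", "B", "C")),
  member   = factor(c("x", "x", "x", "y", "y", "x")),
  negative = c(0, 1, 0, 1, 1, 0)
)

# For each row, the number of rows sharing its ctry/member combination.
freq <- ave(seq_len(nrow(dfr)), dfr$ctry, dfr$member, FUN = length)

# Keep only rows whose combination occurs more than twice
# (the threshold 2 is arbitrary here).
dfr.subset <- dfr[freq > 2, ]
```

Because freq has one entry per row of dfr, it can be used directly as a logical row index without any merging step.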

The 1:length(ctry) in the ave call is there simply because ave wants a numeric
vector as its first argument. Since all we're doing is counting, its values
don't matter; it just has to be a numeric of the same length as your data. In
a data frame it can be 1:nrow(dfr), etc.
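
To check that the first argument really is just a placeholder, a small sketch along these lines may help. The vectors are invented for illustration, and the cross-check against table() is my own addition, not something ave() requires.

```r
# Invented example vectors, both factors of the same length.
ctry   <- factor(c("A", "A", "B", "B", "B", "C"))
member <- factor(c("x", "x", "y", "y", "x", "x"))
n <- length(ctry)

# Two different placeholder vectors: only their length matters.
f1 <- ave(seq_len(n), ctry, member, FUN = length)
f2 <- ave(rep(0, n),  ctry, member, FUN = length)

# Cross-check: look up each row's cell in the two-way table,
# indexing by the factors' integer codes.
tab <- table(ctry, member)
f3  <- tab[cbind(as.integer(ctry), as.integer(member))]

# Both comparisons should hold if the placeholder is irrelevant.
same_counts <- all(f1 == f2) && all(f1 == f3)
```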

S Ellison




