> -----Original Message-----
> A consulting client has a large data set with a binary response
> (negative) and two factors (ctry and member) which have many levels, but
> many occur with very small frequencies. It is far too sparse with a model
> like
>     glm(negative ~ ctry+member, family=binomial).
>
> For analysis, we'd like to subset the data to include only those that occur
> with frequency greater than a given value

ave() helps with this kind of thing. Something like

    freq <- ave(1:length(ctry), factor(ctry:member), FUN=length)

gives, for every row, the count of its ctry:member combination. This
assumes ctry and member are both factors, so that ctry:member is their
interaction. Then you can subset a data frame using, for example,

    dfr.subset <- dfr[freq>10, ]

The 1:length(ctry) in the ave() call is there simply because ave() wants a
numeric vector as its first argument. Since all we are doing with it is
counting, it just has to be a numeric vector of the same length as your
data; for a data frame it can be 1:nrow(dfr), etc.

S Ellison

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
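
For anyone who wants to try the ave() approach above on something small, here is a minimal sketch. The toy data frame, its values, and the threshold name min.count are all made up for illustration and are not from the original poster's data; the droplevels() step is an extra suggestion, not part of the reply above.

```r
# Toy stand-in for the client's data (hypothetical values).
dfr <- data.frame(
  ctry     = factor(c("A", "A", "A", "B", "B", "C", "A", "B", "A", "C")),
  member   = factor(c("x", "x", "x", "y", "y", "x", "x", "y", "y", "x")),
  negative = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1)
)

# For every row, the number of rows sharing its ctry:member combination.
# The first argument only needs to be a numeric vector of the right length.
freq <- with(dfr, ave(seq_len(nrow(dfr)), factor(ctry:member), FUN = length))

# Keep only rows whose combination occurs more than min.count times
# (min.count is a hypothetical name for the chosen threshold).
min.count <- 2
dfr.subset <- dfr[freq > min.count, ]

# Drop factor levels that no longer occur, so that a later model fit
# does not carry empty levels around.
dfr.subset <- droplevels(dfr.subset)

print(table(dfr.subset$ctry, dfr.subset$member))
```

If the aim is to filter on each factor's own frequency rather than on the combination, the same call can be made with a single grouping factor, e.g. ave(seq_len(nrow(dfr)), dfr$ctry, FUN = length).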