On Mon, Sep 28, 2009 at 5:03 PM, Raymond Danner <rdan...@vt.edu> wrote:
> Dear Community,
>
> I have a data set with two columns, bird number and mass. Individual birds
> were captured 1-13 times and weighed each time. I would like to remove
> those individuals that were captured only once, so that I can assess mass
> variability per bird. I've tried many approaches with no success. Can
> anyone recommend a way to remove individuals that were captured only once?
Approach this one step at a time. My sample data is:

> wts
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
4    2  1.2
5    3  5.4
6    3  4.5
7    3  4.4
8    4  3.2

How many times was each bird measured? Use table():

> table(wts$bird)

1 2 3 4
3 1 3 1

The bird numbers become the names of the table (as character strings), and
row.names() extracts them, so we want the names where the count is greater
than one:

> row.names(table(wts$bird))[table(wts$bird)>1]
[1] "1" "3"

[This calls 'table' twice, so you might want to save the table to a new
object.]

Now we want all the rows of our original data frame where the bird number is
in that set, so we select rows using %in%:

> wts[wts$bird %in% row.names(table(wts$bird))[table(wts$bird)>1],]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4

Looks a bit messy, I'm not pleased with myself... Must be a better way...

Aha! A table-free way of finding the birds that appear more than once is:

> unique(wts$bird[duplicated(wts$bird)])
[1] 1 3

So you could do:

> wts[wts$bird %in% unique(wts$bird[duplicated(wts$bird)]),]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4

which looks a bit neater! You might want to unravel
unique(wts$bird[duplicated(wts$bird)]) to see what the various bits do, and
read the help pages. TMTOWTDI (There's More Than One Way To Do It), as they
say.

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
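A minimal, self-contained R sketch of the steps above, assuming the same eight-row sample data (typed in by hand here). It saves the table once, as suggested in the reply, and uses names() in place of row.names(), which gives the same result for a one-dimensional table. It then checks that the table() and duplicated() selections keep the same rows. The ave() count is an alternative technique not used in the post, and the closing tapply() line is only one possible way to look at mass variability per bird. All object names (counts, repeat_birds, kept_table, kept_dup, n_caps, kept_ave) are hypothetical, chosen for illustration.

```r
# Sample data from the reply: bird number and mass (hypothetical values).
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# Approach 1: count captures once with table() and keep birds seen more than once.
counts <- table(wts$bird)                   # one count per bird number
repeat_birds <- names(counts)[counts > 1]   # bird numbers as strings: "1" "3"
kept_table <- wts[wts$bird %in% repeat_birds, ]

# Approach 2: duplicated() is TRUE for every occurrence after the first,
# so the birds it flags are exactly those captured more than once.
repeat_birds2 <- unique(wts$bird[duplicated(wts$bird)])  # numeric: 1 3
kept_dup <- wts[wts$bird %in% repeat_birds2, ]

# Alternative (not from the post): ave() returns, for each row, the number
# of rows sharing that bird number, so rows can be filtered directly.
n_caps <- ave(wts$mass, wts$bird, FUN = length)
kept_ave <- wts[n_caps > 1, ]

# All three selections keep the same rows.
print(identical(kept_table, kept_dup) && identical(kept_dup, kept_ave))

# One possible next step: standard deviation of mass for each remaining bird.
print(tapply(kept_dup$mass, kept_dup$bird, sd))
```

A small design point: the table() version relies on %in% quietly converting the numeric bird numbers to character to match the table names, while the duplicated() and ave() versions compare numbers with numbers throughout.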