Wow.. thanks for the deluge of responses! Aggregate seems like the way to go here.
But, suppose that instead of integers in column V2, I actually have dates (and instead of keeping the minimum integer, I want to keep the earliest date): > df = > data.frame(V1=c(1,1,2,3,1,2),V2=c('2002-03-13','1989-03-10','1988-01-20','1997-05-15','1996-11-18','2000-01-12')); > aggregate(df$V2,df['V1'],min) #thanks, Phil Error in Summary.factor(c(6L, 2L, 3L), na.rm = FALSE) : min not meaningful for factors So I checked: > min(df$V2) Error in Summary.factor(c(6L, 2L, 1L, 4L, 3L, 5L), na.rm = FALSE) : min not meaningful for factors This can be fixed with: > min(as.Date(df$V2)) [1] "1988-01-20" However, I get a strange response when I try: > aggregate(as.Date(df$V2),df['V1'],min) V1 x 1 1 7008 2 2 6593 3 3 9996 What I'm looking for, of course, is: > aggregate(as.Date(df$V2),df['V1'],min) V1 V2 1 1 1996-11-18 2 2 1988-01-20 3 3 1997-05-15 Any idea how to produce the desired output? Thanks, Jonathan On Mon, Feb 8, 2010 at 12:11 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > Here are 3 solutions assuming DF contains the data frame: > >> # 1. aggregate >> aggregate(DF[2], DF[1], min) > V1 V2 > 1 a 2 > 2 b 9 > 3 c 4 > >> # 2. aggregate.formula - requires R 2.11.x >> aggregate(V2 ~ V1, DF, min) > V1 V2 > 1 a 2 > 2 b 9 > 3 c 4 > >> # 3. SQL using sqldf >> library(sqldf) >> sqldf("select V1, min(V2) V2 from DF group by V1") > V1 V2 > 1 a 2 > 2 b 9 > 3 c 4 > >> # 4. summaryBy in the doBy package >> library(doBy) >> summaryBy(V2 ~., DF, FUN = min, keep.names = TRUE) > V1 V2 > 1 a 2 > 2 b 9 > 3 c 4 > > On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsle...@gmail.com> wrote: >> Hi all, >> I'm feeling a little guilty to ask this question, since I've >> written a solution using a rather clunky for loop that gets the job >> done. But I'm convinced there must be a faster (and probably more >> elegant) way to accomplish what I'm looking to do (perhaps using the >> "merge" function?). I figured somebody out there might've already >> figured this out: >> >> I have a dataframe with two columns (let's call them V1 and V2). All >> rows are unique, although column V1 has several redundant entries. >> >> Ex: >> >> V1 V2 >> 1 a 3 >> 2 a 2 >> 3 b 9 >> 4 c 4 >> 5 a 7 >> 6 b 11 >> >> >> What I'd like is to return a dataframe cut down to have only unique >> entires in V1. V2 should contain a vector, for each V1, that is the >> minimum of all the possible choices from the set of redundant V1's. >> >> Example output: >> >> V1 V2 >> 1 a 2 >> 2 b 9 >> 3 c 4 >> >> >> If somebody could (relatively easily) figure out how to get closer to >> a solution, I'd appreciate hearing how. Also, I'd be interested to >> hear how you came upon the answer (so I can get better at searching >> the R resources myself). >> >> Regards, >> Jonathan >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.