[Rd] Aggregate dataframe variables, return more than 2 vars

violet lock Tue, 09 Feb 2010 14:04:50 -0800

Hello r-devel,
I have a data.frame with 3 columns and I would like to group by one column (id),
find the max of the date column, and return the row for that max date:
the id, the date, and the value in the remaining column (decod).


Example:
> dat <- data.frame(id = rep(1:3, 3), date = as.Date(rep(c("2005-08-25",
"2005-08-26", "2005-08-29"), each = 3)), decod = c("SCREEN", "SCREEN",
"SCREEN", "RAND", "RAND", "RAND", "COMPLETE", "COMPLETE", "WITHDRAWAL"))


What I need it to return is:

  id x.decod.1.        end
1  1   COMPLETE 2005-08-29
2  2   COMPLETE 2005-08-29
3  3 WITHDRAWAL 2005-08-29




I can get the max date and the id 2 different ways:


> do.call("rbind", lapply(split(dat, dat$id), function(x) data.frame(id =
x$id[1], max_date = max(x$date))))

  id   max_date
1  1 2005-08-29
2  2 2005-08-29
3  3 2005-08-29


OR
> aggregate(dat$date, list(USUBJID=dat$id),FUN="max")

  USUBJID     x
1       1 13024
2       2 13024
3       3 13024

(which oddly returns the number of days since 1970-01-01 instead of the max
as a Date value)
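
A possible workaround for the day-count output, as a rough sketch only: do the aggregation on the numeric day counts explicitly and convert the result back with as.Date() and an origin of "1970-01-01". The name res is just for illustration, and whether aggregate() drops the Date class on its own may depend on the R version, which is why the conversion is done by hand here.

```r
dat <- data.frame(
  id = rep(1:3, 3),
  date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"), each = 3))
)

# Aggregate on the underlying day counts so the result is numeric
# regardless of how aggregate() treats the Date class.
res <- aggregate(as.numeric(dat$date), list(USUBJID = dat$id), FUN = max)

# Convert the day counts (days since 1970-01-01) back to Date values.
res$x <- as.Date(res$x, origin = "1970-01-01")
```

This still only gives the id and the max date, not the matching decod value.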





I'd like to do this without looping or filtering for date and usubjid if
possible. Is there a way to return the index from the max date function
that I can then use to index the data.frame? I came across a function
dapply which looks like it might work, but unfortunately the package isn't
one I can install in the near future due to some company restrictions.
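
To make the index idea concrete, here is a minimal sketch, assuming base R's which.max() is acceptable for this. It reuses the split/lapply pattern from above, but keeps the whole row at the position of the latest date in each group instead of building a new data.frame. The name latest is only illustrative, and if a group has ties for the max date, which.max() keeps only the first one.

```r
dat <- data.frame(
  id = rep(1:3, 3),
  date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"), each = 3)),
  decod = c("SCREEN", "SCREEN", "SCREEN", "RAND", "RAND", "RAND",
            "COMPLETE", "COMPLETE", "WITHDRAWAL"),
  stringsAsFactors = FALSE
)

# For each id, which.max() returns the position of the latest date within
# that group; indexing the group with it keeps all columns of that row.
latest <- do.call("rbind", lapply(split(dat, dat$id), function(x) {
  x[which.max(as.numeric(x$date)), ]
}))
```

With the example data, latest should hold one row per id with the id, the max date, and the decod recorded on that date.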

Any ideas would be appreciated,
VL


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
