I think there's a pretty simple solution here, though probably not the most efficient:

    t(sapply(split(a, a$ID), function(q)
        with(q, c(ID = unique(ID), x = unique(x), y = max(y) - min(y)))))

Using 'unique' instead of min or [[1]] has the advantage that if x is in
fact not time-invariant, this shows up in the result (extra values for that
ID, so the output no longer simplifies to a clean three-column matrix)
rather than being silently ignored.

Trying to package up this idiom into a function leads to:

    select <- function(df, groupby, selection) {
        # Remember the caller's frame so free variables resolve there.
        pf <- parent.frame()
        # Capture the selection expression unevaluated.
        fields <- substitute(selection)
        # Split df by the grouping expression, then evaluate the
        # selection inside each piece.
        t(sapply(split(df, eval(substitute(groupby), df, enclos = pf)),
                 function(q) eval(fields, q, enclos = pf)))
    }

which I admit is rather ugly (and does no error-checking), but it does work:

    > select(a, ID, list(min(ID), unique(x), max(y) - min(y)))
      [,1] [,2] [,3]
    1 1    10   20
    2 2    12   15
    3 3    5    5

Perhaps some of the more experienced people on the list could show me how
to write this more cleanly.

          -s

On Fri, Jan 2, 2009 at 4:20 AM, gallon li <gallon...@gmail.com> wrote:

> I have the following data
>
> ID  x  y time
>  1 10 20    0
>  1 10 30    1
>  1 10 40    2
>  2 12 23    0
>  2 12 25    1
>  2 12 28    2
>  2 12 38    3
>  3  5 10    0
>  3  5 15    2
> .....
>
> x is time invariant, ID is the subject id number, y is changing over time.
>
> I want to find out the difference between the first and last observed y
> value for each subject and get a table like
>
> ID  x  y
>  1 10 20
>  2 12 15
>  3  5  5
> ......
>
> Is there any easy way to generate the data set?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
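
For anyone who wants to try the one-liner from the reply, here is a minimal self-contained sketch. It rebuilds the data frame `a` from the nine rows shown in the question (the question's "....." suggests there may be more rows, which are not included). The second computation, `by_first_last`, is a variant that is not part of the original reply: it takes the last observed y minus the first, ordered by time, on the assumption that this is what "first and last observed" means. The two agree on this data because y increases with time; they would differ otherwise.

```r
# Rebuild the example data from the quoted question (only the rows shown).
a <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# The one-liner from the reply: spread (max - min) of y within each ID.
by_range <- t(sapply(split(a, a$ID), function(q)
  with(q, c(ID = unique(ID), x = unique(x), y = max(y) - min(y)))))

# Variant (an assumption, not from the reply): last observed y minus
# first observed y, after sorting each subject's rows by time.
by_first_last <- t(sapply(split(a, a$ID), function(q) {
  q <- q[order(q$time), ]
  c(ID = unique(q$ID), x = unique(q$x), y = q$y[nrow(q)] - q$y[1])
}))

print(by_range)
print(by_first_last)
```

Both results are numeric matrices with columns ID, x and y and one row per subject; wrapping either in `as.data.frame()` gives a data frame if that is the form wanted.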