Dear all,

First, let's create some data to play around with:
set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x

## The big question is how one can apply the empirical distribution function to
## each subset of df determined by "Group", so how to apply it to Group1, then
## to Group2, and finally to Group3. You might suggest (?) to use tapply:
(edf. <- tapply(df$Value, df$Group, FUN=edf))

## That's correct. But typically, one would like to obtain not only the values,
## but a data.frame containing the original information and the new (edf-)values.
## What's a simple way to get this? (one would be required to first sort df
## according to Group, then paste the values computed by edf to the sorted df;
## seems a bit tedious).

## A solution I have is the following (but I would like to know if there is a
## simpler one):
(edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # sub-data
    subdata <- cbind(subdata, edf=edf(subdata$Value))
})) )

Cheers,

Marius

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
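
P.S. One possibly simpler route, offered as a sketch rather than a confirmed answer: base R's ave() applies a function within each level of a grouping variable and returns the results in the original row order, so no sorting or re-pasting should be needed. The name df_ave below is only an illustrative choice and does not appear in the code above; the data and edf() are repeated so that the snippet runs on its own.

```r
set.seed(1)
# Same toy data as above: three groups of 10 exponential draws, rows shuffled
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]

# Empirical distribution function evaluated at its own sample points
edf <- function(x) ecdf(x)(x)

# ave() splits Value by Group, applies edf to each piece, and puts the
# results back in the row order of df, so they can be bound on directly
df_ave <- cbind(df, edf = ave(df$Value, df$Group, FUN = edf))

head(df_ave)
```

Unlike the do.call/lapply/rbind version, this keeps the rows in the shuffled order of df instead of regrouping them; if grouped output is wanted, something like df_ave[order(df_ave$Group), ] would presumably do.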