summary: The function 'by' inconsistently strips class from the data to which it is applied.
quick reason: tapply strips class when simplify is set to TRUE (the default) due to the class stripping behaviour of unlist. quick answer: This can be fixed by invoking tapply with simplify=FALSE, or changing tapply to use do.call(c instead of unlist executable example: mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2)) by(mytimes$date, mytimes$set, function(x)x) INDICES: 1 [1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST" ---------------------------------------------------------------------------------------- INDICES: 2 [1] "2008-04-15 11:41:40 BST" by(mytimes[1,]$date, mytimes[1,]$set, function(x)x) INDICES: 1 [1] 1208256099 why this is a problem: This is a problem when you are feeding the output of this by into a function which expects the class to be maintained. I see this problem when constructing reason: tapply strips class when simplify is set to TRUE (the default) due to the behaviour of unlist: "Where possible the list elements are coerced to a common mode during the unlisting, and so the result often ends up as a character vector. Vectors will be coerced to the highest type of the components in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression: pairlists are treated as lists." solution: This problem can be fixed in the function by.data.frame by modifying the call to tapply in the function "by": by.data.frame = function (data, INDICES, FUN, ...) { if (!is.list(INDICES)) { IND <- vector("list", 1) IND[[1]] <- INDICES names(IND) <- deparse(substitute(INDICES))[1] } else IND <- INDICES FUNx <- function(x) FUN(data[x, ], ...) nd <- nrow(data) <<<< ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data) ==== ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)), data) >>>> attr(ans, "call") <- match.call() class(ans) <- "by" ans } alternative solution: the call in tapply to unlist(ans, recursive=F) can be replaced by do.call(c,ans, recursive=F) to fix this issue, since c does not strip class. However, I haven't taken the time to work out if this will work in all cases. for example: function (X, INDEX, FUN = NULL, ..., simplify = TRUE) { FUN <- if (!is.null(FUN)) match.fun(FUN) if (!is.list(INDEX)) INDEX <- list(INDEX) nI <- length(INDEX) namelist <- vector("list", nI) names(namelist) <- names(INDEX) extent <- integer(nI) nx <- length(X) one <- 1L group <- rep.int(one, nx) ngroup <- one for (i in seq.int(INDEX)) { index <- as.factor(INDEX[[i]]) if (length(index) != nx) stop("arguments must have same length") namelist[[i]] <- levels(index) extent[i] <- nlevels(index) group <- group + ngroup * (as.integer(index) - one) ngroup <- ngroup * nlevels(index) } if (is.null(FUN)) return(group) ans <- lapply(split(X, group), FUN, ...) index <- as.integer(names(ans)) if (simplify && all(unlist(lapply(ans, length)) == 1)) { ansmat <- array(dim = extent, dimnames = namelist) <<<< ans <- unlist(ans, recursive = FALSE) ==== ans <- do.call(c, ans, recursive = FALSE) >>>> } else { ansmat <- array(vector("list", prod(extent)), dim = extent, dimnames = namelist) } if (length(index)) { names(ans) <- NULL ansmat[index] <- ans } ansmat } Alexander Brown Principal Engineer Transitive Maybrook House, 40 Blackfriars Street, Manchester M3 2EG Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44 (0)7980 708 221 www.transitive.com * The leader in cross-platform virtualization ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.