I'm rather confused by the semantics of factors. When applied to factors, some functions (whose results are elements of the original factor argument) return results of class factor, some return integer vectors, some return character vectors, some give errors. I understand some but not all of this. Consider:
  Preserve factors: `[`, `[[`, sort, unique, subset, head, tapply, rep, rev, by, sample, expand.grid, as.matrix(structure(factor(1:3),dim=c(1,3))), data.frame, list
  Convert to integers: c, ifelse, cbind/rbind
  Convert to characters: intersect, union, setdiff, matrix, array, matrix(factor(1:3),1,3), as.matrix(factor(1:3))
  Gives error: rle
  No error (output of some other type): <, ==, etc.

In the case of ordered factors:

  Preserve factors: quantile (for exact quantiles only)
  Gives error: min, cut, range
  No error: which.min, pmin, rank

(But some operations which are meaningful only on ordered factors also give results on unordered factors, without even a warning: which.min, pmin, rank, quantile.)

The general principle seems to be that if the result can contain only elements of a single factor, then a factor is returned. I understand this: it may not be meaningful to mingle factors with different level sets. But I don't understand what the problem is with rle.

If the result can contain elements from more than one factor, it is still not clear to me what the principle is for determining whether the factors are converted to the integers representing them, or to the characters naming them, or whether the operation gives an error.

I also don't understand what is going on with min. min is well-defined for any class supporting a < operator, but though < works on ordered factors, as do pmin, rank, etc., min does not. And equally strangely, which.min and rank blithely convert *un*ordered factors to the integers which happen to represent them, returning what are presumably meaningless results without giving an error, while pmin appropriately gives an error.

It is all very confusing. Of course, most of this behavior is documented and is easily determined by experimentation, but it would be easier to learn and teach the language if there were some clear principle underlying all this. What am I missing?
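For anyone who wants to repeat the experiment, here is a minimal sketch that reports the class of the result (or the error message) for a handful of the functions listed above. The helper name result_class and the example factors f and o are made up for illustration. Some of these results may differ between R versions, so no expected output is shown.

```r
# Example data (hypothetical): one unordered and one ordered factor.
f <- factor(c("b", "a", "c", "a"))
o <- factor(c("lo", "hi", "mid", "lo"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)

# Illustrative helper: return the class of the result, or the error text.
# The argument is evaluated lazily inside tryCatch, so errors are caught.
result_class <- function(expr) {
  tryCatch(paste(class(expr), collapse = "/"),
           error = function(e) paste("error:", conditionMessage(e)))
}

results <- c(
  "f[2:3]"       = result_class(f[2:3]),
  "sort(f)"      = result_class(sort(f)),
  "unique(f)"    = result_class(unique(f)),
  "rev(f)"       = result_class(rev(f)),
  "c(f, f)"      = result_class(c(f, f)),
  "ifelse"       = result_class(ifelse(TRUE, f, f)),
  "union(f, f)"  = result_class(union(f, f)),
  "as.matrix(f)" = result_class(as.matrix(f)),
  "rle(f)"       = result_class(rle(f)),
  'f == "a"'     = result_class(f == "a"),
  "min(f)"       = result_class(min(f)),
  "min(o)"       = result_class(min(o)),
  "which.min(f)" = result_class(which.min(f)),
  "rank(f)"      = result_class(rank(f))
)

# One line per call: the expression and what came back.
print(results)
```

Running this on the installed version of R is probably the quickest way to see which of the categories above each call currently falls into.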
-s

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help