With

  > R.version.string
  [1] "R Under development (unstable) (2013-01-26 r61752)"

'split.default' recycles a short factor when 'x' is unclassed, but not when 'x' has a class attribute:

  > split(1:5, 1:2)
  $`1`
  [1] 1 3 5

  $`2`
  [1] 2 4

  Warning message:
  In split.default(1:5, 1:2) :
    data length is not a multiple of split variable
  > x = structure(1:5, class="A")
  > split(x, 1:2)
  $`1`
  [1] 1

  $`2`
  [1] 2

This is also inconsistent with 'split<-', which does recycle:

  > split(x, 1:2) <- 1:2
  Warning message:
  In split.default(seq_along(x), f, drop = drop, ...) :
    data length is not a multiple of split variable
  > x
  [1] 1 2 1 2 1
  attr(,"class")
  [1] "A"

A solution is to change the call to seq_along(f) toward the end of split.default to seq_along(x).

@@ -32,7 +32,7 @@
     lf <- levels(f)
     y <- vector("list", length(lf))
     names(y) <- lf
-    ind <- .Internal(split(seq_along(f), f))
+    ind <- .Internal(split(seq_along(x), f))
     for(k in lf) y[[k]] <- x[ind[[k]]]
     y
 }
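
To make the effect of the one-line change easier to see without patching base R, here is a rough sketch that mimics the tail of split.default. The function name split_tail and its index_from argument are hypothetical and exist only for this illustration; it also calls the public split() on a plain integer vector in place of .Internal(split()), on the assumption that the recycling behaviour is the same.

```r
# Hypothetical stand-in for the classed-'x' branch of split.default.
# 'index_from' selects which object the indices are generated from:
# "f" mimics the current code, "x" mimics the proposed change.
split_tail <- function(x, f, index_from = c("f", "x")) {
    index_from <- match.arg(index_from)
    f <- factor(f)
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    idx <- if (index_from == "f") seq_along(f) else seq_along(x)
    # Splitting a plain integer vector recycles 'f' when 'idx' is longer;
    # the "not a multiple" warning is silenced only to keep the demo quiet.
    ind <- suppressWarnings(split(idx, f))
    for (k in lf) y[[k]] <- x[ind[[k]]]
    y
}

x <- structure(1:5, class = "A")
old <- split_tail(x, 1:2, "f")   # indices 1:2 only, so no recycling over 'x'
new <- split_tail(x, 1:2, "x")   # indices 1:5, so 'f' is recycled over 'x'
```

With seq_along(f) the index vector is no longer than the factor, so elements 3 to 5 of 'x' are never assigned to a group; with seq_along(x) every element of 'x' lands in a group, matching the unclassed case.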

The following is maybe a little harder to argue. In split.default, for a class
to which one might wish to give factor-like behaviour, e.g.,

  Rle = setClass("Rle", representation(values="integer", lengths="integer"))
  f = Rle(values=1:2, lengths=2:3)

the code

  if (is.list(f))
      f <- interaction(f, drop = drop, sep = sep)
  else if (drop || !is.factor(f))
      f <- factor(f)

requires that one make factor a generic and develop a method factor.Rle.
This contradicts the documentation

  f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
     grouping, or a list of such factors in which case their
     interaction is used for the grouping.

and perhaps the more common (?) pattern of coercion using as.*. A solution is to
make as.factor a generic and revise the code above to use something like

  if (is.list(f)) f <- interaction(f, drop = drop, sep = sep)
  else if (!is.factor(f)) f <- as.factor(f)
  else if (drop) f <- factor(f)

One then gets split behaviour if there is an as.factor.Rle method

  as.factor.Rle <- function(x, ...)
      factor(rep(x@values, x@lengths), levels=unique(x@values))
  setAs("Rle", "factor", function(from) as.factor.Rle(from))

These more elaborate changes are in the attached diff.
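
As a self-contained approximation of the idea (not the attached diff itself), the coercion step can be tried in a fresh session by masking as.factor with a local S3 generic. The helper to_grouping is a hypothetical name for the revised branch of split.default, and the local as.factor generic is an assumption standing in for a generic as.factor in base.

```r
library(methods)

Rle <- setClass("Rle", representation(values = "integer", lengths = "integer"))

# Assumption: a session-local S3 generic that masks base::as.factor.
as.factor <- function(x, ...) UseMethod("as.factor")
as.factor.default <- function(x, ...) base::as.factor(x)
as.factor.Rle <- function(x, ...)
    factor(rep(x@values, x@lengths), levels = unique(x@values))

# Hypothetical helper holding only the revised coercion logic.
to_grouping <- function(f, drop = FALSE, sep = ".") {
    if (is.list(f)) f <- interaction(f, drop = drop, sep = sep)
    else if (!is.factor(f)) f <- as.factor(f)
    else if (drop) f <- factor(f)
    f
}

f <- Rle(values = 1:2, lengths = 2:3)
g <- to_grouping(f)       # dispatches to as.factor.Rle
res <- split(1:5, g)      # groups follow the expanded run-length encoding
```

The point of the design is that only an as.factor method is needed for a new class; factor itself stays non-generic and is called only to drop unused levels of something that is already a factor.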
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
split.diff.tar.gz
Description: GNU Zip compressed data
