With

> R.version.string
[1] "R Under development (unstable) (2013-01-26 r61752)"

'split.default' recycles a short factor when 'x' is unclassed, but not when 'x' is an instance of a class:

> split(1:5, 1:2)
$`1`
[1] 1 3 5

$`2`
[1] 2 4

Warning message:
In split.default(1:5, 1:2) :
  data length is not a multiple of split variable
> x = structure(1:5, class="A")
> split(x, 1:2)
$`1`
[1] 1

$`2`
[1] 2

This is also inconsistent with split<-, which does recycle:

> split(x, 1:2) <- 1:2
Warning message:
In split.default(seq_along(x), f, drop = drop, ...) :
  data length is not a multiple of split variable
> x
[1] 1 2 1 2 1
attr(,"class")
[1] "A"

One solution is to change the call to seq_along(f) near the end of split.default to seq_along(x), so that the indices cover all of x and the short factor is recycled against them:

@@ -32,7 +32,7 @@
     lf <- levels(f)
     y <- vector("list", length(lf))
     names(y) <- lf
-    ind <- .Internal(split(seq_along(f), f))
+    ind <- .Internal(split(seq_along(x), f))
     for(k in lf) y[[k]] <- x[ind[[k]]]
     y
 }
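
For anyone who wants to try the intended behaviour without rebuilding R, here is a rough user-level sketch of the same idea. The helper name split_recycled is hypothetical (it is not part of base R and is not the patch itself); it splits the indices of x rather than the indices of f, so a short f is recycled in the same way as for unclassed x.

```r
# Hypothetical helper, not part of base R: mimic the patched logic by
# splitting seq_along(x), so a short 'f' is recycled over all of 'x'.
split_recycled <- function(x, f) {
    f <- as.factor(f)
    # seq_along(x) is a plain integer vector, so split() recycles 'f'
    # (with the usual warning when lengths are not multiples).
    ind <- split(seq_along(x), f)
    lapply(ind, function(i) x[i])
}

x <- structure(1:5, class = "A")
res <- split_recycled(x, 1:2)
# res[["1"]] holds elements 1, 3, 5 of x; res[["2"]] holds elements 2, 4
```

As in the unclassed case, a warning is issued because 5 is not a multiple of 2.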



The following is perhaps a little harder to argue, but consider split.default applied with a class for which one might wish to develop factor-like behaviour, e.g.,

  Rle = setClass("Rle", representation(values="integer", lengths="integer"))
  f = Rle(values=1:2, lengths=2:3)

the code

    if (is.list(f))
        f <- interaction(f, drop = drop, sep = sep)
    else if (drop || !is.factor(f))
        f <- factor(f)

requires that one make factor a generic and write a factor.Rle method. This contradicts the documentation

       f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
          grouping, or a list of such factors in which case their
          interaction is used for the grouping.

and perhaps the more common (?) pattern of coercion using as.*. A solution is to make as.factor a generic and revise the code above to use something like

     if (is.list(f)) f <- interaction(f, drop = drop, sep = sep)
     else if (!is.factor(f)) f <- as.factor(f)
     else if (drop) f <- factor(f)

One then gets split behaviour if there is an as.factor.Rle method

    as.factor.Rle <- function(x, ...)
        factor(rep(x@values, x@lengths), levels=unique(x@values))
    setAs("Rle", "factor", function(from) as.factor.Rle(from))
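
A self-contained way to experiment with this dispatch pattern, without touching base R, might look like the sketch below. The names as_factor and group_factor are hypothetical stand-ins: as_factor plays the role of a generic as.factor, and group_factor reproduces only the revised branch logic, not the rest of split.default.

```r
library(methods)

Rle <- setClass("Rle", representation(values = "integer", lengths = "integer"))

# Hypothetical local generic standing in for a generic base::as.factor
as_factor <- function(x, ...) UseMethod("as_factor")
as_factor.default <- function(x, ...) base::as.factor(x)
as_factor.Rle <- function(x, ...)
    factor(rep(x@values, x@lengths), levels = unique(x@values))

# Hypothetical helper reproducing the revised coercion logic
group_factor <- function(f, drop = FALSE, sep = ".") {
    if (is.list(f)) interaction(f, drop = drop, sep = sep)
    else if (!is.factor(f)) as_factor(f)
    else if (drop) factor(f)
    else f
}

f <- Rle(values = 1:2, lengths = 2:3)
g <- group_factor(f)          # expands the runs to 1 1 2 2 2
res <- split(1:5, g)
# res[["1"]] is 1 2 and res[["2"]] is 3 4 5
```

The point of the sketch is only that coercion goes through an as.*-style generic, so a class author supplies a single coercion method and does not have to override factor itself.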

These more elaborate changes are in the attached diff.

Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

Attachment: split.diff.tar.gz
Description: GNU Zip compressed data

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel