>>>>> Gabriel Becker >>>>> on Tue, 29 Oct 2019 12:43:15 -0700 writes:
    > Hi all,

    > So I've started working on this and I ran into something that I didn't
    > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
    > ignore dimension completely, treat x as an atomic vector, and return an
    > (unclassed) atomic vector:

Well, that's (3+), not "2+".

But I did write (on Sep 17 in this thread!)

  > The current source for head() and tail() and all their methods
  > in utils is just 83 lines of code {file utils/R/head.R minus
  > the initial mostly copyright comments}.

and if you've ever looked at these few dozen lines of R code, you'll
have seen that we just added two simple utilities with a few
reasonably simple methods.  Treating non-matrix (i.e. non-2d)
arrays as vectors is typically not unreasonable in R, but
indeed with your proposals (in this thread), such non-2d arrays
should be treated differently, either via new head.array() /
tail.array() methods (or, only if it can be done more nicely, by
the default method).

Note however the following historical quirk:

> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    1     2     3     4     5
 TRUE FALSE  TRUE  TRUE  TRUE

(Is this something we should consider changing for R 4.0.0, to
 have it TRUE also for 2d-arrays aka matrix objects ??)

The consequence of that is that currently, "often" foo.matrix is just a
copy of foo.array in the case the latter exists:
"base" examples: foo in {unique, duplicated, anyDuplicated}.

So I propose you change the current head.matrix and tail.matrix to
head.array and tail.array
(and then have  head.matrix <- head.array  etc, at least if the
 above quirk must remain, or remains (which I currently guess to
 be the case)).

    >> x = array(100, c(4, 5, 5))
    >> dim(x)
    > [1] 4 5 5
    >> head(x, 1)
    > [1] 100
    >> class(head(x))
    > [1] "numeric"

    > (For a 1d array, it does return another 1d array).
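To make the "foo.matrix is just a copy of foo.array" pattern concrete, here is a minimal sketch. The generic first_slice() and its methods are hypothetical names invented for this illustration; they are not part of base or utils, and the sketch assumes nothing beyond ordinary S3 dispatch.

```r
## Hypothetical generic, for illustration only (not part of base or utils).
first_slice <- function(x, ...) UseMethod("first_slice")

## Fallback: treat x as a plain vector.
first_slice.default <- function(x, ...) x[1L]

## Array method: keep the first index of the first dimension and
## every index of the remaining ones, without dropping dimensions.
first_slice.array <- function(x, ...) {
    idx <- c(list(1L), lapply(dim(x)[-1L], seq_len))
    do.call("[", c(list(x), idx, drop = FALSE))
}

## Alias the matrix method to the array method, so that 2d arrays
## dispatch here whether or not they inherit from "array".
first_slice.matrix <- first_slice.array

m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))
dim(first_slice(m))  # 1 3
dim(first_slice(a))  # 1 3 4
```

With the alias in place, the result does not depend on how the inherits() quirk is eventually resolved: a matrix reaches the same code whether it dispatches through "matrix" or "array".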
    > When extending head/tail to understand multiple dimensions as discussed in
    > this thread, then, should the behavior for 2+d arrays be explicitly
    > retained, or should head and tail do the analogous thing (with a
    > head(<2d array>) behaving the same as head(<matrix>), which honestly is
    > what I expected to already be happening)?

    > Are people using/relying on this behavior in their code, and if so,
    > why/for what?

    > Even more generally, one way forward is to have the default methods check
    > for dimensions, and use length if it is null:

    > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
    > {
    >     if(any(n == 0))
    >         stop("n must be non-zero or unspecified for all dimensions")
    >     if(!is.null(dim(x)))
    >         dimsx <- dim(x)
    >     else
    >         dimsx <- length(x)
    >     ## this returns a list of vectors of indices in each
    >     ## dimension, regardless of the length of the n
    >     ## argument
    >     sel <- lapply(seq_along(dimsx), function(i) {
    >         dxi <- dimsx[i]
    >         ## select all indices (full dim) if not specified
    >         ni <- if(length(n) >= i) n[i] else dxi
    >         ## handle negative ns
    >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    >         seq.int(to = dxi, length.out = ni)
    >     })
    >     args <- c(list(x), sel, drop = FALSE)
    >     do.call("[", args)
    > }

    > I think this precludes the need for a separate data.frame method at all,
    > actually, though (I would think) tail.data.frame would still be defined and
    > exported for backwards compatibility. (The matrix method has some extra
    > bits, so my current conception of it is still separate, though it might not
    > NEED to be.)

    > The question then becomes, should head/tail always return something with
    > the same dimensionality (number of dims) it got, or should data.frame and
    > matrix be special cased in this regard, as they are now?

    > What are people's thoughts?

    > ~G

    > [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
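For readers who want to try the per-dimension selection logic from the quoted proposal without masking utils' own methods, the following is a sketch under that assumption. The name tail_nd() is hypothetical and chosen only for this example; the body follows the quoted code.

```r
## Hypothetical helper (not part of utils): keep the last n[i] indices
## along dimension i, and every index of dimensions not covered by n.
tail_nd <- function(x, n = 6L) {
    if (any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    ## objects without dim() are treated as one-dimensional
    dimsx <- if (is.null(dim(x))) length(x) else dim(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select the full extent if n has no entry for this dimension
        ni <- if (length(n) >= i) n[i] else dxi
        ## negative n means "all but the first |n|"
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(seq_len(4 * 5 * 5), dim = c(4, 5, 5))
dim(tail_nd(x, 1))         # 1 5 5
dim(tail_nd(x, c(2, -3)))  # 2 2 5
```

Because drop = FALSE is always passed, the result keeps the number of dimensions of its input, which is one possible answer to the dimensionality question raised in the quoted message.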