Re: [Rd] RFC: tapply(*, ..., init.value = NA)

William Dunlap via R-devel Thu, 26 Jan 2017 07:49:11 -0800

It would be cool if the default for tapply's init.value could be
FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
FUN=all, -Inf for FUN=max, etc.  But that would take time and would
break code for which FUN did not work on length-0 objects.
Bill Dunlap
TIBCO Software
wdunlap tibco.com



On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
<maech...@stat.math.ethz.ch> wrote:
> Last week, we've talked here about "xtabs(), factors and NAs",
>  ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
>
> In the mean time, I've spent several hours on the issue
> and also committed changes to R-devel "in two iterations".
>
> In the case there is a *Left* hand side part to xtabs() formula,
> see the help page example using 'esoph',
> it uses  tapply(...,  FUN = sum)   and
> I now think there is a missing feature in tapply() there, which
> I am proposing to change.
>
> Look at a small example:
>
>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), 
>> N=3)[-c(1,5), ]; xtabs(~., D2)
> , , N = 3
>
>    L
> n   A B C D E F
>   1 1 2 0 0 0 0
>   2 0 0 1 2 0 0
>   3 0 0 0 0 2 2
>
>> DN <- D2; DN[1,"N"] <- NA; DN
>    n L  N
> 2  1 A NA
> 3  1 B  3
> 4  1 B  3
> 6  2 C  3
> 7  2 D  3
> 8  2 D  3
> 9  3 E  3
> 10 3 E  3
> 11 3 F  3
> 12 3 F  3
>> with(DN, tapply(N, list(n,L), FUN=sum))
>    A  B  C  D  E  F
> 1 NA  6 NA NA NA NA
> 2 NA NA  3  6 NA NA
> 3 NA NA NA NA  6  6
>>
>
> and as you can see, the resulting matrix has NAs, all the same
> NA_real_, but semantically of two different kinds:
>
> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> 2) all other NAs come from the fact that there is no such factor combination
>    *and* from the fact that tapply() uses
>
>    array(dim = .., dimnames = ...)
>
> i.e., initializes the array with NAs  (see definition of 'array').
>
> My proposition is the following patch to  tapply(), adding a new
> option 'init.value':
>
> -----------------------------------------------------------------------------
>
> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = 
> TRUE)
>  {
>      FUN <- if (!is.null(FUN)) match.fun(FUN)
>      if (!is.list(INDEX)) INDEX <- list(INDEX)
> @@ -44,7 +44,7 @@
>      index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
>      ans <- lapply(X = ans[index], FUN = FUN, ...)
>      if (simplify && all(lengths(ans) == 1L)) {
> -       ansmat <- array(dim = extent, dimnames = namelist)
> +       ansmat <- array(init.value, dim = extent, dimnames = namelist)
>         ans <- unlist(ans, recursive = FALSE)
>      } else {
>         ansmat <- array(vector("list", prod(extent)),
>
> -----------------------------------------------------------------------------
>
> With that, I can set the initial value to '0' instead of array's
> default of NA :
>
>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
>    A B C D E F
> 1 NA 6 0 0 0 0
> 2  0 0 3 6 0 0
> 3  0 0 0 0 6 6
>>
>
> which now has 0 counts and NA  as is desirable to be used inside
> xtabs().
>
> All fine... and would not be worth a posting to R-devel,
> except for this:
>
> The change will not be 100% back compatible -- by necessity: any new argument 
> for
> tapply() will make that argument name not available to be
> specified (via '...') for 'FUN'.  The new function would be
>
>> str(tapply)
> function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
>
> where the '...' are passed FUN(),  and with the new signature,
> 'init.value' then won't be passed to FUN  "anymore" (compared to
> R <= 3.3.x).
>
> For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
> the probability the arg name is used in other functions).
>
>
> Opinions?
>
> Thank you in advance,
> Martin
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] RFC: tapply(*, ..., init.value = NA)

Reply via email to